[
https://issues.apache.org/jira/browse/STORM-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522455#comment-15522455
]
Roshan Naik commented on STORM-1961:
------------------------------------
*Concepts and definitions*
Adding them to the README is definitely important. But the design is difficult
to review unless these concepts are also carefully defined in the design
document. Doing so will help reviewers understand the general direction,
benefits and constraints of the design.
*StreamBuilder*
Ok, it sounds like StreamBuilder is not the right name for it then. Topology or
TopologyBuilder may be a better name. It seems similar to the TridentTopology
class. Can the build() call still be moved inside the submit() method?
*The T in Stream<T>*
Do you mean to say that the type of the values passing through the stream does
not change? I see from your examples of mapToPair etc. that it is possible for
types to change as you run through the operators. I would think the input and
output types would also change in the case of aggregations, windowing, etc.
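To make the point about changing types concrete, here is a small sketch using
the JDK's java.util.stream as an analogy only. It is not the proposed Storm API,
and the class and method names (StreamTypeChange, wordCounts) are made up for
illustration. The element type goes from String to a (String, Integer) pair,
and the aggregation then produces Long counts.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: uses java.util.stream, not the Storm streams API under review.
final class StreamTypeChange {

    static Map<String, Long> wordCounts(List<String> lines) {
        // Stream<String>: one element per input line.
        Stream<String> sentences = lines.stream();

        // Still Stream<String>, but the elements are now words (a flatMap).
        Stream<String> words = sentences.flatMap(l -> Arrays.stream(l.split(" ")));

        // The element type changes to a pair, similar in spirit to mapToPair.
        Stream<Map.Entry<String, Integer>> pairs =
                words.map(w -> new SimpleEntry<String, Integer>(w, 1));

        // Aggregation changes the type again: the result holds Long counts per key.
        return pairs.collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCounts(Arrays.asList("a b", "a")));
    }
}
```

If T were fixed for the whole pipeline, none of the intermediate declarations
above could be typed as written.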
*flatMapArray*
Supporting arrays natively helps performance due to reduced allocation and GC
overhead. So it may supersede those concerns.
*Branching*
The branching example in your doc differs from the example in your last comment
here. Which one is correct?
*Parallelism Hints*
Can you show examples of how parallelism hints at the spout/bolt level will
then be controlled by the user?
*Windowing & Batching*
Windows are logical groupings from the point of view of the needs of the
business/user. Batching has to do with efficiency or providing guarantees. I
can think of 3 kinds of batching for Streaming:
# The units of IO performed by a spout (Kafka). Basically buffered reads. And
the units of IO performed by a terminal bolt (HBase/HDFS). Basically flushing of
writes.
# The units in which tuples are moved around in the internal queues.
# The unit of processing/delivery (a set of tuples) from the processing
guarantee standpoint. Basically Trident micro-batches.
None of these 3 types should get implicitly coupled with Windowing boundaries,
or with each other.
AFAICT what you refer to as batching (aggregation/join boundaries) is windowing
itself. IMO those two terms should not be conflated.
You can have 1 hour or 1 day or even longer windows. Any type of batching at
those intervals won't make sense. Then you can have overlapping windows and
sliding windows. Batches cannot overlap or slide. A tuple can be part of
multiple windows but part of only one batch.
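A small sketch of that last distinction, under stated assumptions: the helper
names (WindowVsBatch, slidingWindowStarts, batchId) are hypothetical and not
part of Storm, windows are assumed to start at multiples of the slide interval,
and batches are assumed to be fixed-size partitions of the tuple sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Storm API: contrasts window membership with
// batch membership for a single tuple.
final class WindowVsBatch {

    // Start times of every sliding window [start, start + lengthMs) that contains ts.
    // Assumes ts >= 0, windows start at multiples of slideMs, and 0 < slideMs <= lengthMs.
    static List<Long> slidingWindowStarts(long ts, long lengthMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - (ts % slideMs); // latest window start that is <= ts
        for (long s = lastStart; s > ts - lengthMs && s >= 0; s -= slideMs) {
            starts.add(s);
        }
        return starts;
    }

    // Batches partition the stream, so each tuple maps to exactly one batch id.
    static long batchId(long tupleIndex, int batchSize) {
        return tupleIndex / batchSize;
    }

    public static void main(String[] args) {
        // 30s windows sliding every 10s: a tuple at t=25s falls into three windows.
        System.out.println(slidingWindowStarts(25_000L, 30_000L, 10_000L));
        // The same tuple, if it is the 8th tuple (index 7) with batch size 5, is in one batch.
        System.out.println(batchId(7L, 5));
    }
}
```

The window function returns a list while the batch function returns a single
id, which is the asymmetry described above.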
*Microbatching support*
This statement in the design document :
{quote}
The idea is to provide a common typed api to express streaming transformations
easily and to address both the stormcore and trident use cases
{quote}
suggests you are trying to support micro-batching also. But there is no
information on how it will be supported. Supporting micro-batching, AFAICT, is
a complex problem unto itself. This also makes me wonder about the rationale
for the rejected alternative. I am thinking you will face similar issues in
this API as well.
*Custom Operators*
IMO, how users can easily extend the APIs with custom operators is something
that may need upfront thinking, as it will impact the design and interfaces.
Otherwise we might end up with something that is hard for end users to extend.
> Come up with streams api for storm core use cases
> -------------------------------------------------
>
> Key: STORM-1961
> URL: https://issues.apache.org/jira/browse/STORM-1961
> Project: Apache Storm
> Issue Type: Sub-task
> Reporter: Arun Mahadevan
> Assignee: Arun Mahadevan
> Attachments: UnifiedStreamapiforStorm.pdf
>
>