[ https://issues.apache.org/jira/browse/STORM-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522455#comment-15522455 ]
Roshan Naik commented on STORM-1961:
------------------------------------

*Concepts and definitions*
Adding them to the README is definitely important. But it is a bit difficult to review unless these concepts are also carefully defined in the design document. That will help reviewers understand the general direction, benefits and constraints of the design.

*StreamBuilder*
OK, it sounds like StreamBuilder is not the right name for it then. Topology or TopologyBuilder may be a better name. It seems similar to the TridentTopology class. Can the build() call still be moved inside the submit() method?

*The T in Stream<T>*
Do you mean to say the type of values passing through the stream does not change? I see from your examples of mapToPair etc. that it is possible for types to change as you run through the operators. I would think the input and output types would also change in the case of aggregations, windowing, etc.

*flatMapArray*
Supporting arrays natively helps performance through reduced allocation and GC overhead, so it may supersede those concerns.

*Branching*
The branching example in your doc differs from the example in your last comment here. Which one is correct?

*Parallelism Hints*
Can you show examples of how parallelism hints at the spout/bolt level will then be controlled by the user?

*Windowing & Batching*
Windows are logical groupings from the point of view of the needs of the business/user. Batching has to do with efficiency or providing guarantees. I can think of 3 kinds of batching for streaming:
# The units of IO performed by a spout (Kafka), basically buffered reads, and the units of IO performed by a terminal bolt (HBase/HDFS), basically flushing of writes.
# The units in which things are moved around in the internal queues.
# The unit of processing/delivery (a set of tuples) from the processing guarantee standpoint, basically Trident micro-batches.

None of these 3 types should get implicitly coupled with windowing boundaries, or with each other.
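To make the distinction between window membership and batch membership concrete, here is a minimal, self-contained sketch. It does not use any Storm API. The class and method names (WindowVsBatch, windowsFor, batchFor) are hypothetical, and it assumes non-negative timestamps, windows aligned to multiples of the slide interval, and batches formed by fixed-size partitioning of a sequence number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Storm or of the proposed API.
public class WindowVsBatch {

    // Returns the start time of every sliding window [start, start + length)
    // that contains the timestamp ts. Assumes ts >= 0 and that windows start
    // at multiples of 'slide', beginning at 0.
    static List<Long> windowsFor(long ts, long length, long slide) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - (ts % slide); // latest window start that is <= ts
        for (long s = latest; s > ts - length && s >= 0; s -= slide) {
            starts.add(s);
        }
        return starts;
    }

    // Batches partition the tuple sequence, so a tuple maps to exactly one id.
    static long batchFor(long seq, long batchSize) {
        return seq / batchSize;
    }

    public static void main(String[] args) {
        // With a 30-unit window sliding every 10 units, one tuple can fall
        // into several overlapping windows.
        System.out.println("windows: " + windowsFor(25, 30, 10));
        // The same tuple, taken as the 25th in the sequence, has one batch.
        System.out.println("batch:   " + batchFor(25, 10));
    }
}
```

The window function returns a list because sliding windows overlap, while the batch function returns a single value because batches partition the input. The two are computed from unrelated parameters, which is the sense in which they need not be coupled.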
AFAICT what you refer to as batching (aggregation/join boundaries) is windowing itself. IMO those two terms should not be conflated. You can have 1 hr or 1 day or even longer windows, and any type of batching at those intervals won't make sense. You can also have overlapping windows and sliding windows, whereas batches cannot be overlapping or sliding. A tuple can be part of multiple windows but part of only one batch.

*Microbatching support*
This statement in the design document:
{quote}
The idea is to provide a common typed api to express streaming transformations easily and to address both the stormcore and trident use cases
{quote}
suggests you are trying to support micro-batching also. But there is no information on how it will be supported. Supporting micro-batching, AFAICT, is a complex problem unto itself. That also makes me wonder about the rationale for the rejected alternative. I am thinking you will face similar issues in this API as well.

*Custom Operators*
IMO, how users can easily extend the APIs is something that may need upfront thinking, as it will impact the design and interfaces. Otherwise we might end up with something that is hard for end users.

> Come up with streams api for storm core use cases
> -------------------------------------------------
>
> Key: STORM-1961
> URL: https://issues.apache.org/jira/browse/STORM-1961
> Project: Apache Storm
> Issue Type: Sub-task
> Reporter: Arun Mahadevan
> Assignee: Arun Mahadevan
> Attachments: UnifiedStreamapiforStorm.pdf