[jira] [Commented] (STORM-1961) Come up with streams api for storm core use cases

Roshan Naik (JIRA) Mon, 26 Sep 2016 01:32:05 -0700

    [ 
https://issues.apache.org/jira/browse/STORM-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522455#comment-15522455
 ]


Roshan Naik commented on STORM-1961:
------------------------------------

*Concepts and definitions*
Adding them to the README is definitely important, but the design is difficult 
to review unless these concepts are also carefully defined in the design 
document. That will help reviewers understand the general direction, benefits 
and constraints of the design.

*StreamBuilder*
OK, it sounds like StreamBuilder is not the right name for it then. Topology or 
TopologyBuilder may be a better name; it seems similar to the TridentTopology 
class. Can the build() call still be moved inside the submit() method?



*The T in Stream<T>*
Do you mean that the type of values passing through the stream does not 
change? I see from your examples (mapToPair etc.) that types can change as 
values pass through the operators. I would think the input and output types 
would also change in the case of aggregations, windowing, etc.
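
To illustrate the point about element types changing, here is a minimal sketch 
that uses java.util.stream as a stand-in. It is NOT the API proposed in the 
design document; the class and method names (ElementTypeDemo, lengths, counts) 
are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: java.util.stream stands in for the proposed Storm API.
public class ElementTypeDemo {

    // Stream<String> -> Stream<Integer>: map() changes the element type.
    static List<Integer> lengths(List<String> words) {
        Stream<String> in = words.stream();
        Stream<Integer> out = in.map(String::length);
        return out.collect(Collectors.toList());
    }

    // Stream<String> -> Map<String, Long>: an aggregation changes it again.
    static Map<String, Long> counts(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("storm", "core", "storm");
        System.out.println(lengths(words));
        System.out.println(counts(words));
    }
}
```

If the T in Stream<T> were fixed for the whole pipeline, neither of these two 
steps could be typed.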

*flatMapArray*
Supporting arrays natively helps performance by reducing allocation and GC 
overhead, so it may supersede those concerns.


*Branching*
The branching example in your doc differs from the example in your last comment 
here. Which one is correct?

*Parallelism Hints* : Can you show examples of how parallelism hints at the 
spout/bolt level will then be controlled by the user?


*Windowing & Batching*

Windows are logical groupings from the point of view of the needs of the 
business/user. Batching has to do with efficiency or providing guarantees. I 
can think of 3 kinds of batching for Streaming:

# The units of IO performed by a spout (e.g. Kafka), basically buffered reads, 
and the units of IO performed by a terminal bolt (e.g. HBase/HDFS), basically 
flushing of writes.
# The units in which things are moved around in the internal queues. 
# The unit of processing/delivery (a set of tuples) from the processing 
guarantee standpoint. Basically Trident micro-batches.

None of these 3 types should get implicitly coupled with Windowing boundaries, 
or with each other. 

AFAICT what you refer to as batching (aggregation/join boundaries) is windowing 
itself. IMO those two terms should not be conflated. 

You can have 1 hour or 1 day or even longer windows. Any type of batching at 
those intervals would not make sense. You can also have overlapping and sliding 
windows, whereas batches cannot overlap or slide. A tuple can be part of 
multiple windows but part of only one batch.
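
A minimal sketch of that last distinction, under stated assumptions: windows 
are time-based, [start, start + length), and start at multiples of the slide 
interval; batches are fixed-size groups by arrival sequence number. The class 
and method names (WindowVsBatch, windowStarts, batchId) are hypothetical and 
are not taken from Storm or from the design document.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: contrasts window membership with batch membership.
public class WindowVsBatch {

    // Start times of every sliding window [start, start + length) that
    // contains timestamp ts. Assumes ts >= 0 and that windows start at
    // multiples of 'slide'. Returns the latest window first.
    static List<Long> windowStarts(long ts, long length, long slide) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - (ts % slide); // latest window start <= ts
        for (long s = latest; s > ts - length && s >= 0; s -= slide) {
            starts.add(s);
        }
        return starts;
    }

    // The single batch a tuple belongs to, given its arrival sequence number.
    static long batchId(long seq, long batchSize) {
        return seq / batchSize;
    }

    public static void main(String[] args) {
        // 30-unit windows sliding every 10 units: ts=35 falls in 3 windows.
        System.out.println(windowStarts(35, 30, 10));
        // Batches of 5 tuples: tuple #7 falls in exactly one batch.
        System.out.println(batchId(7, 5));
    }
}
```

With a window length of 30 and a slide of 10, every tuple (after the first 
window fills) is a member of three windows, while batchId always returns 
exactly one batch.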


*Microbatching support*

This statement in the design document :
{quote}
The idea is to provide a common typed api to express streaming transformations 
easily and to address both the stormcore and trident use cases
{quote}

suggests you are trying to support micro-batching also. But there is no 
information on how it will be supported. Supporting micro-batching, AFAICT, is 
a complex problem unto itself, which also makes me wonder about the rationale 
for the rejected alternative. I am thinking you will face similar issues in 
this API as well. 


*Custom Operators*
IMO, how users can easily extend the APIs is something that may need upfront 
thinking, as it will impact the design and interfaces. Otherwise we might end 
up with something that is hard for end users to extend. 

> Come up with streams api for storm core use cases
> -------------------------------------------------
>
>                 Key: STORM-1961
>                 URL: https://issues.apache.org/jira/browse/STORM-1961
>             Project: Apache Storm
>          Issue Type: Sub-task
>            Reporter: Arun Mahadevan
>            Assignee: Arun Mahadevan
>         Attachments: UnifiedStreamapiforStorm.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
