[jira] [Commented] (APEXMALHAR-2099) Identify overlap between Beam API and existing Apex Stream API

Siyuan Hua (JIRA) Wed, 01 Jun 2016 18:05:18 -0700

    [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311525#comment-15311525
 ]


Siyuan Hua commented on APEXMALHAR-2099:
----------------------------------------

[~ilganeli] Also we do have plan for GroupBy, It's just not there yet :) We 
need to build the operator fist. 
You can follow this ticket 
https://issues.apache.org/jira/browse/APEXMALHAR-2085 for details. Actually it 
would address more problems.
In terms of high-level API, we categorize all transformations into 2 different 
type windowed operations/non-windowed operations. As you might see there is a 
WindowedStream interface  as well(empty for now but methods will be added 
later). For operations like map, filter, window is meaningless, but for some 
operations like group, join etc window is a must have(by default it might be 
one globalwindow, but still there is a window for that). Further more, we would 
divide windowed operations into 3 different types: group, combine, join. Group 
needs to retain the tuples in the window(sorting for example), Combine needs to 
combine the tuples in the window into one/multiple tuples(max, sum, reduce, 
topN etc) and Join needs to merge multiple upstream inputs into one. All in 
all, we do have detail plan for that. 
Now let's look at the Beam API again, personally I think there are a lot of 
mess there, I also looked at all PTransform there(The categorization I have 
above actually comes from the analysis of all PTransforms  :) ). They don't 
have clear hierarchy in their PTransform family. For example window is one of 
transformations, that is really against my interpretation of the word 
"Transformation". Window alone is meaningless, what we actually want to do is 
Sum, Max, Average in a specific Window. Like you said, it's just a strategy in 
our next cut of Stream API.
And then, here you will have abstraction in different layers eventually,  A 
simple PTransform base class DOES NOT work.  The reason we categorize 
operations into windowed and non-windowed and further break them into 
Group/Combine/Join is because operations in same category usually share similar 
internal mechanism like state checkpointing, trigger behaviour so on so forth. 
You don't want to do repeat work for those. And Beam actually do the same 
thing. For example, if you want to do Sum, there is no such thing called 
SumTransform, instead they have a number of Sum**Fn classes which implements 
CombineFn class which is finally used by some PTransform.   We will have 
similar method in WindowedStream interface which takes a CombineFn instead of 
Sum, Reduce (or maybe have them all) etc.

Thanks for your opinion, let's together make our API rock! 

> Identify overlap between Beam API and existing Apex Stream API
> --------------------------------------------------------------
>
>                 Key: APEXMALHAR-2099
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2099
>             Project: Apache Apex Malhar
>          Issue Type: Sub-task
>            Reporter: Ilya Ganelin
>
> There should be some overlap between the Beam API and the recently released 
> Apex Stream API. This task captures the need to understand and document this 
> overlap.
> AC:
> * A document or JIRA comment identifying which components of the Beam API are 
> implement, similar, or absent within the Apex Stream API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (APEXMALHAR-2099) Identify overlap between Beam API and existing Apex Stream API

Reply via email to