[ 
https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638935#comment-14638935
 ] 

Aljoscha Krettek commented on FLINK-2398:
-----------------------------------------

Yes, I think this is a prerequisite for several things we're working on.

I've started fiddling around with this and I have it working except for 
split/select and iterations. I think the first step is to remove the dependency 
on StreamGraph from DataStream and the other API classes. Once this is done we 
can change the representation to make it work for both stream and batch.

In the code I have right now, the transform() method of DataStream simply 
creates a OneInputTransformation, transform() on ConnectedDataStream creates a 
TwoInputTransformation, union creates a UnionTransformation, and so on...
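
A minimal sketch of that idea follows, using simplified, hypothetical 
stand-ins rather than the actual Flink classes. SketchStream, 
SourceTransformation, SketchGraphGenerator and the inputs() method are 
assumptions made for illustration, and TwoInputTransformation is left out for 
brevity. The API object only wraps a transformation, and a separate generator 
walks the resulting tree afterwards:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins; not the real Flink classes.
abstract class Transformation {
    final String name;
    Transformation(String name) { this.name = name; }
    abstract List<Transformation> inputs();
}

class SourceTransformation extends Transformation {
    SourceTransformation(String name) { super(name); }
    List<Transformation> inputs() { return new ArrayList<>(); }
}

class OneInputTransformation extends Transformation {
    final Transformation input;
    OneInputTransformation(String name, Transformation input) {
        super(name);
        this.input = input;
    }
    List<Transformation> inputs() { return Arrays.asList(input); }
}

class UnionTransformation extends Transformation {
    final List<Transformation> unioned;
    UnionTransformation(List<Transformation> unioned) {
        super("union");
        this.unioned = unioned;
    }
    List<Transformation> inputs() { return unioned; }
}

// API class: holds only its transformation and has no reference to a graph.
class SketchStream {
    final Transformation transformation;
    SketchStream(Transformation transformation) {
        this.transformation = transformation;
    }
    SketchStream transform(String operatorName) {
        return new SketchStream(
            new OneInputTransformation(operatorName, transformation));
    }
    SketchStream union(SketchStream other) {
        return new SketchStream(new UnionTransformation(
            Arrays.asList(transformation, other.transformation)));
    }
}

// Separate component that walks the transformation tree, inputs first.
// A real generator would add nodes and edges to a graph; this one only
// records the names in the order they would be translated.
class SketchGraphGenerator {
    static List<String> generate(Transformation sink) {
        Set<Transformation> visited = new LinkedHashSet<>();
        visit(sink, visited);
        List<String> order = new ArrayList<>();
        for (Transformation t : visited) {
            order.add(t.name);
        }
        return order;
    }

    private static void visit(Transformation t, Set<Transformation> visited) {
        if (visited.contains(t)) {
            return;
        }
        for (Transformation in : t.inputs()) {
            visit(in, visited);
        }
        visited.add(t);
    }
}
```

With this shape, split/select and iterations would each need their own 
transformation type, which is the part that is not working yet.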

> Decouple StreamGraph Building from the API
> ------------------------------------------
>
>                 Key: FLINK-2398
>                 URL: https://issues.apache.org/jira/browse/FLINK-2398
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> Currently, the building of the StreamGraph is very intertwined with the API 
> methods. DataStream knows about the StreamGraph and keeps track of splitting, 
> selected names, unions and so on. This leads to the problem that it is very 
> hard to understand how the StreamGraph is built because the code that does it 
> is all over the place. This also makes it hard to extend/change parts of the 
> Streaming system.
> I propose to introduce "Transformations". A transformation holds information 
> about one operation: The input streams, types, names, operator and so on. An 
> API method creates a transformation instead of fiddling with the StreamGraph 
> directly. A new component, the StreamGraphGenerator, creates a StreamGraph 
> from the tree of transformations that result from program specification using 
> the API methods. This would relieve DataStream from knowing about the 
> StreamGraph and make unions, splitting, and selection visible transformations 
> instead of being scattered across the different API classes as fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
