[ 
https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699841#comment-14699841
 ] 

ASF GitHub Bot commented on FLINK-2398:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-131892889
  
    I changed it to execute dangling operators now. There is, however, a 
strange "feature". This code works on master: 
https://gist.github.com/aljoscha/bbe74309a31a16ca8413. It swallows the 
exception that results from not being able to determine the output type of the 
generic map. Then, when `execute` is called, it executes just fine up to (and 
including) the generic map, as can be seen from the `println` output.
    
    With this PR this won't work anymore. Upon `execute`, the 
StreamGraphBuilder tries to build the StreamGraph from the graph of 
Transformations. It encounters the dangling map, for which the output type 
cannot be determined, and then it fails.
    
    This behavior is problematic since the TestStreamEnvironment is reused for 
several streaming tests. Tests fail in seemingly unconnected parts of the code 
because dangling operators without type information still linger in the 
execution environment. I mentioned this here: 
https://issues.apache.org/jira/browse/FLINK-2508
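
    As an illustration only, the lingering state can be sketched as follows. 
The classes `SharedEnvSketch`, `Env` and `Op` are hypothetical stand-ins, not 
Flink's actual TestStreamEnvironment or StreamGraph code. The sketch assumes an 
environment that records every operation and only checks output types when 
`execute` is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a shared test environment; not Flink's actual classes.
class SharedEnvSketch {

    /** One recorded operation; outputType is null when it could not be determined. */
    static final class Op {
        final String name;
        final String outputType;

        Op(String name, String outputType) {
            this.name = name;
            this.outputType = outputType;
        }
    }

    /** Records operations and validates them only when execute() is called. */
    static final class Env {
        private final List<Op> ops = new ArrayList<>();

        void add(String name, String outputType) {
            ops.add(new Op(name, outputType));
        }

        /** Builds the graph from every recorded operation, dangling ones included. */
        int execute() {
            for (Op op : ops) {
                if (op.outputType == null) {
                    throw new IllegalStateException(
                            "Cannot determine output type of '" + op.name + "'");
                }
            }
            int built = ops.size();
            ops.clear(); // only reached on success
            return built;
        }
    }

    public static void main(String[] args) {
        Env shared = new Env();

        // "Test A" creates a generic map without type information and never executes it.
        shared.add("genericMap", null);

        // "Test B" is well-typed, but reuses the environment and trips over Test A's leftover.
        shared.add("source", "String");
        try {
            shared.execute();
        } catch (IllegalStateException e) {
            System.out.println("Test B failed: " + e.getMessage());
        }
    }
}
```

    In this sketch the failure surfaces in "Test B" even though the untyped 
operation was created by "Test A", which matches the symptom of tests failing 
in seemingly unconnected parts of the code.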
    
    I have a quick fix for this, for now. I think, however, that the streaming 
tests need to be consolidated and the streaming environments also need to be 
refactored a bit. (In addition to the batch exec envs, because they should 
probably be reused in large parts for streaming.)  



> Decouple StreamGraph Building from the API
> ------------------------------------------
>
>                 Key: FLINK-2398
>                 URL: https://issues.apache.org/jira/browse/FLINK-2398
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> Currently, the building of the StreamGraph is very intertwined with the API 
> methods. DataStream knows about the StreamGraph and keeps track of splitting, 
> selected names, unions and so on. This leads to the problem that it is very 
> hard to understand how the StreamGraph is built because the code that does it 
> is all over the place. This also makes it hard to extend/change parts of the 
> Streaming system.
> I propose to introduce "Transformations". A transformation holds information 
> about one operation: the input streams, types, names, operator and so on. An 
> API method creates a transformation instead of fiddling with the StreamGraph 
> directly. A new component, the StreamGraphGenerator, creates a StreamGraph 
> from the tree of transformations that results from program specification using 
> the API methods. This would relieve DataStream from knowing about the 
> StreamGraph and make unions, splitting, selection visible transformations 
> instead of being scattered across the different API classes as fields.
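
The proposal quoted above can be sketched roughly as follows. This is a minimal 
illustration under stated assumptions, not the actual Flink implementation: the 
class `TransformationSketch`, its nested `Transformation` class and the method 
`generateEdges` are hypothetical names, and the "graph" is reduced to a list of 
edge strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed design; names do not come from Flink.
class TransformationSketch {

    /** Holds information about one operation: its name, output type and inputs. */
    static final class Transformation {
        final String name;
        final String outputType;
        final List<Transformation> inputs;

        Transformation(String name, String outputType, Transformation... inputs) {
            this.name = name;
            this.outputType = outputType;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Walks from the given sinks back to their inputs and emits "input -> operation" edges. */
    static List<String> generateEdges(List<Transformation> sinks) {
        List<String> edges = new ArrayList<>();
        Set<Transformation> visited = new HashSet<>();
        for (Transformation sink : sinks) {
            visit(sink, visited, edges);
        }
        return edges;
    }

    private static void visit(Transformation t, Set<Transformation> visited, List<String> edges) {
        if (!visited.add(t)) {
            return; // already translated, e.g. an input shared by several operations
        }
        for (Transformation input : t.inputs) {
            visit(input, visited, edges);
            edges.add(input.name + " -> " + t.name);
        }
    }

    public static void main(String[] args) {
        // API methods would create these objects instead of touching the graph directly.
        Transformation a = new Transformation("a", "String");
        Transformation b = new Transformation("b", "String");
        Transformation union = new Transformation("union", "String", a, b);
        Transformation sink = new Transformation("sink", "String", union);

        for (String edge : generateEdges(Arrays.asList(sink))) {
            System.out.println(edge);
        }
    }
}
```

In this sketch the union is an ordinary node in the tree rather than a field on 
an API class, so the code that turns the program into a graph lives in one 
place.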



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
