[ https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699841#comment-14699841 ]
ASF GitHub Bot commented on FLINK-2398:
---------------------------------------

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/988#issuecomment-131892889

I changed it to execute dangling operators now.

There is, however, a strange "feature". This code works on master: https://gist.github.com/aljoscha/bbe74309a31a16ca8413. It swallows the exception thrown when the output type of the generic map cannot be determined. Then, when `execute` is called, the program runs just fine up to and including the generic map, as can be seen from the `println` output.

With this PR this won't work anymore. Upon `execute`, the StreamGraphBuilder tries to build the StreamGraph from the graph of Transformations. It encounters the dangling map, for which the output type cannot be determined, and fails.

This behavior is problematic because the TestStreamEnvironment is reused across several streaming tests. Tests fail in seemingly unconnected parts of the code because dangling operators without type information still linger in the execution environment. I mentioned this here: https://issues.apache.org/jira/browse/FLINK-2508

I have a quick fix for this, for now. I think, however, that the streaming tests need to be consolidated and the streaming environments also need to be refactored a bit. (In addition to the batch exec envs, because they should probably be reused in large parts for streaming.)


> Decouple StreamGraph Building from the API
> ------------------------------------------
>
>                 Key: FLINK-2398
>                 URL: https://issues.apache.org/jira/browse/FLINK-2398
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> Currently, the building of the StreamGraph is very intertwined with the API
> methods. DataStream knows about the StreamGraph and keeps track of splitting,
> selected names, unions and so on.
> This leads to the problem that it is very
> hard to understand how the StreamGraph is built because the code that does it
> is all over the place. This also makes it hard to extend/change parts of the
> Streaming system.
>
> I propose to introduce "Transformations". A transformation holds information
> about one operation: the input streams, types, names, operator and so on. An
> API method creates a transformation instead of fiddling with the StreamGraph
> directly. A new component, the StreamGraphGenerator, creates a StreamGraph
> from the tree of transformations that results from program specification using
> the API methods. This would relieve DataStream from knowing about the
> StreamGraph and make unions, splitting, and selection visible transformations
> instead of being scattered across the different API classes as fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
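
For readers of the archive, the proposal in the quoted issue description can be illustrated with a rough sketch. This is not Flink's actual code: the class names `Transformation`, `Graph`, and `Generator`, the string-based type field, and the `virtual` flag are all hypothetical stand-ins chosen for this example. It assumes that API methods only record transformations, that a separate generator later walks them to build the graph, and that a union contributes edges but no operator node of its own. It also models the failure described in the comment, where a dangling operator without type information makes graph generation fail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransformationSketch {

    /** Hypothetical record of one API operation: name, output type, inputs. */
    static class Transformation {
        final String name;
        final String outputType;            // null models "type could not be determined"
        final boolean virtual;              // true for union: no operator of its own
        final List<Transformation> inputs;

        Transformation(String name, String outputType, boolean virtual,
                       Transformation... inputs) {
            this.name = name;
            this.outputType = outputType;
            this.virtual = virtual;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Hypothetical stand-in for the StreamGraph: operator nodes plus edges. */
    static class Graph {
        final List<String> nodes = new ArrayList<>();
        final List<String> edges = new ArrayList<>();
    }

    /** Hypothetical generator: builds the graph from recorded transformations. */
    static class Generator {
        private final Graph graph = new Graph();
        // Transformations already handled, mapped to the nodes producing their output.
        private final Map<Transformation, List<String>> done = new HashMap<>();

        static Graph generate(List<Transformation> transformations) {
            Generator generator = new Generator();
            for (Transformation t : transformations) {
                generator.transform(t);
            }
            return generator.graph;
        }

        private List<String> transform(Transformation t) {
            List<String> cached = done.get(t);
            if (cached != null) {
                return cached;
            }
            // Inputs are translated first, so upstream nodes exist before edges are added.
            List<String> upstream = new ArrayList<>();
            for (Transformation input : t.inputs) {
                upstream.addAll(transform(input));
            }
            List<String> result;
            if (t.virtual) {
                result = upstream;          // a union only forwards its inputs' nodes
            } else {
                if (t.outputType == null) {
                    throw new IllegalStateException(
                            "Cannot determine output type of '" + t.name + "'");
                }
                graph.nodes.add(t.name);
                for (String up : upstream) {
                    graph.edges.add(up + " -> " + t.name);
                }
                result = Collections.singletonList(t.name);
            }
            done.put(t, result);
            return result;
        }
    }

    /** Builds two sources, a union, a map, and a sink, then generates the graph. */
    static Graph demo() {
        Transformation a = new Transformation("source-a", "String", false);
        Transformation b = new Transformation("source-b", "String", false);
        Transformation union = new Transformation("union", "String", true, a, b);
        Transformation map = new Transformation("map", "Integer", false, union);
        Transformation sink = new Transformation("sink", "Void", false, map);
        return Generator.generate(Arrays.asList(a, b, union, map, sink));
    }

    public static void main(String[] args) {
        Graph graph = demo();
        System.out.println(graph.nodes);
        System.out.println(graph.edges);
    }
}
```

In this sketch the stream-like objects never touch the graph; only `Generator` does. Passing a transformation whose `outputType` is `null` to `Generator.generate` throws, which mirrors how a lingering dangling operator in a shared environment could break an otherwise unrelated run.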