[
https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658810#comment-14658810
]
ASF GitHub Bot commented on FLINK-2398:
---------------------------------------
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/988#issuecomment-128133833
About rebalance()/forward(): yes, using forward() with a different parallelism
now throws an exception. Previously, when a user did not specify a partitioning
strategy, forward was assumed, and this was allowed even when the parallelism
changed. That led either to the degenerate case of only one downstream instance
receiving elements (1 to n parallelism) or to one or several downstream
instances receiving skewed numbers of elements (m to n, where m > n).
I think we can change the behavior and document forward as the default for
n -> n parallelism and rebalance as the default for n -> m parallelism.
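The skew described above can be made concrete with a small simulation. This is a simplified, hypothetical model, not Flink's partitioner code. It assumes that the old implicit "forward" behaved like a pointwise connection, in which upstream subtask i sends all of its records to downstream subtask i * n / m. It also assumes that "rebalance" is plain round-robin. The class and method names (PartitioningSketch, legacyForwardCounts, rebalanceCounts, checkForward, defaultStrategy) are made up for illustration, and so is the exception type.

```java
// Hypothetical, simplified model of two partitioning strategies; this is not
// Flink's actual partitioner implementation.
final class PartitioningSketch {

    // Old implicit "forward", modelled as a pointwise connection: each of the
    // m upstream subtasks sends all of its records to exactly one of the n
    // downstream subtasks (mapping up * n / m is an assumption).
    static int[] legacyForwardCounts(int m, int n, int recordsPerUpstream) {
        int[] counts = new int[n];
        for (int up = 0; up < m; up++) {
            int target = up * n / m;
            counts[target] += recordsPerUpstream;
        }
        return counts;
    }

    // "Rebalance", modelled as round-robin: each upstream subtask cycles
    // through all n downstream subtasks (staggered start is an assumption).
    static int[] rebalanceCounts(int m, int n, int recordsPerUpstream) {
        int[] counts = new int[n];
        for (int up = 0; up < m; up++) {
            int next = up % n;
            for (int r = 0; r < recordsPerUpstream; r++) {
                counts[next]++;
                next = (next + 1) % n;
            }
        }
        return counts;
    }

    // New behavior sketched in the comment: an explicit forward with a change
    // of parallelism is rejected (exception type chosen for illustration).
    static void checkForward(int m, int n) {
        if (m != n) {
            throw new IllegalArgumentException(
                "forward requires equal parallelism, got " + m + " -> " + n);
        }
    }

    // Proposed default when no strategy is given: forward for n -> n,
    // rebalance otherwise.
    static String defaultStrategy(int m, int n) {
        return m == n ? "forward" : "rebalance";
    }
}
```

With 100 records per upstream subtask, `legacyForwardCounts(1, 4, 100)` gives `[100, 0, 0, 0]` and `rebalanceCounts(1, 4, 100)` gives `[25, 25, 25, 25]`. For 3 -> 2, the counts are `[200, 100]` and `[150, 150]`, respectively. Under this model, rebalance spreads the load evenly whenever the parallelism changes, which is the motivation for making it the default in the n -> m case.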
About the dangling operators: also true. I think the previous behavior was more
of an implementation artifact, because the stream graph was basically built from
the sources. Now it is built from the sinks. I see that this can be desirable
behavior, and I can adapt the current code if we agree on this.
> Decouple StreamGraph Building from the API
> ------------------------------------------
>
> Key: FLINK-2398
> URL: https://issues.apache.org/jira/browse/FLINK-2398
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Reporter: Aljoscha Krettek
> Assignee: Aljoscha Krettek
>
> Currently, the building of the StreamGraph is very intertwined with the API
> methods. DataStream knows about the StreamGraph and keeps track of splitting,
> selected names, unions and so on. This makes it very hard to understand how
> the StreamGraph is built, because the code that does it is all over the
> place. It also makes it hard to extend or change parts of the Streaming
> system.
> I propose to introduce "Transformations". A transformation holds information
> about one operation: the input streams, types, names, operator and so on. An
> API method creates a transformation instead of manipulating the StreamGraph
> directly. A new component, the StreamGraphGenerator, creates a StreamGraph
> from the tree of transformations that results from specifying the program
> with the API methods. This would relieve DataStream of knowing about the
> StreamGraph and make unions, splitting, and selection visible
> transformations instead of fields scattered across the different API classes.
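To illustrate the proposal, here is a minimal sketch. Each API call records a node that points at its inputs. A separate generator then walks those nodes, starting from the sinks, and emits graph nodes and edges. This is not the StreamGraphGenerator from the issue. The classes `Transform` and `GraphGeneratorSketch`, their fields, and the string-based node and edge representation are all hypothetical. Because the walk starts at the sinks, a transformation that no sink depends on is never visited. This matches the comment above about dangling operators now that the graph is built from the sinks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical transformation node: one per API call, recording the operation
// name, its parallelism and the transformations it reads from.
final class Transform {
    private static int nextId = 0; // simple id source; not thread-safe

    final int id;
    final String name;
    final int parallelism;
    final List<Transform> inputs;

    Transform(String name, int parallelism, Transform... inputs) {
        this.id = nextId++;
        this.name = name;
        this.parallelism = parallelism;
        this.inputs = Arrays.asList(inputs);
    }
}

// Hypothetical generator: translates the transformation tree into nodes and
// edges by walking backwards from the sinks, visiting each transform once.
final class GraphGeneratorSketch {
    final Map<Integer, String> nodes = new LinkedHashMap<>();
    final List<String> edges = new ArrayList<>();
    private final Set<Integer> visited = new HashSet<>();

    static GraphGeneratorSketch generate(List<Transform> sinks) {
        GraphGeneratorSketch g = new GraphGeneratorSketch();
        for (Transform sink : sinks) {
            g.transform(sink);
        }
        return g;
    }

    private void transform(Transform t) {
        if (!visited.add(t.id)) {
            return; // already translated, e.g. an input shared by two operators
        }
        // Translate all inputs first, then connect them to this node.
        for (Transform in : t.inputs) {
            transform(in);
            edges.add(in.name + " -> " + t.name);
        }
        nodes.put(t.id, t.name + " (p=" + t.parallelism + ")");
    }
}
```

For example, suppose there is a source with parallelism 1 that feeds both a `map` and a `filter` with parallelism 4, and only `map` feeds a sink. Generating from that sink yields the nodes `source`, `map` and `sink` and the edges `source -> map` and `map -> sink`. The dangling `filter` never appears. The API classes only build `Transform` objects, and all graph-building logic lives in one place, which is the decoupling the issue asks for.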
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)