[ 
https://issues.apache.org/jira/browse/FLINK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579647#comment-16579647
 ] 

ASF GitHub Bot commented on FLINK-9289:
---------------------------------------

fhueske commented on a change in pull request #6003: [FLINK-9289][Dataset] 
Parallelism of generated operators should have max parallelism of input
URL: https://github.com/apache/flink/pull/6003#discussion_r209909097
 
 

 ##########
 File path: 
flink-java/src/main/java/org/apache/flink/api/java/operators/UnionOperator.java
 ##########
 @@ -62,4 +62,10 @@ public UnionOperator(DataSet<T> input1, DataSet<T> input2, 
String unionLocationN
        protected Union<T> translateToDataFlow(Operator<T> input1, Operator<T> input2) {
                return new Union<T>(input1, input2, unionLocationName);
        }
+
+       @Override
+       public UnionOperator<T> setParallelism(int parallelism) {
+               // The parallelism of an UnionOperator should not be set.
 
 Review comment:
   Change the comment to: "Union is not translated to an independent operator 
but executed by multiplexing its input on the following operator. Hence, the 
parallelism of a Union cannot be set."



> Parallelism of generated operators should have max parallelism of input
> ---------------------------------------------------------------------
>
>                 Key: FLINK-9289
>                 URL: https://issues.apache.org/jira/browse/FLINK-9289
>             Project: Flink
>          Issue Type: Bug
>          Components: DataSet API
>    Affects Versions: 1.5.0, 1.4.2, 1.6.0
>            Reporter: Fabian Hueske
>            Assignee: Xingcan Cui
>            Priority: Major
>              Labels: pull-request-available
>
> The DataSet API aims to chain generated operators such as key extraction 
> mappers to their predecessor. This is done by assigning the same parallelism 
> as the input operator.
> If a generated operator has more than two inputs, the operator cannot be 
> chained anymore and the operator is generated with default parallelism. This 
> can lead to a {code}NoResourceAvailableException: Not enough free slots 
> available to run the job.{code} as reported by a user on the mailing list: 
> https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E
> I suggest setting the parallelism of a generated operator to the max 
> parallelism of all of its inputs to fix this problem.
> Until the problem is fixed, a workaround is to set the default parallelism at 
> the {{ExecutionEnvironment}}:
> {code}
> ExecutionEnvironment env = ...
> env.setParallelism(2);
> {code}
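
The suggested fix (take the max parallelism of all inputs) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the pull request: the class ParallelismSketch, the method maxInputParallelism, and the PARALLELISM_DEFAULT marker value of -1 are hypothetical names chosen for this example, and no Flink classes are used.

```java
// Hypothetical, self-contained sketch; none of these names come from Flink.
final class ParallelismSketch {

    // Assumed marker meaning "no explicit parallelism set, use the default".
    static final int PARALLELISM_DEFAULT = -1;

    // Returns the largest parallelism among the inputs. If there are no inputs,
    // or every input still carries the default marker, the marker is returned
    // so that the caller falls back to the environment default.
    static int maxInputParallelism(int... inputParallelisms) {
        int max = PARALLELISM_DEFAULT;
        for (int p : inputParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        // A generated two-input operator whose inputs run with parallelism 2 and 4.
        System.out.println(maxInputParallelism(2, 4)); // prints 4
    }
}
```

With this rule, a generated operator never requests more slots than its widest input already needs, which is the point of the proposal above.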



