[ 
https://issues.apache.org/jira/browse/FLINK-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573434#comment-16573434
 ] 

ASF GitHub Bot commented on FLINK-9289:
---------------------------------------

fhueske commented on issue #6003: [FLINK-9289][Dataset] Parallelism of 
generated operators should have max parallelism of input
URL: https://github.com/apache/flink/pull/6003#issuecomment-411464535
 
 
   Hi @xccui, in principle uniform handling of operators is a very good idea!
   
   However, union is a special operator, because there is no Union runtime 
operator. An operator that consumes from Union will simply multiplex all 
unioned inputs. Hence, the parallelism of the (non-existing) union is the 
parallelism of its successor. Since there is already quite a bit of custom code 
for union operators, I'm a bit afraid that your fix might cause problems in 
some corner cases.
   
   Another benefit of my approach is that the key extractors can be chained to 
the previous operators. This is not the case if the extractor is applied on the 
unioned result. 
   
   Best, Fabian

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Parallelism of generated operators should have max parallism of input
> ---------------------------------------------------------------------
>
>                 Key: FLINK-9289
>                 URL: https://issues.apache.org/jira/browse/FLINK-9289
>             Project: Flink
>          Issue Type: Bug
>          Components: DataSet API
>    Affects Versions: 1.5.0, 1.4.2, 1.6.0
>            Reporter: Fabian Hueske
>            Assignee: Xingcan Cui
>            Priority: Major
>              Labels: pull-request-available
>
> The DataSet API aims to chain generated operators such as key extraction 
> mappers to their predecessor. This is done by assigning the same parallelism 
> as the input operator.
> If a generated operator has more than two inputs, the operator cannot be 
> chained anymore and the operator is generated with default parallelism. This 
> can lead to a {code}NoResourceAvailableException: Not enough free slots 
> available to run the job.{code} as reported by a user on the mailing list: 
> https://lists.apache.org/thread.html/60a8bffcce54717b6273bf3de0f43f1940fbb711590f4b90cd666c9a@%3Cuser.flink.apache.org%3E
> I suggest to set the parallelism of a generated operator to the max 
> parallelism of all of its inputs to fix this problem.
> Until the problem is fixed, a workaround is to set the default parallelism at 
> the {{ExecutionEnvironment}}:
> {code}
> ExecutionEnvironment env = ...
> env.setParallelism(2);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to