[
https://issues.apache.org/jira/browse/BEAM-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismaël Mejía updated BEAM-7657:
-------------------------------
Status: Open (was: Triage Needed)
> sdk worker parallelism comments are misleading
> ----------------------------------------------
>
> Key: BEAM-7657
> URL: https://issues.apache.org/jira/browse/BEAM-7657
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Minor
>
> The SDK worker parallelism arg is set two places, in pipeline options [1] [2]
> and the job server driver [3].
>
> {noformat}
> if pipeline.sdk_worker_parallelism > 0:
> pipeline.sdk_worker_parallelism is used.
> elif pipeline.sdk_worker_parallelism == 0:
> if jobServerDriver.sdkWorkerParallelism > 0:
> jobServerDriver.sdkWorkerParallelism is used.
> elif jobServerDriver.sdkWorkerParallelism == 0:
> the runner chooses parallelism based on cores available.
> {noformat}
> Somewhat confusingly, the default is 0 for python pipelines, but 1 for java
> pipelines. But anyway, jobServerDriver.sdkWorkerParallelism defaults to 1, so
> the comment "If 0, it will be automatically set by looking at different
> parameters.." is misleading, and actually only true if
> jobServerDriver.sdkWorkerParallelism was explicitly set to 0 as well.
> [1]
> [https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]
> [2]
> [https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]
> [3]
> [https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)