[
https://issues.apache.org/jira/browse/BEAM-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kyle Weaver updated BEAM-7657:
------------------------------
Description:
The SDK worker parallelism argument is set in two places: in the pipeline
options [1] [2] and in the job server driver [3].
{noformat}
if pipeline.sdk_worker_parallelism > 0:
    pipeline.sdk_worker_parallelism is used.
elif pipeline.sdk_worker_parallelism == 0:
    if jobServerDriver.sdkWorkerParallelism > 0:
        jobServerDriver.sdkWorkerParallelism is used.
    elif jobServerDriver.sdkWorkerParallelism == 0:
        the runner chooses parallelism based on cores available.
{noformat}
Somewhat confusingly, the default is 0 for Python pipelines but 1 for Java
pipelines. In either case, jobServerDriver.sdkWorkerParallelism defaults to 1,
so the comment "If 0, it will be automatically set by looking at different
parameters.." is misleading: it is only true if
jobServerDriver.sdkWorkerParallelism is explicitly set to 0 as well.
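The precedence described above can be sketched as a small Python function. This is only an illustration of the lookup order, not the Beam implementation: the function name, its parameters, the use of the core count as the automatic value, and the rejection of negative values are all assumptions made for the example.

```python
def resolve_sdk_worker_parallelism(pipeline_value: int,
                                   job_server_value: int,
                                   available_cores: int) -> int:
    """Illustrative model of the precedence described in this issue.

    All names here are hypothetical. pipeline_value stands for
    pipeline.sdk_worker_parallelism and job_server_value stands for
    jobServerDriver.sdkWorkerParallelism.
    """
    # Assumption: negative values are treated as invalid. The issue only
    # describes the "> 0" and "== 0" cases.
    if pipeline_value < 0 or job_server_value < 0:
        raise ValueError("parallelism values must be >= 0")

    # 1. An explicit pipeline option wins.
    if pipeline_value > 0:
        return pipeline_value

    # 2. Otherwise fall back to the job server driver setting.
    if job_server_value > 0:
        return job_server_value

    # 3. Both are 0: the runner chooses based on available cores.
    # Assumption: the core count itself stands in for that choice.
    return max(1, available_cores)


# Python pipeline default (0) with the job server default (1):
# the job server value is used, so no automatic selection happens.
python_defaults = resolve_sdk_worker_parallelism(0, 1, available_cores=8)

# Automatic selection only happens when both values are 0.
both_zero = resolve_sdk_worker_parallelism(0, 0, available_cores=8)
```

With the defaults mentioned above, the automatic branch is never reached unless the job server value is also set to 0, which is why the quoted comment is misleading.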
[1]
[https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]
[2]
[https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]
[3]
[https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]
was:
The SDK worker parallelism argument is set in two places: in the pipeline
options [1] [2] and in the job server driver [3].
{noformat}
if pipeline.sdk_worker_parallelism > 0:
    pipeline.sdk_worker_parallelism is used.
elif pipeline.sdk_worker_parallelism == 0:
    if jobServerDriver.sdkWorkerParallelism > 0:
        jobServerDriver.sdkWorkerParallelism is used.
    else:
        the runner chooses parallelism based on cores available.
{noformat}
Somewhat confusingly, the default is 0 for Python pipelines but 1 for Java
pipelines. In either case, jobServerDriver.sdkWorkerParallelism defaults to 1,
so the comment "If 0, it will be automatically set by looking at different
parameters.." is misleading: it is only true if
jobServerDriver.sdkWorkerParallelism is explicitly set to 0 as well.
[1]
[https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]
[2]
[https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]
[3]
[https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]
> sdk worker parallelism comments are misleading
> ----------------------------------------------
>
> Key: BEAM-7657
> URL: https://issues.apache.org/jira/browse/BEAM-7657
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Minor