Kyle Weaver created BEAM-7657:
---------------------------------
Summary: sdk worker parallelism comments are misleading
Key: BEAM-7657
URL: https://issues.apache.org/jira/browse/BEAM-7657
Project: Beam
Issue Type: Improvement
Components: runner-flink
Reporter: Kyle Weaver
Assignee: Kyle Weaver
The SDK worker parallelism arg is set in two places: in pipeline options [1] [2]
and in the job server driver [3]. The effective value is resolved as follows:
{noformat}
if pipeline.sdk_worker_parallelism > 0:
    pipeline.sdk_worker_parallelism is used.
elif pipeline.sdk_worker_parallelism == 0:
    if jobServerDriver.sdkWorkerParallelism > 0:
        jobServerDriver.sdkWorkerParallelism is used.
    else:
        the runner chooses parallelism based on cores available.
{noformat}
Somewhat confusingly, the pipeline option defaults to 0 for Python pipelines
but to 1 for Java pipelines. Either way, jobServerDriver.sdkWorkerParallelism
defaults to 1, so the comment "If 0, it will be automatically set by looking at
different parameters.." is misleading: it is only true if
jobServerDriver.sdkWorkerParallelism is explicitly set to 0 as well.
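To make the consequence concrete, the resolution order described above can be
sketched as a small standalone Python function. This is only an illustration of
the behavior as I understand it, not the actual Beam code: the function name,
the constants, and the use of None to mean "the runner chooses based on cores"
are all hypothetical, and the handling of negative values is an assumption
since the pseudocode does not cover that case.

```python
from typing import Optional

# Defaults as described in this issue (illustrative constants, not Beam APIs).
PYTHON_PIPELINE_DEFAULT = 0
JAVA_PIPELINE_DEFAULT = 1
JOB_SERVER_DEFAULT = 1


def resolve_sdk_worker_parallelism(
    pipeline_value: int,
    job_server_value: int = JOB_SERVER_DEFAULT,
) -> Optional[int]:
    """Return the effective parallelism.

    None stands for "the runner chooses parallelism based on cores available".
    """
    if pipeline_value > 0:
        # An explicit pipeline option always wins.
        return pipeline_value
    if pipeline_value == 0:
        if job_server_value > 0:
            # Fall back to the job server driver setting.
            return job_server_value
        # Only reached when both values are 0.
        return None
    # Assumption: negative values are not covered by the issue, so reject them.
    raise ValueError("sdk_worker_parallelism must be >= 0")


if __name__ == "__main__":
    # Python pipeline with all defaults: job server default (1) is used,
    # so parallelism is NOT chosen automatically.
    print(resolve_sdk_worker_parallelism(PYTHON_PIPELINE_DEFAULT))
    # Automatic selection only happens when both are explicitly 0.
    print(resolve_sdk_worker_parallelism(0, 0))
```

With the defaults, a Python pipeline that leaves the option at 0 still ends up
with a parallelism of 1 from the job server driver, which is the case the
current comment gets wrong.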
[1]
[https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]
[2]
[https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]
[3]
[https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]