Kyle Weaver created BEAM-7657:
---------------------------------

             Summary: sdk worker parallelism comments are misleading
                 Key: BEAM-7657
                 URL: https://issues.apache.org/jira/browse/BEAM-7657
             Project: Beam
          Issue Type: Improvement
          Components: runner-flink
            Reporter: Kyle Weaver
            Assignee: Kyle Weaver


The SDK worker parallelism argument is set in two places: in pipeline options 
[1] [2] and in the job server driver [3]. The effective value is resolved as 
follows:

 
{noformat}
if pipeline.sdk_worker_parallelism > 0:
    pipeline.sdk_worker_parallelism is used.
elif pipeline.sdk_worker_parallelism == 0:
    if jobServerDriver.sdkWorkerParallelism > 0:
        jobServerDriver.sdkWorkerParallelism is used.
    else:
        the runner chooses parallelism based on cores available.
{noformat}
Somewhat confusingly, the pipeline option defaults to 0 for Python pipelines 
but to 1 for Java pipelines. In either case, 
jobServerDriver.sdkWorkerParallelism defaults to 1, so the comment "If 0, it 
will be automatically set by looking at different parameters.." is misleading: 
it is only true if jobServerDriver.sdkWorkerParallelism is also explicitly set 
to 0.
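
To make the interaction between the two settings concrete, here is a minimal Python sketch of the resolution order described above. It is not the actual Beam implementation. The function name, its parameters, and the use of a plain core count as the runner's automatic choice are all assumptions made for illustration, as is the rejection of negative values, which the description above does not cover.

```python
def resolve_sdk_worker_parallelism(pipeline_value: int,
                                   job_server_value: int,
                                   available_cores: int) -> int:
    """Hypothetical helper mirroring the resolution order described above.

    pipeline_value:   pipeline.sdk_worker_parallelism
    job_server_value: jobServerDriver.sdkWorkerParallelism
    available_cores:  stand-in for whatever the runner inspects when it
                      chooses the parallelism itself (an assumption)
    """
    # Assumption: negative values are treated as invalid input.
    if pipeline_value < 0 or job_server_value < 0:
        raise ValueError("parallelism values must be >= 0")

    # 1. An explicit pipeline option always wins.
    if pipeline_value > 0:
        return pipeline_value

    # 2. Otherwise fall back to the job server driver setting.
    if job_server_value > 0:
        return job_server_value

    # 3. Only when both are 0 does the runner choose for itself.
    return max(1, available_cores)


if __name__ == "__main__":
    # Python pipeline default (0) with the job server default (1):
    # the automatic choice is never reached.
    print(resolve_sdk_worker_parallelism(0, 1, available_cores=8))
    # Both explicitly set to 0: the runner chooses.
    print(resolve_sdk_worker_parallelism(0, 0, available_cores=8))
```

With the defaults, a Python pipeline takes the second branch and ends up with a parallelism of 1, which is why the "If 0" wording in the comment does not match the observed behavior.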

[1] 
[https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]

[2] 
[https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]

[3] 
[https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)