[ https://issues.apache.org/jira/browse/BEAM-5724?focusedWorklogId=159074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-159074 ]
ASF GitHub Bot logged work on BEAM-5724:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Oct/18 09:53
Start Date: 26/Oct/18 09:53
Worklog Time Spent: 10m
Work Description: mxm commented on a change in pull request #6835:
[BEAM-5724] Generalize flink executable context to allow more than 1 worker
process per task manager
URL: https://github.com/apache/beam/pull/6835#discussion_r228463738
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java
##########
@@ -73,20 +73,13 @@
   void setDefaultEnvironmentConfig(@Nullable String config);

-  String SDK_WORKER_PARALLELISM_PIPELINE = "pipeline";
-  String SDK_WORKER_PARALLELISM_STAGE = "stage";
-
   @Description(
-      "SDK worker/harness process parallelism. Currently supported options are "
-          + "<null> (let the runner decide) or '"
-          + SDK_WORKER_PARALLELISM_PIPELINE
-          + "' (single SDK harness process per pipeline and runner process) or '"
-          + SDK_WORKER_PARALLELISM_STAGE
-          + "' (separate SDK harness for every executable stage).")
+      "Sets the number of sdk worker processes that will run on each worker node. Default is 1. If"
+          + " 0, it will be automatically set according to the number of CPU cores on the worker.")
   @Nullable
-  String getSdkWorkerParallelism();
+  Long getSdkWorkerParallelism();

-  void setSdkWorkerParallelism(@Nullable String parallelism);
+  void setSdkWorkerParallelism(@Nullable Long parallelism);
Review comment:
Same here.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 159074)
Time Spent: 50m (was: 40m)
> Beam creates too many sdk_worker processes with --sdk-worker-parallelism=stage
> ------------------------------------------------------------------------------
>
> Key: BEAM-5724
> URL: https://issues.apache.org/jira/browse/BEAM-5724
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Micah Wylde
> Assignee: Micah Wylde
> Priority: Major
> Labels: portability-flink
> Time Spent: 50m
> Remaining Estimate: 0h
>
> In the Flink portable runner, we currently support two options for SDK worker
> parallelism (how many Python worker processes we run). The default is one per
> TaskManager, and with --sdk-worker-parallelism=stage you get one per stage.
> However, for complex pipelines with many Beam operators that get fused into a
> single Flink task, this can produce hundreds of worker processes per TM.
> Flink uses the notion of task slots to limit resource utilization on a box; I
> think that Beam should try to respect those limits as well. Ideally we'd
> produce a single Python worker per task slot/Flink operator chain.
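The diff above replaces the string-valued pipeline/stage modes with a Long-valued count. As a minimal sketch of the semantics the new @Description states (default 1, and 0 meaning "size to the CPU core count of the worker"), the resolution logic could look like the following. Note that `resolveWorkerCount` is a hypothetical helper name for illustration, not Beam's actual code:

```java
// Illustrative sketch of the proposed sdkWorkerParallelism semantics.
// "resolveWorkerCount" is a hypothetical helper, not part of Beam.
public class SdkWorkerParallelismSketch {

  /** Returns the number of SDK worker processes to start per worker node. */
  static long resolveWorkerCount(Long configuredParallelism) {
    if (configuredParallelism == null) {
      return 1L; // option unset: fall back to the default of one worker
    }
    if (configuredParallelism == 0L) {
      // 0 is a sentinel: size the worker pool to the machine's CPU core count
      return Runtime.getRuntime().availableProcessors();
    }
    return configuredParallelism; // an explicit positive value wins
  }

  public static void main(String[] args) {
    System.out.println(resolveWorkerCount(null)); // 1 (default)
    System.out.println(resolveWorkerCount(4L));   // 4 (explicit)
    System.out.println(resolveWorkerCount(0L));   // machine-dependent, >= 1
  }
}
```

Under this scheme a per-task-slot policy would simply pass the slot count as the configured value, which is what bounding workers by task slots would amount to.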
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)