[
https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226661
]
ASF GitHub Bot logged work on BEAM-6942:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Apr/19 13:36
Start Date: 12/Apr/19 13:36
Worklog Time Spent: 10m
Work Description: mxm commented on pull request #8225: [BEAM-6942] Make
modifications to pipeline options to be visible to all views.
URL: https://github.com/apache/beam/pull/8225#discussion_r274897911
##########
File path: sdks/python/apache_beam/options/pipeline_options.py
##########
@@ -796,13 +796,13 @@ def _add_argparse_args(cls, parser):
'"<ENV_VAL>"} }. All fields in the json are optional except '
'command.'))
parser.add_argument(
- '--sdk-worker-parallelism', default=None,
+ '--sdk_worker_parallelism', default=None,
help=('Sets the number of sdk worker processes that will run on each '
'worker node. Default is 1. If 0, it will be automatically set '
'by the runner by looking at different parameters (e.g. number '
'of CPU cores on the worker machine).'))
parser.add_argument(
- '--environment-cache-millis', default=0,
+ '--environment_cache_millis', default=0,
Review comment:
@tvalentyn Thanks for taking the time to look at this. I agree that the
pipeline options retrieval is not obvious and can cause some confusion.
However, previously we had to replicate options in all SDKs, which proved to
be error-prone.
> Why is some subset of Flink runner options defined in PortableOptions? It
seems that we should remove them, since they are not used by PortableRunner?
`EnvironmentCacheMillis` is only used by the Flink Runner but it was assumed
to be applicable to all portable Runners. I can't speak for Dataflow, but I
know that it will be useful for Spark as well. It's used for batch workloads
where subsequent operators want to reuse the same environment, e.g. read a
local file produced by the preceding operator. I think we could find a better
solution for this in the future.
The only other option I see is `sdkWorkerParallelism`. We have experimented
with this for the Flink Runner with different workloads, and I'm sure other
Runners may want to have control over the Harness parallelism.
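For illustration, here is a minimal sketch of how a runner could interpret the
flag, following the help text in the diff above (default 1, and 0 meaning "let
the runner decide"). `resolve_sdk_worker_parallelism` is a hypothetical helper,
not part of Beam, and using the local CPU count for the automatic case is an
assumption:

```python
import argparse
import os


def resolve_sdk_worker_parallelism(value):
    """Hypothetical helper: map the raw flag value to a worker count."""
    if value is None:
        # Flag not given: the help text documents 1 as the default.
        return 1
    value = int(value)
    if value < 0:
        raise ValueError('sdk_worker_parallelism must be >= 0')
    if value == 0:
        # Assumption: "automatic" is approximated by the local CPU count.
        return os.cpu_count() or 1
    return value


parser = argparse.ArgumentParser()
# Underscore spelling, as in the "+" line of the diff.
parser.add_argument('--sdk_worker_parallelism', default=None)

args = parser.parse_args(['--sdk_worker_parallelism', '4'])
workers = resolve_sdk_worker_parallelism(args.sdk_worker_parallelism)
```

A real runner would likely look at the worker machine rather than the
submitting machine when the value is 0.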
> If we need to keep environment_cache_millis in PortableOptions, is the
default 0 reasonable?
`0` is the default in the Java SDK:
https://github.com/apache/beam/commit/63ddad144c1ca7066673b0756c25962f24c899ee#diff-a42abc13cb0431ddc407d59b6023c873R92
> Looks like current RunnerOptions class is not used, instead we try to get
runner-specific options via JobApi
`RunnerOptions` is merely a placeholder. The `DescribePipelineOptions`
request returns all available options from the Java SDK, including their
default values.
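
As a rough sketch of the idea (not the actual SDK code), options described by
the job service can be registered on an argparse parser instead of being
duplicated by hand in each SDK. The `described_options` list and its field
names are hypothetical stand-ins for whatever the response contains, and the
default values shown are only illustrative:

```python
import argparse

# Hypothetical stand-in for the option descriptors returned by the job service.
described_options = [
    {'name': 'environment_cache_millis', 'default_value': '0',
     'description': 'How long to keep an environment alive for reuse.'},
    {'name': 'sdk_worker_parallelism', 'default_value': '1',
     'description': 'Number of SDK worker processes per worker node.'},
]


def add_described_options(parser, descriptors):
    """Register each described option as a --name flag with its default."""
    for descriptor in descriptors:
        parser.add_argument(
            '--' + descriptor['name'],
            default=descriptor.get('default_value'),
            help=descriptor.get('description'))


parser = argparse.ArgumentParser()
add_described_options(parser, described_options)

# Flags given on the command line override the described defaults.
args, unknown = parser.parse_known_args(['--environment_cache_millis', '5000'])
```

With this approach the defaults live in one place, and the Python side only
needs to know how to turn a descriptor into a flag.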
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 226661)
Time Spent: 9.5h (was: 9h 20m)
> Pipeline options to experiment propagation is not working in Dataflow runner.
> -----------------------------------------------------------------------------
>
> Key: BEAM-6942
> URL: https://issues.apache.org/jira/browse/BEAM-6942
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: Major
> Time Spent: 9.5h
> Remaining Estimate: 0h
>
> Relevant code:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388]
> 3 experiments/options are affected. We need to fix it in 2.12.0
> cc: [~altay] [~apilloud]