[ 
https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226661
 ]

ASF GitHub Bot logged work on BEAM-6942:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Apr/19 13:36
            Start Date: 12/Apr/19 13:36
    Worklog Time Spent: 10m 
      Work Description: mxm commented on pull request #8225: [BEAM-6942]  Make 
modifications to pipeline options to be visible to all views.
URL: https://github.com/apache/beam/pull/8225#discussion_r274897911
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -796,13 +796,13 @@ def _add_argparse_args(cls, parser):
               '"<ENV_VAL>"} }. All fields in the json are optional except '
               'command.'))
     parser.add_argument(
-        '--sdk-worker-parallelism', default=None,
+        '--sdk_worker_parallelism', default=None,
         help=('Sets the number of sdk worker processes that will run on each '
               'worker node. Default is 1. If 0, it will be automatically set '
               'by the runner by looking at different parameters (e.g. number '
               'of CPU cores on the worker machine).'))
     parser.add_argument(
-        '--environment-cache-millis', default=0,
+        '--environment_cache_millis', default=0,
 
 Review comment:
   @tvalentyn Thanks for taking the time to look at this. I agree that the 
pipeline options retrieval is not obvious and can cause some confusion. 
However, before we had to replicate options in all SDKs which proofed to be 
error-prone.
   
   >  Why is some subset of Flink runner options defined in PortableOptions? It 
seems that we should remove them, since they are not used by PortableRunner? 
   
   `EnvironmentCacheMillis` is only used by the Flink Runner but it was assumed 
to be applicable to all portable Runners. I can't speak for Dataflow, but I 
know that it will be useful for Spark as well. It's used for batch workloads 
where subsequent operators want to reuse the same environment, e.g. read a 
local file produced by the preceding operator. I think we could find a better 
solution for this in the future.
   
   The only other option I see is `sdkWorkerParallelism`. We have experimented 
with this for the Flink Runner with different work loads and I'm sure other 
Runners may want to have control over the Harness parallelism.
   
   > If we need to keep environment_cache_millis in PortableOptions, is the 
default 0 reasonable?
   
   `0` is the default in the Java SDK: 
https://github.com/apache/beam/commit/63ddad144c1ca7066673b0756c25962f24c899ee#diff-a42abc13cb0431ddc407d59b6023c873R92
   
   >Looks like current RunnerOptions class is not used, instead we try to get 
runner-specific options via JobApi
   
   `RunnerOptions` is merely a place holder. The `DescribePipelineOptions` 
request returns all available options from the Java SDK, including their 
default values.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 226661)
    Time Spent: 9.5h  (was: 9h 20m)

> Pipeline options to experiment propagation is not working in Dataflow runner.
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-6942
>                 URL: https://issues.apache.org/jira/browse/BEAM-6942
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Relevant code: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388]
> 3 experiments/options are affected. We need to fix it in 2.12.0
> cc: [~altay] [~apilloud]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to