[ 
https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=225827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225827
 ]

ASF GitHub Bot logged work on BEAM-6942:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/19 20:22
            Start Date: 10/Apr/19 20:22
    Worklog Time Spent: 10m 
      Work Description: tvalentyn commented on pull request #8225: [BEAM-6942]  
Make modifications to pipeline options to be visible to all views.
URL: https://github.com/apache/beam/pull/8225#discussion_r274140134
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -796,13 +796,13 @@ def _add_argparse_args(cls, parser):
               '"<ENV_VAL>"} }. All fields in the json are optional except '
               'command.'))
     parser.add_argument(
-        '--sdk-worker-parallelism', default=None,
+        '--sdk_worker_parallelism', default=None,
         help=('Sets the number of sdk worker processes that will run on each '
               'worker node. Default is 1. If 0, it will be automatically set '
               'by the runner by looking at different parameters (e.g. number '
               'of CPU cores on the worker machine).'))
     parser.add_argument(
-        '--environment-cache-millis', default=0,
+        '--environment_cache_millis', default=0,
 
 Review comment:
   @udim pointed me to  the codepath that allows adding options not defined in 
any of PipelineOptions subclasses: 
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/options/pipeline_options.py#L227
 After looking further, I think I understand what happens:
   1) PortableRunner sends a request to get pipeline supported by any given 
Runner implementing FnAPI: 
https://github.com/apache/beam/blob/684f8130284a7c7979773300d04e5473ca0ac8f3/sdks/python/apache_beam/runners/portability/portable_runner.py#L251
   2) Flink runner replies that `environment_cache_millis` is a supported 
option.
   3) We dynamically try to make `environment_cache_millis` as an option 
recognizable by arguments the parser in 
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/runners/portability/portable_runner.py#L281,
 by specifying add_extra_args_fn.
   4)  PortableOptions declares `environment-cache-millis` as an option. Even 
though the name contains dashes, `argparse` will store the value for this 
option in a dictionary with a key where dashes are normalized to underscores: 
https://docs.python.org/dev/library/argparse.html#dest.
   5) With this PR, since the value for `environment-cache-millis` is NOT 
specified on the command line (but a value with underscores is set to 10000), 
`self._all_options['environment_cache_millis']` is initialized to 0, the 
current default value. This value takes precedence in `get_all_options()`: 
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/options/pipeline_options.py#L237,
 and causes the test to fail.
   6) Without this PR, the value for `environment_cache_millis` in 
`self._all_options` will be initialized to 0 at the first access 
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/options/pipeline_options.py#L267,
 but this never happens because `environment_cache_millis` is not used anywhere 
in Python SDK, so the test passes.
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 225827)
    Time Spent: 5h 10m  (was: 5h)

> Pipeline options to experiment propagation is not working in Dataflow runner.
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-6942
>                 URL: https://issues.apache.org/jira/browse/BEAM-6942
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Relevant code: 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388]
> 3 experiments/options are affected. We need to fix it in 2.12.0
> cc: [~altay] [~apilloud]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to