[
https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=225827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225827
]
ASF GitHub Bot logged work on BEAM-6942:
----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Apr/19 20:22
Start Date: 10/Apr/19 20:22
Worklog Time Spent: 10m
Work Description: tvalentyn commented on pull request #8225: [BEAM-6942]
Make modifications to pipeline options to be visible to all views.
URL: https://github.com/apache/beam/pull/8225#discussion_r274140134
##########
File path: sdks/python/apache_beam/options/pipeline_options.py
##########
@@ -796,13 +796,13 @@ def _add_argparse_args(cls, parser):
'"<ENV_VAL>"} }. All fields in the json are optional except '
'command.'))
parser.add_argument(
- '--sdk-worker-parallelism', default=None,
+ '--sdk_worker_parallelism', default=None,
help=('Sets the number of sdk worker processes that will run on each '
'worker node. Default is 1. If 0, it will be automatically set '
'by the runner by looking at different parameters (e.g. number '
'of CPU cores on the worker machine).'))
parser.add_argument(
- '--environment-cache-millis', default=0,
+ '--environment_cache_millis', default=0,
Review comment:
@udim pointed me to the codepath that allows adding options not defined in
any of PipelineOptions subclasses:
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/options/pipeline_options.py#L227
After looking further, I think I understand what happens:
1) PortableRunner sends a request to get pipeline supported by any given
Runner implementing FnAPI:
https://github.com/apache/beam/blob/684f8130284a7c7979773300d04e5473ca0ac8f3/sdks/python/apache_beam/runners/portability/portable_runner.py#L251
2) Flink runner replies that `environment_cache_millis` is a supported
option.
3) We dynamically try to make `environment_cache_millis` as an option
recognizable by arguments the parser in
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/runners/portability/portable_runner.py#L281,
by specifying add_extra_args_fn.
4) PortableOptions declares `environment-cache-millis` as an option. Even
though the name contains dashes, `argparse` will store the value for this
option in a dictionary with a key where dashes are normalized to underscores:
https://docs.python.org/dev/library/argparse.html#dest.
5) With this PR, since the value for `environment-cache-millis` is NOT
specified on the command line (but a value with underscores is set to 10000),
`self._all_options['environment_cache_millis']` is initialized to 0, the
current default value. This value takes precedence in `get_all_options()`:
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/options/pipeline_options.py#L237,
and causes the test to fail.
6) Without this PR, the value for `environment_cache_millis` in
`self._all_options` will be initialized to 0 at the first access
https://github.com/apache/beam/blob/120394be6ca979892bce98bdea6ca01439b22476/sdks/python/apache_beam/options/pipeline_options.py#L267,
but this never happens because `environment_cache_millis` is not used anywhere
in Python SDK, so the test passes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 225827)
Time Spent: 5h 10m (was: 5h)
> Pipeline options to experiment propagation is not working in Dataflow runner.
> -----------------------------------------------------------------------------
>
> Key: BEAM-6942
> URL: https://issues.apache.org/jira/browse/BEAM-6942
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: Major
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> Relevant code:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388]
> 3 experiments/options are affected. We need to fix it in 2.12.0
> cc: [~altay] [~apilloud]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)