Minbo Bae created BEAM-13709:
--------------------------------
Summary: PipelineOptions() and from_dictionary parsing
use_public_ips and no_use_public_ips differently
Key: BEAM-13709
URL: https://issues.apache.org/jira/browse/BEAM-13709
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: Minbo Bae
{{PipelineOptions}} in Python accepts a parameter dict in two ways: through the
constructor, {{PipelineOptions(**params)}}, or through
{{PipelineOptions.from_dictionary(params)}}.
However, the two behave slightly differently:
*
[PipelineOptions(**params)|https://github.com/apache/beam/blob/v2.35.0/sdks/python/apache_beam/options/pipeline_options.py#L313-L324]
discards an option if it is not defined as a dest of {{argparse}} in an Option
class. For example, {{no_use_public_ips=True}} is ignored and the Dataflow job
will run with public IPs. To disable public IPs, the keyword arguments must use
{{use_public_ips=False}}.
*
[PipelineOptions.from_dictionary()|https://github.com/apache/beam/blob/v2.35.0/sdks/python/apache_beam/options/pipeline_options.py#L229]
skips an option if the option value is {{False}}. For example,
{{use_public_ips=False}} is ignored and the Dataflow job will run with public
IPs. To disable public IPs, the option dictionary must use
{{no_use_public_ips=True}}.
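The difference between the two paths can be reproduced with a small stand-alone model. The sketch below does not import Beam and is not its implementation. It assumes, based on the behavior described above, that {{use_public_ips}} and {{no_use_public_ips}} are argparse switches sharing the dest {{use_public_ips}}. The helper names {{build_parser}}, {{options_from_kwargs}} and {{options_from_dictionary}} are hypothetical.

```python
import argparse


def build_parser():
    # Assumption: both flags write to the same dest, as the report implies.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--use_public_ips', dest='use_public_ips',
        default=None, action='store_true')
    parser.add_argument(
        '--no_use_public_ips', dest='use_public_ips',
        default=None, action='store_false')
    return parser


def options_from_kwargs(**params):
    # Models the constructor path: only keys matching an argparse dest
    # are kept, so any other key is silently dropped.
    known = vars(build_parser().parse_args([]))
    return {name: params.get(name, default) for name, default in known.items()}


def options_from_dictionary(params):
    # Models the from_dictionary path: the dict is turned into flags,
    # and a boolean False produces no flag at all.
    flags = []
    for key, value in params.items():
        if isinstance(value, bool):
            if value:
                flags.append('--%s' % key)
        else:
            flags.append('--%s=%s' % (key, value))
    parsed, _ = build_parser().parse_known_args(flags)
    return vars(parsed)


# Constructor-style: the no_ variant is ignored, the plain name works.
print(options_from_kwargs(no_use_public_ips=True))
print(options_from_kwargs(use_public_ips=False))

# Dictionary-style: the plain name with False is ignored, the no_ variant works.
print(options_from_dictionary({'use_public_ips': False}))
print(options_from_dictionary({'no_use_public_ips': True}))
```

In this model, each path leaves {{use_public_ips}} at its default of {{None}} for exactly the spelling that the other path accepts.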
This is confusing for users, and a pipeline can silently run with settings
other than those requested.
The two methods should behave consistently, or should at least warn when an
option is ignored.
BEAM-9093 dealt with a similar issue for {{PipelineOptions()}}. As in that issue,
adding a warning in {{PipelineOptions.from_dictionary()}} for ignored
options could help reduce the confusion if the two methods cannot be made to
behave identically.
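One possible shape for such a warning is sketched below. It is an illustration only, not a patch against Beam: the function {{dictionary_to_flags}} and the warning text are hypothetical, and the conversion rules are a simplified version of the behavior described in this issue (boolean {{False}} values produce no flag).

```python
import warnings


def dictionary_to_flags(params):
    # Hypothetical helper: convert an option dict to command-line flags
    # and warn about every entry that produces no flag.
    flags = []
    ignored = []
    for key, value in params.items():
        if isinstance(value, bool):
            if value:
                flags.append('--%s' % key)
            else:
                # A False boolean would otherwise vanish without notice.
                ignored.append(key)
        else:
            flags.append('--%s=%s' % (key, value))
    if ignored:
        warnings.warn(
            'Ignoring options set to False: %s' % ', '.join(sorted(ignored)))
    return flags


# 'streaming' and 'project' are example keys chosen for illustration.
flags = dictionary_to_flags(
    {'use_public_ips': False, 'streaming': True, 'project': 'p'})
print(flags)
```

With this approach the caller still gets the same flags as before, but learns that {{use_public_ips=False}} had no effect.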