Minbo Bae created BEAM-13709:
--------------------------------

             Summary: PipelineOptions() and from_dictionary parsing 
use_public_ips and no_use_public_ips differently
                 Key: BEAM-13709
                 URL: https://issues.apache.org/jira/browse/BEAM-13709
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
            Reporter: Minbo Bae


{{PipelineOptions}} in Python accepts a parameter dict in two ways: through the 
constructor, {{PipelineOptions(**params)}}, or through 
{{PipelineOptions.from_dictionary(params)}}.

However, the two behave slightly differently:
 * [PipelineOptions(**params)|https://github.com/apache/beam/blob/v2.35.0/sdks/python/apache_beam/options/pipeline_options.py#L313-L324]
 discards any option that is not defined as an {{argparse}} dest in an options 
class. For example, {{no_use_public_ips=True}} is ignored and the Dataflow job 
runs with public IPs. To disable public IPs, the option dictionary must use 
{{use_public_ips}}.
 * [PipelineOptions.from_dictionary()|https://github.com/apache/beam/blob/v2.35.0/sdks/python/apache_beam/options/pipeline_options.py#L229]
 skips any option whose value is {{False}}. For example, 
{{use_public_ips=False}} is ignored and the Dataflow job runs with public 
IPs. To disable public IPs, the option dictionary must use 
{{no_use_public_ips}}.
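
The difference between the two paths can be reproduced with plain {{argparse}}. The following is a minimal sketch, not Beam's actual code: {{build_parser}}, {{from_kwargs}} and {{from_dictionary}} are hypothetical stand-ins that model the behavior described above, assuming the two flags share the dest {{use_public_ips}} with a default of {{None}}.

```python
import argparse


def build_parser():
    # Assumed flag layout: both flags write to the same dest, default None.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--use_public_ips', default=None, action='store_true')
    parser.add_argument(
        '--no_use_public_ips', dest='use_public_ips', default=None,
        action='store_false')
    return parser


def from_kwargs(**kwargs):
    # Models the constructor path: only keys that match an argparse dest
    # are kept; anything else (e.g. no_use_public_ips) is dropped silently.
    known, _ = build_parser().parse_known_args([])
    resolved = vars(known)
    for key, value in kwargs.items():
        if key in resolved:
            resolved[key] = value
    return resolved


def from_dictionary(options):
    # Models the dictionary path: the dict is turned into command-line
    # flags, and a boolean False produces no flag at all.
    flags = []
    for key, value in options.items():
        if isinstance(value, bool):
            if value:
                flags.append('--%s' % key)
        else:
            flags.append('--%s=%s' % (key, value))
    known, _ = build_parser().parse_known_args(flags)
    return vars(known)
```

In this model, {{from_kwargs(no_use_public_ips=True)}} and {{from_dictionary({'use_public_ips': False})}} both leave {{use_public_ips}} at {{None}}, while {{from_kwargs(use_public_ips=False)}} and {{from_dictionary({'no_use_public_ips': True})}} both set it to {{False}}.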

This is confusing for users, and pipelines sometimes behave in unexpected 
ways as a result.

The two methods should behave consistently, or at least warn about options 
that are invalid or ignored.

BEAM-9093 dealt with a similar issue for {{PipelineOptions()}}. As in that 
issue, adding a warning in {{PipelineOptions.from_dictionary()}} for ignored 
options would help reduce the confusion if the two methods cannot be made to 
behave exactly the same.
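
One possible shape for such a warning is sketched below. This is only an illustration under stated assumptions: {{find_ignored_options}} is a hypothetical helper, not part of Beam, and it works against any plain {{argparse}} parser. It reports boolean {{False}} values (which produce no flag) together with flags the parser does not recognize.

```python
import argparse
import warnings


def find_ignored_options(options, parser):
    """Return option names that would be dropped silently (hypothetical)."""
    flags = []
    ignored = []
    for key, value in options.items():
        if isinstance(value, bool):
            if value:
                flags.append('--%s' % key)
            else:
                # A False value never becomes a flag, so it has no effect.
                ignored.append(key)
        else:
            flags.append('--%s=%s' % (key, value))
    # parse_known_args returns unrecognized arguments as its second value.
    _, unknown = parser.parse_known_args(flags)
    for arg in unknown:
        if arg.startswith('--'):
            ignored.append(arg[2:].split('=')[0])
    if ignored:
        warnings.warn(
            'Options ignored by from_dictionary: %s' % ', '.join(ignored))
    return ignored
```

A warning on every {{False}} value may be noisy for flags whose default is already {{False}}, so a real fix would likely need to compare against the parser defaults; that refinement is left out of this sketch.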

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
