[ https://issues.apache.org/jira/browse/BEAM-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Hulette updated BEAM-10274:
---------------------------------
Status: Open (was: Triage Needed)
> Python SDK can't parse type=json.loads pipeline options at execution time
> -------------------------------------------------------------------------
>
> Key: BEAM-10274
> URL: https://issues.apache.org/jira/browse/BEAM-10274
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, sdk-py-harness
> Reporter: Brian Hulette
> Priority: P2
>
> It's pretty common to use `type=json.loads` in argparse to create JSON-formatted
> options; in fact, we have a couple in Beam:
> https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L431-L443
> https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L577-L586
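> For reference, here is a minimal sketch of the pattern. It uses plain argparse, not Beam's
> own option classes, and borrows the flag name from the error below:

```python
import argparse
import json

# argparse calls json.loads on the raw flag string before storing it.
parser = argparse.ArgumentParser()
parser.add_argument("--beam_services", type=json.loads)

args = parser.parse_args(['--beam_services={"foo": "bar"}'])
print(args.beam_services)        # {'foo': 'bar'}
print(type(args.beam_services))  # <class 'dict'>
```

> The option value comes back as a dict, not a string, as long as the flag
> text is valid JSON.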
> Attempting to access these options at pipeline execution time yields an error
> (note the single quotes):
> {code}
> argparse.ArgumentError: argument --beam_services: invalid loads value:
> "{'foo': 'bar'}"
> {code}
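> The following is a standalone reproduction of that message. It assumes Python 3.9+,
> because it relies on `exit_on_error=False`, and it feeds argparse the str() of a dict
> rather than its JSON encoding:

```python
import argparse
import json

# exit_on_error=False (Python 3.9+) makes argparse raise ArgumentError
# instead of printing usage and exiting with status 2.
parser = argparse.ArgumentParser(exit_on_error=False)
parser.add_argument("--beam_services", type=json.loads)

# str() of a dict is its Python repr, with single quotes: {'foo': 'bar'}
flag = "--beam_services=%s" % str({"foo": "bar"})

message = None
try:
    parser.parse_args([flag])
except argparse.ArgumentError as err:
    message = str(err)

print(message)
# argument --beam_services: invalid loads value: "{'foo': 'bar'}"
```

> The word "loads" in the message is json.loads.__name__. argparse reports the
> type function's name when the conversion raises ValueError, and
> json.JSONDecodeError is a subclass of ValueError.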
> Why does this happen?
> - sdk_worker_main.py receives these values from the PIPELINE_OPTIONS env var,
> which represents them as proper JSON:
> {code}
> ..., "some_option": "some_value", "json_option": {"foo": "bar"}}, ...
> {code}
> - The JSON is loaded and parsed with PipelineOptions.from_dictionary:
> https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L168-L181
> - from_dictionary just [writes out the value with
> str(v)|https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L241-L249].
> When the option is accessed, we attempt to re-parse it. However, str() of a dict
> is its Python repr (with single quotes), which is not valid JSON, so
> json.loads fails.
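> The round trip above can be sketched in plain Python. Some caveats apply.
> `flags_from_dictionary` and `json_aware_str` are hypothetical helpers, not Beam
> code. The options payload is a simplified, flat stand-in for PIPELINE_OPTIONS.
> `json_aware_str` is only one possible fix: it JSON-encodes dict values so that a
> `type=json.loads` option can decode them again.

```python
import argparse
import json


def flags_from_dictionary(options, serialize):
    # Hypothetical, simplified stand-in for the flag-building step:
    # every key/value pair becomes "--key=<serialized value>".
    return ["--%s=%s" % (key, serialize(value)) for key, value in options.items()]


def json_aware_str(value):
    # Hypothetical fix (an assumption, not Beam's code): JSON-encode dicts so a
    # type=json.loads option can decode them; keep plain str() for scalars.
    return json.dumps(value) if isinstance(value, dict) else str(value)


# Decoding the options JSON turns "json_option" into a Python dict.
options = json.loads('{"some_option": "some_value", "json_option": {"foo": "bar"}}')

parser = argparse.ArgumentParser(exit_on_error=False)  # Python 3.9+
parser.add_argument("--some_option")
parser.add_argument("--json_option", type=json.loads)

broken_flags = flags_from_dictionary(options, str)
print(broken_flags[1])  # --json_option={'foo': 'bar'}
try:
    parser.parse_args(broken_flags)
    broken_ok = True
except argparse.ArgumentError:
    broken_ok = False  # json.loads rejects the single-quoted repr

fixed_flags = flags_from_dictionary(options, json_aware_str)
print(fixed_flags[1])  # --json_option={"foo": "bar"}
fixed = parser.parse_args(fixed_flags)
print(fixed.json_option)  # {'foo': 'bar'}
```

> Encoding only dict values keeps plain string options such as some_option
> unquoted. Passing every value through json.dumps would instead turn some_value
> into "some_value" with literal quotes.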
--
This message was sent by Atlassian Jira
(v8.3.4#803005)