Brian Hulette created BEAM-10274:
------------------------------------
Summary: Python SDK can't parse type=json.loads pipeline options
at execution time
Key: BEAM-10274
URL: https://issues.apache.org/jira/browse/BEAM-10274
Project: Beam
Issue Type: Bug
Components: sdk-py-core, sdk-py-harness
Reporter: Brian Hulette
It's pretty common to use `type=json.loads` in argparse to create
JSON-formatted options. In fact, we have a couple in Beam:
https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L431-L443
https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L577-L586
Attempting to access these options at pipeline execution time yields an error
(note the single quotes):
{code}
argparse.ArgumentError: argument --beam_services: invalid loads value: "{'foo':
'bar'}"
{code}
Why does this happen?
- sdk_worker_main.py receives these values from the PIPELINE_OPTIONS env var,
which represents them as proper JSON:
{code}
..., "some_option": "some_value", "json_option": {"foo": "bar"}}, ...
{code}
- The JSON is loaded and parsed with PipelineOptions.from_dictionary:
https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L168-L181
- from_dictionary just [writes out the value with
str(v)|https://github.com/apache/beam/blob/61b665640d6c0f91751bba59782c0ac6aceacba6/sdks/python/apache_beam/options/pipeline_options.py#L241-L249].
For a dict, str(v) produces the Python repr, which uses single quotes
({'foo': 'bar'}). When the option is accessed we attempt to re-parse it, but
it's no longer valid JSON, so json.loads fails.
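
The sequence above can be reproduced outside of Beam. The following is a minimal sketch, not Beam's actual code: to_flag is a hypothetical stand-in for the flag-building step in from_dictionary, and json_option is an illustrative option name. Serializing with json.dumps is shown only as one possible way to keep the value parseable; it is an assumption, not necessarily the fix Beam will adopt.

```python
import argparse
import json


def to_flag(name, value, serialize):
    """Render one dictionary entry as a command-line flag (hypothetical helper)."""
    return '--%s=%s' % (name, serialize(value))


def build_parser():
    """Build a parser with a single JSON-typed option, as in the issue."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--json_option', type=json.loads)
    return parser


# The value as it looks after PIPELINE_OPTIONS has been decoded from JSON.
options = {'json_option': {'foo': 'bar'}}

# str() yields the Python repr of the dict, which uses single quotes.
bad_flag = to_flag('json_option', options['json_option'], str)

# json.dumps() yields valid JSON, which json.loads can read back.
good_flag = to_flag('json_option', options['json_option'], json.dumps)

# The str()-based value is not valid JSON, so json.loads rejects it.
try:
    json.loads(bad_flag.split('=', 1)[1])
    str_round_trips = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    str_round_trips = False

# The json.dumps()-based flag survives the argparse round trip.
parsed = build_parser().parse_args([good_flag])
```

Here bad_flag carries the same single-quoted text that appears in the error message above, while good_flag parses back into the original dict.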
--
This message was sent by Atlassian Jira
(v8.3.4#803005)