Ruoyun Huang created BEAM-6068:
----------------------------------
Summary: Wordcount example fails to read from GCS Shakespeare text file
Key: BEAM-6068
URL: https://issues.apache.org/jira/browse/BEAM-6068
Project: Beam
Issue Type: Improvement
Components: sdk-py-core
Reporter: Ruoyun Huang
Assignee: Ahmet Altay
Symptom:
In a repo synced to head, the following command fails:
python -m apache_beam.examples.wordcount \
  --input gs://dataflow-samples/shakespeare/kinglear.txt \
  --output gs://$USER-test/tmp \
  --runner DataflowRunner \
  --project google.com:clouddfe \
  --temp_location gs://$USER-test/temp-it \
  --experiment beam_fn_api \
  --sdk_location dist/apache-beam-2.9.0.dev0.tar.gz
The error message is:
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 136, in <module>
    run()
  File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 90, in run
    lines = p | 'read' >> ReadFromText(known_args.input)
  File "apache_beam/io/textio.py", line 524, in __init__
    skip_header_lines=skip_header_lines)
  File "apache_beam/io/textio.py", line 119, in __init__
    validate=validate)
  File "apache_beam/io/filebasedsource.py", line 121, in __init__
    self._validate()
  File "apache_beam/options/value_provider.py", line 137, in _f
    return fnc(self, *args, **kwargs)
  File "apache_beam/io/filebasedsource.py", line 178, in _validate
    match_result = FileSystems.match([pattern], limits=[1])[0]
  File "apache_beam/io/filesystems.py", line 187, in match
    return filesystem.match(patterns, limits)
  File "apache_beam/io/filesystem.py", line 705, in match
    raise BeamIOError("Match operation failed", exceptions)
apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'gs://dataflow-samples/shakespeare/kinglear.txt': TypeError("__init__() got an unexpected keyword argument 'response_encoding'",)}
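The shape of this error (one TypeError per pattern, collected into a dict and reported together) can be reproduced with a small self-contained sketch. This is not Beam's code: OldClient, accepts_keyword, and match are hypothetical stand-ins, and the sketch assumes, without confirmation from this report, that the TypeError comes from a caller passing response_encoding to a constructor that does not declare it. It targets Python 3, since inspect.signature is not available in Python 2.7.

```python
import inspect


class OldClient(object):
    """Hypothetical stand-in for a class whose constructor lacks the keyword."""

    def __init__(self, url):
        self.url = url


def accepts_keyword(func, name):
    """Return True if func declares `name` or takes **kwargs."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def match(patterns):
    """Collect per-pattern failures into one dict, like the error above."""
    exceptions = {}
    for pattern in patterns:
        try:
            # Assumed failure mode: the caller passes a keyword the
            # constructor does not know about.
            OldClient(pattern, response_encoding='utf8')
        except TypeError as e:
            exceptions[pattern] = e
    return exceptions
```

A check such as accepts_keyword(SomeClass, 'response_encoding') in both the failing and the working environment would show whether the two differ in that constructor's signature.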
However, a similar command runs after reverting to the 2.8 release and rebuilding
everything. This command succeeds:
python -m apache_beam.examples.wordcount \
  --input=gs://dataflow-samples/shakespeare/kinglear.txt \
  --output=gs://test-$USER/portable/ \
  --runner DataflowRunner \
  --project $GCP_PROJECT \
  --staging_location gs://test-$USER/staging_wc \
  --temp_location gs://test-$USER/tmp \
  --sdk_location=./dist/apache-beam-2.8.0.dev0.tar.gz
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)