[ https://issues.apache.org/jira/browse/BEAM-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruoyun Huang reassigned BEAM-6068:
----------------------------------
Assignee: Mark Liu (was: Ahmet Altay)
> Wordcount example fails to read from GCS Shakespeare text file
> --------------------------------------------------------------
>
> Key: BEAM-6068
> URL: https://issues.apache.org/jira/browse/BEAM-6068
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Ruoyun Huang
> Assignee: Mark Liu
> Priority: Major
>
> Symptom:
> In a repo synced to head, the following command fails:
> python -m apache_beam.examples.wordcount --input
> gs://dataflow-samples/shakespeare/kinglear.txt --output gs://$USER-test/tmp
> --runner DataflowRunner --project google.com:clouddfe --temp_location
> gs://$USER-test/temp-it --experiment beam_fn_api --sdk_location
> dist/apache-beam-2.9.0.dev0.tar.gz
>
> The error message is:
>   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 136, in <module>
>     run()
>   File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 90, in run
>     lines = p | 'read' >> ReadFromText(known_args.input)
>   File "apache_beam/io/textio.py", line 524, in __init__
>     skip_header_lines=skip_header_lines)
>   File "apache_beam/io/textio.py", line 119, in __init__
>     validate=validate)
>   File "apache_beam/io/filebasedsource.py", line 121, in __init__
>     self._validate()
>   File "apache_beam/options/value_provider.py", line 137, in _f
>     return fnc(self, *args, **kwargs)
>   File "apache_beam/io/filebasedsource.py", line 178, in _validate
>     match_result = FileSystems.match([pattern], limits=[1])[0]
>   File "apache_beam/io/filesystems.py", line 187, in match
>     return filesystem.match(patterns, limits)
>   File "apache_beam/io/filesystem.py", line 705, in match
>     raise BeamIOError("Match operation failed", exceptions)
> apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'gs://dataflow-samples/shakespeare/kinglear.txt': TypeError("__init__() got an unexpected keyword argument 'response_encoding'",)}
>
>
> However, after reverting to the 2.8 release and rebuilding everything, a
> similar command succeeds:
> python -m apache_beam.examples.wordcount
> --input=gs://dataflow-samples/shakespeare/kinglear.txt
> --output=gs://test-$USER/portable/ --runner DataflowRunner --project
> $GCP_PROJECT --staging_location gs://test-$USER/staging_wc --temp_location
> gs://test-$USER/tmp --sdk_location=./dist/apache-beam-2.8.0.dev0.tar.gz
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)