Ruoyun Huang created BEAM-6068:
----------------------------------

             Summary: Wordcount example fails to read from GCS Shakespeare text file
                 Key: BEAM-6068
                 URL: https://issues.apache.org/jira/browse/BEAM-6068
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py-core
            Reporter: Ruoyun Huang
            Assignee: Ahmet Altay


Symptom: 

In a repo synced to head, the following command fails:

python -m apache_beam.examples.wordcount \
  --input gs://dataflow-samples/shakespeare/kinglear.txt \
  --output gs://$USER-test/tmp \
  --runner DataflowRunner \
  --project google.com:clouddfe \
  --temp_location gs://$USER-test/temp-it \
  --experiment beam_fn_api \
  --sdk_location dist/apache-beam-2.9.0.dev0.tar.gz

 

The error message is:

  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 136, in <module>
    run()
  File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 90, in run
    lines = p | 'read' >> ReadFromText(known_args.input)
  File "apache_beam/io/textio.py", line 524, in __init__
    skip_header_lines=skip_header_lines)
  File "apache_beam/io/textio.py", line 119, in __init__
    validate=validate)
  File "apache_beam/io/filebasedsource.py", line 121, in __init__
    self._validate()
  File "apache_beam/options/value_provider.py", line 137, in _f
    return fnc(self, *args, **kwargs)
  File "apache_beam/io/filebasedsource.py", line 178, in _validate
    match_result = FileSystems.match([pattern], limits=[1])[0]
  File "apache_beam/io/filesystems.py", line 187, in match
    return filesystem.match(patterns, limits)
  File "apache_beam/io/filesystem.py", line 705, in match
    raise BeamIOError("Match operation failed", exceptions)
apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'gs://dataflow-samples/shakespeare/kinglear.txt': TypeError("__init__() got an unexpected keyword argument 'response_encoding'",)}
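For readers unfamiliar with this class of error, here is a minimal, self-contained sketch of how a TypeError of this shape can arise: a caller passes a keyword argument that the constructor it reaches does not accept. That often happens when two components are at mismatched versions. The names OldRequest, NewRequest, build_request and try_build are hypothetical, and are not Beam or GCS client APIs. The sketch only illustrates the failure mode and does not identify the actual cause of this issue.

```python
class OldRequest(object):
    """Hypothetical older constructor: no response_encoding parameter."""

    def __init__(self, url):
        self.url = url


class NewRequest(object):
    """Hypothetical newer constructor that accepts response_encoding."""

    def __init__(self, url, response_encoding=None):
        self.url = url
        self.response_encoding = response_encoding


def build_request(request_cls, url):
    """Hypothetical caller written against the newer signature."""
    return request_cls(url, response_encoding='utf8')


def try_build(request_cls, url):
    """Return None on success, or the TypeError message on failure."""
    try:
        build_request(request_cls, url)
        return None
    except TypeError as exc:
        return str(exc)


if __name__ == '__main__':
    url = 'gs://dataflow-samples/shakespeare/kinglear.txt'
    print(try_build(NewRequest, url))  # newer signature: no error
    print(try_build(OldRequest, url))  # older signature: TypeError message
```

If the real failure follows this pattern, comparing the installed versions of the dependencies between the working 2.8 environment and the failing head environment may help narrow it down.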

 

 

However, I can run a similar command after reverting to the 2.8 release and
rebuilding everything. This command succeeds:

python -m apache_beam.examples.wordcount \
  --input=gs://dataflow-samples/shakespeare/kinglear.txt \
  --output=gs://test-$USER/portable/ \
  --runner DataflowRunner \
  --project $GCP_PROJECT \
  --staging_location gs://test-$USER/staging_wc \
  --temp_location gs://test-$USER/tmp \
  --sdk_location=./dist/apache-beam-2.8.0.dev0.tar.gz

 

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
