Mark Grey created BEAM-14083:
--------------------------------

             Summary: ReadFromBigquery examples throw pickling exception when 
using InteractiveRunner
                 Key: BEAM-14083
                 URL: https://issues.apache.org/jira/browse/BEAM-14083
             Project: Beam
          Issue Type: Bug
          Components: io-py-gcp, runner-py-interactive
    Affects Versions: 2.37.0, 2.36.0, 2.35.0
         Environment: Cloud Dataflow Workbench Notebook on GCP
Apache Beam 2.37.0 Kernel for Python 3
            Reporter: Mark Grey


When the Python InteractiveRunner is combined with 
beam.io.ReadFromBigQuery, the canonical BigQuery examples from the Beam Python 
tutorials raise an exception that appears to result from a failure to 
serialize generators:

{code:python|title=notebook.py|borderStyle=solid}
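import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

# `options` (pipeline options) and `gcs_location` (a GCS temp path) are
# assumed to be defined earlier in the notebook.
pipeline = beam.Pipeline(InteractiveRunner(), options=options)
max_temperatures = (
    pipeline
    | 'QueryTableStdSQL' >> beam.io.ReadFromBigQuery(
        query='SELECT max_temperature FROM '\
              '`clouddataflow-readonly.samples.weather_stations`',
        use_standard_sql=True, gcs_location=gcs_location)
    # Each row is a dictionary where the keys are the BigQuery columns
    | beam.Map(lambda elem: elem['max_temperature']))
# The TypeError shown below is raised while running the pipeline.
pipeline.run()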
{code}

{noformat}
~/apache-beam-2.37.0/lib/python3.7/site-packages/apache_beam/coders/coders.py in <lambda>(x)
    800     protocol = pickle.HIGHEST_PROTOCOL
    801     return coder_impl.CallbackCoderImpl(
--> 802         lambda x: dumps(x, protocol), pickle.loads)
    803 
    804   def as_deterministic_coder(self, step_label, error_message=None):

TypeError: can't pickle generator objects [while running '[6]: QueryTableStdSQL/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
{noformat}

The interactive pipeline works as expected in version 2.34.
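
For reference, the same class of TypeError can be reproduced with the standard library alone. This is only a minimal sketch of the error itself, not of the Beam code path that triggers it: the rows() generator and the try_pickle() helper are hypothetical names made up for illustration, and the exact wording of the message varies between Python versions.

```python
import pickle


def rows():
    # Hypothetical stand-in for a lazily produced sequence of rows.
    yield {"max_temperature": 31.2}
    yield {"max_temperature": 28.7}


def try_pickle(value):
    """Return None if pickling succeeds, else the TypeError message."""
    try:
        pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        return None
    except TypeError as exc:
        return str(exc)


# A generator object cannot be pickled, so this holds an error message.
generator_error = try_pickle(rows())

# Materializing the generator into a list first pickles without error.
list_error = try_pickle(list(rows()))
```

Here generator_error carries the pickling failure while list_error is None, which is consistent with the traceback above if a generator object is being handed to the pickle-based coder somewhere in the SplitAndSizeRestriction step. That last point is an assumption drawn from the error text, not a confirmed diagnosis.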


