damccorm opened a new issue, #21432:
URL: https://github.com/apache/beam/issues/21432

    
   
   The following Beam pipeline works correctly using `DirectRunner` but fails with a very vague error when using `DataflowRunner`.
   ```
   (
       pipeline
       | beam.io.ReadFromPubSub(input_topic, with_attributes=True)
       | beam.Map(pubsub_message_to_row)
       | beam.WindowInto(beam.transforms.window.FixedWindows(5))
       | beam.GroupBy(<beam.Row col name>)
       | beam.CombineValues(<instance of beam.CombineFn subclass>)
       | beam.Values()
       | beam.io.gcp.bigquery.WriteToBigQuery(
           . . . )
   )
   ```
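
For context, the `<instance of beam.CombineFn subclass>` placeholder above is a combiner implementing Beam's `CombineFn` interface (`create_accumulator` / `add_input` / `merge_accumulators` / `extract_output`). The actual combiner is not shown in the report; the following is a hypothetical mean combiner sketching only the shape of that interface, written in plain Python (the real class would subclass `apache_beam.CombineFn`):

```python
class MeanCombineFn:
    """Hypothetical combiner illustrating the beam.CombineFn interface.

    Not the reporter's actual combiner; in a real pipeline this would
    subclass apache_beam.CombineFn and be passed to beam.CombineValues.
    """

    def create_accumulator(self):
        # Accumulator is a (running_sum, count) pair.
        return (0.0, 0)

    def add_input(self, accumulator, value):
        total, count = accumulator
        return (total + value, count + 1)

    def merge_accumulators(self, accumulators):
        # Beam may combine partial accumulators from different bundles/workers.
        totals, counts = zip(*accumulators)
        return (sum(totals), sum(counts))

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")
```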
   
   Stacktrace:
   ```
   Traceback (most recent call last):
     File "src/read_quality_pipeline/__init__.py", line 128, in <module>
       (
     File "/home/pkg_dev/.cache/pypoetry/virtualenvs/apache-beam-poc-5nxBvN9R-py3.8/lib/python3.8/site-packages/apache_beam/pipeline.py", line 597, in __exit__
       self.result.wait_until_finish()
     File "/home/pkg_dev/.cache/pypoetry/virtualenvs/apache-beam-poc-5nxBvN9R-py3.8/lib/python3.8/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1633, in wait_until_finish
       raise DataflowRuntimeException(
   apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
   Error processing pipeline.
   ```
   
   Log output:
   ```
   2022-02-01T16:54:43.645Z: JOB_MESSAGE_WARNING: Autoscaling is enabled for Dataflow Streaming Engine. Workers will scale between 1 and 100 unless maxNumWorkers is specified.
   2022-02-01T16:54:43.736Z: JOB_MESSAGE_DETAILED: Autoscaling is enabled for job 2022-02-01_08_54_40-8791019287477103665. The number of workers will be between 1 and 100.
   2022-02-01T16:54:43.757Z: JOB_MESSAGE_DETAILED: Autoscaling was automatically enabled for job 2022-02-01_08_54_40-8791019287477103665.
   2022-02-01T16:54:44.624Z: JOB_MESSAGE_ERROR: Error processing pipeline.
   ```
   
   With the `CombineValues` step removed, this pipeline starts successfully on Dataflow.
   
    
   
   I initially thought this was a server-side Dataflow issue, since the Dataflow API (v1b3.projects.locations.jobs.messages) returns only the textPayload "Error processing pipeline". But then I found BEAM-12636, where a Go SDK user hit the same error message, seemingly as a result of bugs in the Go SDK.
   
   Imported from Jira 
[BEAM-13795](https://issues.apache.org/jira/browse/BEAM-13795). Original Jira 
may contain additional context.
   Reported by: Jake_Zuliani.

