damccorm opened a new issue, #21432:
URL: https://github.com/apache/beam/issues/21432
The following beam pipeline works correctly using `DirectRunner` but fails
with a very vague error when using `DataflowRunner`.
```
(
pipeline
| beam.io.ReadFromPubSub(input_topic, with_attributes=True)
| beam.Map(pubsub_message_to_row)
| beam.WindowInto(beam.transforms.window.FixedWindows(5))
| beam.GroupBy(<beam.Row col name>)
| beam.CombineValues(<instance of beam.CombineFn subclass>)
| beam.Values()
| beam.io.gcp.bigquery.WriteToBigQuery(
. . . )
)
```
Stacktrace:
```
Traceback (most recent call last):
File "src/read_quality_pipeline/__init__.py", line 128, in <module>
(
File
"/home/pkg_dev/.cache/pypoetry/virtualenvs/apache-beam-poc-5nxBvN9R-py3.8/lib/python3.8/site-packages/apache_beam/pipeline.py",
line 597, in __exit__
self.result.wait_until_finish()
File
"/home/pkg_dev/.cache/pypoetry/virtualenvs/apache-beam-poc-5nxBvN9R-py3.8/lib/python3.8/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
line 1633, in wait_until_finish
raise DataflowRuntimeException(
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException:
Dataflow pipeline failed. State: FAILED, Error:
Error processing pipeline.
```
Log output:
```
2022-02-01T16:54:43.645Z: JOB_MESSAGE_WARNING: Autoscaling is enabled for
Dataflow Streaming Engine.
Workers will scale between 1 and 100 unless maxNumWorkers is specified.
2022-02-01T16:54:43.736Z: JOB_MESSAGE_DETAILED:
Autoscaling is enabled for job 2022-02-01_08_54_40-8791019287477103665. The
number of workers will be
between 1 and 100.
2022-02-01T16:54:43.757Z: JOB_MESSAGE_DETAILED: Autoscaling was
automatically enabled
for job 2022-02-01_08_54_40-8791019287477103665.
2022-02-01T16:54:44.624Z: JOB_MESSAGE_ERROR: Error
processing pipeline.
```
With the `CombineValues` step removed this pipeline successfully starts in
dataflow.
I thought this was an issue with Dataflow on the server side since the
Dataflow API (v1b3.projects.locations.jobs.messages) is just returning the
textPayload: "Error processing pipeline". But then I found the issue BEAM-12636
where a go SDK user has the same error message but seemingly as a result of
bugs in the go SDK?
Imported from Jira
[BEAM-13795](https://issues.apache.org/jira/browse/BEAM-13795). Original Jira
may contain additional context.
Reported by: Jake_Zuliani.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]