tvalentyn opened a new issue, #31607:
URL: https://github.com/apache/beam/issues/31607
### What happened?
The following error might occur in some pipelines, possibly
non-deterministically:
```
Exception serializing message!
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/grpc/_common.py", line 89, in _transform
    return transformer(message)
ValueError: Message org.apache.beam.model.fn_execution.v1.Elements exceeds maximum protobuf size of 2GB: 2887086320
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/data_plane.py", line 700, in _read_inputs
    for elements in elements_iterator:
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 542, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 968, in _next
    raise self
```
This issue is caused by large elements in a Beam pipeline. If you see this
error, upgrade to Apache Beam 2.57.0 or later. That release improves a code
path that could suboptimally combine multiple large elements together, and it
adds better logging when large elements are detected. If failures persist on
2.57.0 and above, look for warnings like:
```
Data output stream buffer size ... exceeds 536870912 bytes. This is likely
due to a large element in a PCollection.
```
or errors like:
```
Buffer size ... exceeds GRPC limit 2147483548. This is likely due to a
single element that is too large.
```
If you see these messages, inspect the logs to find which pipeline step emits
them, and consider reducing the size of the individual elements in the
PCollections at those steps.
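
One possible way to reduce element size is to split a large keyed element into several smaller ones before it crosses the data channel. The sketch below is only an illustration and is not taken from the Beam codebase: the helper name `split_into_chunks` and the cap `MAX_ITEMS_PER_ELEMENT` are hypothetical, and it assumes the oversized element is a `(key, values)` pair whose values can be processed in independent batches.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")

# Hypothetical cap, chosen for illustration only; tune it for your data.
MAX_ITEMS_PER_ELEMENT = 1000


def split_into_chunks(
    element: Tuple[K, Iterable[V]],
    max_items: int = MAX_ITEMS_PER_ELEMENT,
) -> Iterator[Tuple[K, List[V]]]:
    """Yield (key, chunk) pairs, each holding at most max_items values."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    key, values = element
    chunk: List[V] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == max_items:
            yield key, chunk
            chunk = []
    if chunk:  # emit the final, partially filled chunk
        yield key, chunk
```

In a pipeline, a generator like this could be applied with `beam.FlatMap(split_into_chunks)` at the step that the logs point to, provided downstream transforms can handle several smaller elements per key. Whether that holds depends on the pipeline.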
### Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
### Issue Components
- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner