cozos opened a new issue, #24422: URL: https://github.com/apache/beam/issues/24422
### What happened?

The Python `iobase.Read` transform is a splittable DoFn (SDF). Since SparkRunner does not support splittable DoFns, all Read operations end up on a single Spark task/partition. This does not scale on any moderately sized or larger workload.

To fix this, I had to set the option `--experiments=pre_optimize=all`, which expands the SDF into a pair + split + read. But this option is hidden/undocumented/magic. I think it would be better if `translations.expand_sdf` were enabled on all runners that don't support SDFs.

### Issue Priority

Priority: 2

### Issue Component

Component: runner-spark
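
For readers unfamiliar with the "pair + split + read" expansion mentioned under "What happened?", here is a rough plain-Python sketch of the idea. It does not use Beam and is not the code in `translations.expand_sdf`. The function names, the `Restriction` type, and the `desired_size` parameter are all hypothetical, chosen only for illustration. The point is that once a source is paired with a restriction and that restriction is split, each piece can be read independently, which is what lets a runner spread the work over several partitions.

```python
from typing import Iterator, List, Tuple

# Hypothetical stand-in for an SDF restriction: a half-open offset range [start, stop).
Restriction = Tuple[int, int]


def pair_with_restriction(source: List[str]) -> Tuple[List[str], Restriction]:
    """Pair: attach an initial restriction covering the whole source."""
    return source, (0, len(source))


def split_restriction(restriction: Restriction, desired_size: int) -> Iterator[Restriction]:
    """Split: cut the restriction into sub-ranges of at most desired_size elements."""
    if desired_size <= 0:
        raise ValueError("desired_size must be positive")
    start, stop = restriction
    for lo in range(start, stop, desired_size):
        yield lo, min(lo + desired_size, stop)


def read_restriction(source: List[str], restriction: Restriction) -> List[str]:
    """Read: process only the elements inside one sub-range."""
    start, stop = restriction
    return source[start:stop]


# A toy in-memory "source" with ten records.
records = [f"record-{i}" for i in range(10)]

source, initial = pair_with_restriction(records)
splits = list(split_restriction(initial, desired_size=4))

# Each sub-range could be handed to a separate task; here they are read in a loop.
chunks = [read_restriction(source, r) for r in splits]
```

Without the split step there is only the single initial restriction, which corresponds to the one-task/one-partition behaviour described above.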
