cozos commented on issue #24365: URL: https://github.com/apache/beam/issues/24365#issuecomment-1335895475
Upon thinking about this further, I believe the bottleneck came from the reader problem I described in https://github.com/apache/beam/issues/24422: on runners that don't support SDF (Splittable DoFn), all Reads happen on a single partition. This was obscured because the Spark UI showed the job stuck at the shuffle boundary introduced by WriteToParquet, when in reality the bottleneck was the Read itself.

We can close this issue, but I don't know whether the `GroupByKey` on a `None` key is still something we want to track separately, since it could also cause a bottleneck.
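To illustrate why the `GroupByKey` on `None` can bottleneck a pipeline: a shuffle typically assigns records to workers by hashing the key, so a single key collapses all data onto one worker. This is a minimal plain-Python sketch of that mechanism, not actual Beam runner code; `shuffle_by_key` and `num_workers` are hypothetical names for illustration.

```python
# Sketch (plain Python, not Beam) of why keying every element by a
# single key such as None defeats parallelism: a shuffle assigns
# records to workers by hashing the key, so one key -> one worker.

from collections import defaultdict

def shuffle_by_key(records, num_workers):
    """Assign each (key, value) record to a worker by hashing its key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[hash(key) % num_workers].append((key, value))
    return partitions

# A GroupByKey downstream of something like
#   beam.Map(lambda v: (None, v))
# behaves analogously to this:
records = [(None, v) for v in range(1000)]
parts = shuffle_by_key(records, num_workers=8)

# Every record lands on the same single partition.
assert len(parts) == 1
assert sum(len(p) for p in parts.values()) == 1000
```

In a real pipeline, inserting a `beam.Reshuffle()` (or keying by a higher-cardinality key) after the single-key stage is one way to redistribute work across partitions.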
