damccorm opened a new issue, #20889: URL: https://github.com/apache/beam/issues/20889
The current Java DirectRunner does not fuse transforms into steps. Instead, it executes almost every transform one at a time, which results in memory pressure whenever a transform has high fan-out. The Java SDK already has simple fusion logic (https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPipelineFuser.java).

The remaining work here might be:

* Apply this fusion in the DirectRunner.
* Change the DirectRunner so that it can run the fused steps.

I understand that the DirectRunner is not expected to process large volumes of data, and that changing how it executes pipelines might be a fair amount of work.

Imported from Jira [BEAM-12335](https://issues.apache.org/jira/browse/BEAM-12335). Original Jira may contain additional context. Reported by: boyuanz.
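To illustrate why running transforms one by one hurts with high fan-out, here is a minimal sketch in plain Java. It does not use any Beam API and is not how the DirectRunner or `GreedyPipelineFuser` is implemented. The class `FusionSketch` and its methods `expand`, `runUnfused`, and `runFused` are hypothetical names made up for this example. It assumes a two-transform pipeline (a fan-out step followed by a sum) and counts how many intermediate elements are buffered in each mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not Beam code.
class FusionSketch {

    // High-fanout transform: every input emits `fanout` outputs.
    static void expand(int input, int fanout, Consumer<Integer> out) {
        for (int i = 0; i < fanout; i++) {
            out.accept(input * fanout + i);
        }
    }

    // Unfused execution: the first transform's whole output is materialized
    // before the second transform starts.
    // Returns {sum, number of buffered intermediate elements}.
    static long[] runUnfused(List<Integer> inputs, int fanout) {
        List<Integer> intermediate = new ArrayList<>();
        for (int in : inputs) {
            expand(in, fanout, intermediate::add);
        }
        long sum = 0;
        for (int v : intermediate) {
            sum += v;
        }
        return new long[] {sum, intermediate.size()};
    }

    // Fused execution: each expanded element is handed straight to the
    // downstream transform, so no intermediate collection is buffered.
    static long[] runFused(List<Integer> inputs, int fanout) {
        long[] sum = {0};
        for (int in : inputs) {
            expand(in, fanout, v -> sum[0] += v);
        }
        return new long[] {sum[0], 0};
    }

    public static void main(String[] args) {
        List<Integer> inputs = List.of(0, 1, 2);
        long[] unfused = runUnfused(inputs, 4);
        long[] fused = runFused(inputs, 4);
        System.out.println("unfused: sum=" + unfused[0] + " buffered=" + unfused[1]);
        System.out.println("fused:   sum=" + fused[0] + " buffered=" + fused[1]);
    }
}
```

Both modes compute the same result. In this sketch the unfused mode holds `inputs.size() * fanout` elements in memory at once, while the fused mode holds none, which is the kind of saving the proposed change is after.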
