[ https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
uncleGen updated SPARK-3376:
----------------------------
    Description:
I think a memory-based shuffle can reduce some of the disk I/O overhead. I would like to know whether there is any plan to work on this, or whether anyone has suggestions. Based on the work in SPARK-2044, it is feasible to have several implementations of shuffle.

  was:
I think a memory-based shuffle can reduce some of the disk I/O overhead. I would like to know whether there is any plan to work on this, or whether anyone has suggestions. Based on the work in SPARK-2044, it is feasible to have several implementations of shuffle.

Following is my testing on "InMemory Shuffle":

| data size | partitions | resources |
| 5131859218 | 2000 | 50 executors / 4 cores / 4GB |

| settings | operation1 | operation2 |
| shuffle spill & lz4 | repartition + flatMap + groupByKey | repartition + groupByKey |
| memory | 38s | 16s |
| sort | 45s | 28s |
| hash | 46s | 28s |
| no shuffle spill & lz4 | | |
| memory | 16s | 16s |
| | | |
| shuffle spill & lzf | | |
| memory | 28s | 27s |
| sort | 29s | 29s |
| hash | 41s | 30s |
| no shuffle spill & lzf | | |
| memory | 15s | 16s |


> Memory-based shuffle strategy to reduce overhead of disk I/O
> ------------------------------------------------------------
>
>                 Key: SPARK-3376
>                 URL: https://issues.apache.org/jira/browse/SPARK-3376
>             Project: Spark
>          Issue Type: Planned Work
>            Reporter: uncleGen
>            Priority: Trivial
>
> I think a memory-based shuffle can reduce some of the disk I/O overhead. I
> would like to know whether there is any plan to work on this, or whether
> anyone has suggestions. Based on the work in SPARK-2044, it is feasible to
> have several implementations of shuffle.
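
For reading the timing table in the earlier description, the following is a minimal sketch that computes how much faster the "memory" shuffle ran than the "sort" and "hash" shuffles in the runs with shuffle spill enabled. The timings are copied from that table; the names `TIMINGS` and `speedup` are hypothetical helpers introduced here and are not part of Spark or of the reporter's prototype.

```python
# Timings in seconds, copied from the table in the issue description
# (runs with shuffle spill enabled).
# Layout: codec -> shuffle implementation -> (operation1, operation2).
TIMINGS = {
    "lz4": {"memory": (38, 16), "sort": (45, 28), "hash": (46, 28)},
    "lzf": {"memory": (28, 27), "sort": (29, 29), "hash": (41, 30)},
}


def speedup(codec, baseline, op_index):
    """Return the baseline shuffle's time divided by the memory shuffle's time.

    A value above 1.0 means the memory shuffle finished faster in that run.
    """
    runs = TIMINGS[codec]
    return runs[baseline][op_index] / runs["memory"][op_index]


if __name__ == "__main__":
    for codec in TIMINGS:
        for baseline in ("sort", "hash"):
            for op_index, op_name in enumerate(("operation1", "operation2")):
                ratio = speedup(codec, baseline, op_index)
                print(f"{codec:>3} {baseline:>4} vs memory, {op_name}: {ratio:.2f}x")
```

These ratios describe only the single set of runs reported in the table, so they are a rough indication rather than a benchmark result.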