andygrove opened a new issue, #3198: URL: https://github.com/apache/datafusion-comet/issues/3198
### Describe the bug

Using the PySpark benchmark in the repo, I am comparing logging and metrics for JVM vs native shuffle.

JVM shuffle spills 96 times:

```
26/01/15 13:48:46 INFO CometShuffleExternalSorter: Thread 98 spilling sort data of 512.0 MiB to disk (1 time so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 82 spilling sort data of 512.0 MiB to disk (2 times so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 95 spilling sort data of 512.0 MiB to disk (2 times so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 104 spilling sort data of 512.0 MiB to disk (2 times so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 106 spilling sort data of 512.0 MiB to disk (2 times so far)
...
```

Native shuffle spills 32 times:

```
26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532719016 to disk while inserting (0 time(s) so far)
26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532094760 to disk while inserting (0 time(s) so far)
26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532772904 to disk while inserting (0 time(s) so far)
26/01/15 15:42:37 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532772904 to disk while inserting (0 time(s) so far)
26/01/15 15:42:37 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532719208 to disk while inserting (0 time(s) so far)
...
```

### Steps to reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_
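
For anyone wanting to reproduce the spill counts above from a captured log, here is a minimal Python sketch. It is not part of the benchmark in the repo. The two regular expressions are modeled only on the log lines quoted in this issue, and `summarize_spills` is a hypothetical helper name. It also assumes the size printed by the native `ShuffleRepartitioner` message is in bytes, which the log line itself does not state.

```python
import re
from collections import Counter

# Patterns modeled on the log lines quoted in this issue (assumption:
# other Comet versions may word these messages differently).
JVM_SPILL = re.compile(
    r"CometShuffleExternalSorter: Thread \d+ spilling sort data of ([\d.]+) MiB to disk"
)
NATIVE_SPILL = re.compile(
    r"ShuffleRepartitioner spilling shuffle data of (\d+) to disk"
)


def summarize_spills(lines):
    """Count spill events and sum the reported spill sizes per shuffle type."""
    counts = Counter()
    spilled = Counter()
    for line in lines:
        m = JVM_SPILL.search(line)
        if m:
            counts["jvm"] += 1
            # JVM message reports MiB; convert to bytes for comparison.
            spilled["jvm"] += int(float(m.group(1)) * 1024 * 1024)
            continue
        m = NATIVE_SPILL.search(line)
        if m:
            counts["native"] += 1
            # Assumption: the native message reports a raw byte count.
            spilled["native"] += int(m.group(1))
    return counts, spilled
```

Passing the lines of a saved log (for example `summarize_spills(open(path))`, where `path` points at whatever file the run's output was captured to) returns the per-type event counts and total reported sizes, which makes the JVM vs native comparison independent of the "times so far" counter printed in each message.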
