andygrove opened a new issue, #3198:
URL: https://github.com/apache/datafusion-comet/issues/3198

   ### Describe the bug
   
   Using the PySpark benchmark in the repo, I am comparing logging and metrics for JVM vs native shuffle.
   
   JVM shuffle spills 96 times:
   
   ```
   26/01/15 13:48:46 INFO CometShuffleExternalSorter: Thread 98 spilling sort data of 512.0 MiB to disk (1  time so far)
   26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 82 spilling sort data of 512.0 MiB to disk (2  times so far)
   26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 95 spilling sort data of 512.0 MiB to disk (2  times so far)
   26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 104 spilling sort data of 512.0 MiB to disk (2  times so far)
   26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 106 spilling sort data of 512.0 MiB to disk (2  times so far)
   ...
   ```
   
   Native shuffle spills 32 times:
   
   ```
   26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532719016 to disk while inserting (0 time(s) so far)
   26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532094760 to disk while inserting (0 time(s) so far)
   26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532772904 to disk while inserting (0 time(s) so far)
   26/01/15 15:42:37 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532772904 to disk while inserting (0 time(s) so far)
   26/01/15 15:42:37 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532719208 to disk while inserting (0 time(s) so far)
   ...
   ```
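
For anyone who wants to reproduce the counts from their own executor logs, the tally can be sketched roughly as below. This is only an illustration: the regular expressions are inferred from the log excerpts quoted above, and `count_spills` is a hypothetical helper, not part of Comet or the benchmark.

```python
import re

# Patterns are assumptions derived from the log lines quoted in this issue.
# \s+ tolerates lines that were hard-wrapped by a mail client or terminal.
JVM_SPILL = re.compile(
    r"CometShuffleExternalSorter: Thread \d+ spilling sort\s+data"
)
NATIVE_SPILL = re.compile(
    r"ShuffleRepartitioner spilling shuffle\s+data of (\d+) to disk"
)


def count_spills(log_text):
    """Return (jvm_spill_count, native_spill_count, native_bytes_spilled)."""
    jvm_count = len(JVM_SPILL.findall(log_text))
    # The native message reports the spilled size as a plain number.
    native_sizes = [int(size) for size in NATIVE_SPILL.findall(log_text)]
    return jvm_count, len(native_sizes), sum(native_sizes)
```

Running `count_spills(open(path).read())` over a log from each run (where `path` is whatever file the logs were captured to) gives totals that can be compared directly.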
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

