EmilyMatt opened a new issue, #1659: URL: https://github.com/apache/datafusion-comet/issues/1659
This is a nitpick, but I've noticed that in the new interleaved shuffle, when data is copied into the output data file, the in-memory data is written to the file first, and only then is the copy of the spilled data performed. Since this happens in a shuffle, block order is not guaranteed in the read stage anyway. However, the current order still destroys the partial ordering within each block. This can easily be remedied by moving the write to after the copy is done, with no performance penalty. Then, if the data was ordered before the shuffle, each block will be ordered as well. I believe this is also more in line with Spark's shuffle behaviour, where the first batches received are the first to be written.
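To illustrate the proposed reordering, here is a minimal sketch. It is not Comet's actual shuffle writer. `PartitionState`, `write_partition_memory_first`, and `write_partition_in_arrival_order` are hypothetical names. The sketch assumes that, for each output partition, spilled data holds the earlier batches and the in-memory buffer holds the later ones. Single bytes stand in for rows, and a `Cursor` stands in for a spill file.

```rust
use std::io::{self, Cursor, Write};

/// Hypothetical per-partition state (illustration only).
struct PartitionState {
    /// Stands in for a spill file: rows from batches received earlier.
    spill: Cursor<Vec<u8>>,
    /// Stands in for buffered batches: rows received after the last spill.
    in_memory: Vec<u8>,
}

/// Current behaviour described in the issue: the in-memory (newer) data
/// is written first, then the spilled (older) data is copied after it.
fn write_partition_memory_first<W: Write>(
    state: &mut PartitionState,
    out: &mut W,
) -> io::Result<u64> {
    out.write_all(&state.in_memory)?;
    let copied = io::copy(&mut state.spill, out)?;
    Ok(state.in_memory.len() as u64 + copied)
}

/// Proposed behaviour: copy the spilled (older) data first, then write
/// the in-memory (newer) data, so rows keep their arrival order.
fn write_partition_in_arrival_order<W: Write>(
    state: &mut PartitionState,
    out: &mut W,
) -> io::Result<u64> {
    let copied = io::copy(&mut state.spill, out)?;
    out.write_all(&state.in_memory)?;
    Ok(copied + state.in_memory.len() as u64)
}
```

With rows `1, 2, 3` spilled and `4, 5` still in memory, the first function emits `4, 5, 1, 2, 3` and the second emits `1, 2, 3, 4, 5`. The amount of I/O is the same; only the order of the two steps changes.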