Re: [PR] feat: Make shuffle compression configurable and respect `spark.shuffle.compress` [datafusion-comet]

via GitHub Thu, 19 Dec 2024 14:38:10 -0800


andygrove commented on code in PR #1185:
URL: https://github.com/apache/datafusion-comet/pull/1185#discussion_r1893224341



##########
docs/source/user-guide/tuning.md:
##########
@@ -103,6 +103,12 @@ native shuffle currently only supports `HashPartitioning` and `SinglePartitionin
 To enable native shuffle, set `spark.comet.exec.shuffle.mode` to `native`. If this mode is explicitly set,
 then any shuffle operations that cannot be supported in this mode will fall back to Spark.
 
+### Shuffle Compression
+
+By default, Spark compresses shuffle files using LZ4 compression. Comet overrides this behavior with ZSTD compression.
+Compression can be disabled by setting `spark.shuffle.compress=false`, which may result in faster shuffle times in
+certain environments, such as single-node setups with fast NVMe drives, at the expense of increased disk space usage.

Review Comment:
   We don't support LZ4 natively yet (there is a separate PR where I am working on adding this)
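
   For readers following the thread, here is a minimal sketch of how the two settings named in the diff might be passed to `spark-submit`. The helper `to_conf_args` and the `shuffle_settings` dict are hypothetical names used only for illustration; the configuration keys and values are the ones quoted in the diff above, and `--conf` is the standard `spark-submit` flag. This is not part of the PR.

```python
# Hypothetical helper (not part of Comet or Spark): turns a settings dict
# into a flat list of spark-submit "--conf key=value" arguments.
def to_conf_args(settings):
    args = []
    for key, value in settings.items():
        # Render Python booleans in the lowercase form used in the docs.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["--conf", f"{key}={value}"]
    return args


# Keys taken from the diff hunk under review.
shuffle_settings = {
    "spark.comet.exec.shuffle.mode": "native",  # use Comet's native shuffle
    "spark.shuffle.compress": False,            # disable shuffle compression
}

print(" ".join(to_conf_args(shuffle_settings)))
```

   Whether disabling compression actually helps depends on the environment, as the proposed text notes, so it is worth measuring shuffle time and disk usage with both settings before committing to one.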



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

