kazuyukitanimura commented on code in PR #1525:
URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2004351049
##########
docs/source/user-guide/tuning.md:
##########
@@ -141,30 +191,22 @@ It must be set before the Spark context is created. You can enable or disable Co
 at runtime by setting `spark.comet.exec.shuffle.enabled` to `true` or `false`. Once it is disabled, Comet will fall back to the default Spark shuffle manager.
 
-### Shuffle Mode
+### Shuffle Implementations
 
-Comet provides three shuffle modes: Columnar Shuffle, Native Shuffle and Auto Mode.
+Comet provides two shuffle implementations: Native Shuffle and Columnar Shuffle. Comet will first try to use Native
+Shuffle and if that is not possible it will try to use Columnar Shuffle. If neither can be applied, it will fall
+back to Spark for shuffle operations.

Review Comment:
   It would be helpful to say that this is the default auto mode. The new explanation does not have the `auto` keyword.

##########
docs/source/user-guide/tuning.md:
##########
@@ -17,18 +17,96 @@ specific language governing permissions and limitations under the License. -->
 
-# Tuning Guide
+# Comet Tuning Guide
 
 Comet provides some tuning options to help you get the best performance from your queries.
 
 ## Memory Tuning
 
-### Unified Memory Management with Off-Heap Memory
+It is necessary to specify how much memory Comet can use in addition to memory already allocated to Spark. In some
+cases, it may be possible to reduce the amount of memory allocated to Spark so that overall memory allocation is
+the same or lower than the original configuration. In other cases, enabling Comet may require allocating more memory
+than before. See the [Determining How Much Memory to Allocate] section for more details.
 
-The recommended way to share memory between Spark and Comet is to set `spark.memory.offHeap.enabled=true`. This allows
-Comet to share an off-heap memory pool with Spark. The size of the pool is specified by `spark.memory.offHeap.size`. For more details about Spark off-heap memory mode, please refer to Spark documentation: https://spark.apache.org/docs/latest/configuration.html.
+[Determining How Much Memory to Allocate]: #determining-how-much-memory-to-allocate
 
-The type of pool can be specified with `spark.comet.exec.memoryPool`.
+Comet supports Spark's on-heap (the default) and off-heap mode for allocating memory. However, we strongly recommend
+using off-heap mode. Comet has some limitations when running in on-heap mode, such as requiring more memory overall,
+and requiring shuffle memory to be separately configured.
+
+### Configuring Comet Memory in Off-Heap Mode
+
+The recommended way to allocate memory for Comet is to set `spark.memory.offHeap.enabled=true`. This allows
+Comet to share an off-heap memory pool with Spark, reducing the overall memory overhead. The size of the pool is
+specified by `spark.memory.offHeap.size`. For more details about Spark off-heap memory mode, please refer to
+Spark documentation: https://spark.apache.org/docs/latest/configuration.html.

Review Comment:
   Thanks. The memory limit for `fair_unified` is defined in the same way as for the on-heap `fair_spill_global`. I thought `spark.memory.offHeap.size` is just the amount that Spark recognizes, and Spark itself needs some off-heap memory. I am thinking of `spark.comet.memoryOverhead` more as a way to avoid using all of the off-heap memory, leaving some off-heap memory for Spark...
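For readers following the thread, here is a minimal sketch of the configuration being discussed. It assumes the Comet plugin is already installed and enabled; the configuration keys come from the diff and comments above, while the memory size, the `fair_unified` pool choice, and the sample query are illustrative assumptions rather than recommendations.

```scala
// Minimal sketch of the settings discussed above, using the public SparkSession API.
// The keys come from the diff and review comments; the size, pool choice, and query
// are assumed placeholders, not tuning recommendations.
import org.apache.spark.sql.SparkSession

object CometMemorySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("comet-memory-sketch")
      // Enable Spark off-heap mode so Comet can share an off-heap memory pool with Spark.
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "8g") // assumed size; tune per workload
      // Memory pool type mentioned in the thread; other pool types are available.
      .config("spark.comet.exec.memoryPool", "fair_unified")
      // Comet shuffle can also be toggled at runtime with this flag.
      .config("spark.comet.exec.shuffle.enabled", "true")
      .getOrCreate()

    // Trivial query, included only to exercise the session in this sketch.
    spark.range(0, 1000).selectExpr("sum(id)").show()

    spark.stop()
  }
}
```

Per the documentation text in the diff, `spark.comet.exec.shuffle.enabled` can also be changed at runtime after the session is created; the off-heap settings are shown here at session build time.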