Kontinuation commented on issue #886: URL: https://github.com/apache/datafusion-comet/issues/886#issuecomment-2317215222
[`CometShuffleMemoryAllocator`](https://github.com/apache/datafusion-comet/blob/0.2.0/spark/src/main/java/org/apache/spark/shuffle/comet/CometShuffleMemoryAllocator.java) allocates at most `spark.comet.shuffle.memory.factor` * `spark.comet.memoryOverhead` bytes of memory for all Comet external sorters. The number of concurrently running external sorters is usually the number of executor cores, so in this case 1.2 * 0.7 GB of shuffle memory is shared by all 6 cores. Other native Comet operators, by contrast, each get a dedicated `GreedyMemoryPool` sized at `spark.comet.memoryOverhead` (assuming we are not using the unified memory manager introduced by https://github.com/apache/datafusion-comet/pull/83). The shuffle memory amortized to each core is therefore too small compared to what other operators get, unless `spark.comet.columnar.shuffle.memorySize` is configured additionally.

`CometShuffleMemoryAllocator` is a singleton, so all Comet external sorters allocate from a shared memory pool. An external sorter can only spill itself when its own allocation fails; `CometShuffleMemoryAllocator` does not support making other memory consumers spill to free up memory for the requesting consumer. If an external sorter is holding only a tiny amount of memory and fails an allocation, it can do nothing other than throw a `SparkOutOfMemoryError`.

Is it feasible to support creating a dedicated `CometShuffleMemoryAllocator` for each shuffle writer, since that is a safer choice when operators can only self-spill?
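To make the imbalance concrete, here is a back-of-the-envelope sketch using the numbers from this report (6 executor cores, `spark.comet.memoryOverhead` = 0.7 GB, `spark.comet.shuffle.memory.factor` = 1.2). The class and config names are real; the arithmetic below is just an illustration, not code from the allocator itself:

```java
public class ShuffleMemoryMath {
    public static void main(String[] args) {
        // Values taken from the scenario described above.
        double memoryOverheadGb = 0.7;    // spark.comet.memoryOverhead
        double shuffleMemoryFactor = 1.2; // spark.comet.shuffle.memory.factor
        int executorCores = 6;

        // Total shuffle memory shared by ALL Comet external sorters
        // on the executor (one CometShuffleMemoryAllocator singleton).
        double sharedShuffleGb = shuffleMemoryFactor * memoryOverheadGb;

        // Amortized share per concurrently running sorter
        // (usually one sorter per executor core).
        double perCoreShuffleGb = sharedShuffleGb / executorCores;

        // Each other native Comet operator gets its own dedicated
        // GreedyMemoryPool sized at spark.comet.memoryOverhead.
        double perOperatorPoolGb = memoryOverheadGb;

        System.out.printf("shared shuffle pool:     %.2f GB%n", sharedShuffleGb);
        System.out.printf("amortized per core:      %.2f GB%n", perCoreShuffleGb);
        System.out.printf("dedicated operator pool: %.2f GB%n", perOperatorPoolGb);
    }
}
```

So each shuffle sorter effectively works with ~0.14 GB while every other native operator has a 0.7 GB pool to itself, which is the disparity described above.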
