Kontinuation commented on issue #886:
URL: https://github.com/apache/datafusion-comet/issues/886#issuecomment-2317215222

[`CometShuffleMemoryAllocator`](https://github.com/apache/datafusion-comet/blob/0.2.0/spark/src/main/java/org/apache/spark/shuffle/comet/CometShuffleMemoryAllocator.java) allocates at most `spark.comet.shuffle.memory.factor` * `spark.comet.memoryOverhead` bytes of memory for all Comet external sorters. Usually, the number of concurrently running external sorters equals the number of executor cores, so in this case 1.2 * 0.7 GB of shuffle memory is shared by all 6 cores.
   
For the other native Comet operators, by contrast, a dedicated `GreedyMemoryPool` sized `spark.comet.memoryOverhead` is created per operator (assuming we are not using the unified memory manager introduced in https://github.com/apache/datafusion-comet/pull/83). The shuffle memory amortized to each core is therefore too small compared to what other operators get, unless we additionally configure `spark.comet.columnar.shuffle.memorySize`.
   
`CometShuffleMemoryAllocator` is a singleton, so all Comet external sorters allocate from a shared memory pool. A Comet external sorter can only spill itself when an allocation fails; `CometShuffleMemoryAllocator` does not support forcing other memory consumers to spill to free up memory for the requesting consumer. If an external sorter is using only a tiny amount of memory and fails an allocation, it can do nothing but throw a `SparkOutOfMemoryError`. Would it be feasible to create a dedicated `CometShuffleMemoryAllocator` for each shuffle writer, since that is a safer choice when operators can only self-spill?
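To illustrate the failure mode, here is a minimal sketch (all names hypothetical, not the actual Comet implementation) of a shared pool where consumers can only self-spill. A tiny consumer that spills everything it owns can still fail, because the pool never asks the large consumer to give anything back:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a self-spill-only shared pool: on allocation
// failure, the requesting sorter may spill only its own memory; other
// consumers are never asked to spill.
public class SelfSpillOnlyAllocator {
    private final long capacity;
    private final AtomicLong used = new AtomicLong();

    public SelfSpillOnlyAllocator(long capacityBytes) {
        this.capacity = capacityBytes;
    }

    /** Try to reserve bytes; returns false on failure. No cross-consumer spilling. */
    public boolean tryAcquire(long bytes) {
        while (true) {
            long cur = used.get();
            if (cur + bytes > capacity) return false;
            if (used.compareAndSet(cur, cur + bytes)) return true;
        }
    }

    public void release(long bytes) {
        used.addAndGet(-bytes);
    }

    /** What a sorter can do on failure: spill itself, release its own bytes, retry. */
    public boolean acquireWithSelfSpill(long bytes, Runnable selfSpill, long selfUsed) {
        if (tryAcquire(bytes)) return true;
        selfSpill.run();     // writes this sorter's own data to disk...
        release(selfUsed);   // ...freeing only this sorter's reservation
        if (tryAcquire(bytes)) return true;
        // A tiny consumer that spilled everything and still cannot allocate has
        // no recourse: this is where a SparkOutOfMemoryError would be thrown.
        return false;
    }

    public static void main(String[] args) {
        SelfSpillOnlyAllocator pool = new SelfSpillOnlyAllocator(100);
        pool.tryAcquire(90);  // a large sorter holds most of the pool
        pool.tryAcquire(5);   // a tiny sorter holds almost nothing
        // The tiny sorter needs 20 more bytes; self-spilling frees only its own
        // 5 bytes, so the request still fails.
        boolean ok = pool.acquireWithSelfSpill(20, () -> {}, 5);
        System.out.println("tiny sorter acquire succeeded: " + ok);  // false
    }
}
```

A per-shuffle-writer allocator sidesteps this because each writer's failures are then caused only by its own usage, which self-spilling can actually relieve.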


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

