[I] [VL] Support file cache spill in Gluten [incubator-gluten]

via GitHub Mon, 27 May 2024 06:28:18 -0700


yma11 opened a new issue, #5884:
URL: https://github.com/apache/incubator-gluten/issues/5884


   ### Description
   
   The Velox backend provides a two-level file cache (`AsyncDataCache` and `SsdCache`), 
which we enabled in this 
[PR](https://github.com/apache/incubator-gluten/pull/1076/files) using a 
dedicated `MMapAllocator` initialized with a configured capacity. This memory 
is counted as neither execution memory nor storage memory, and it is not managed by 
Spark's `UnifiedMemoryManager`. This ticket proposes to close that gap with the 
following design:
   
   - Add a `NativeStorageMemory` segment to vanilla `StorageMemory`. A new 
configuration, `spark.memory.native.storageFraction`, defines its size. The 
resulting size, 
`offheap.memory*spark.memory.storageFraction*spark.memory.native.storageFraction`, 
is then used to initialize `AsyncDataCache`.
   - Add a configuration `spark.memory.storage.preferSpillNative` that determines 
whether the RDD cache or the native file cache is spilled first when storage memory 
must shrink. For example, when queries mostly run against the same data 
sources, we prefer to keep the native file cache.
   - Update the vanilla storage memory pool size wherever it is used, taking the 
stats collected from `NativeStorageMemory` into account.
   - Implement/update 
`AsyncDataCache::usedBytes()`/`AsyncDataCache::shrink()`/`AsyncDataCache::findOrCreate`.
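
As a rough illustration of the sizing formula and the spill preference in the list above, here is a minimal Python sketch. It is not Gluten or Velox code: the function names `native_storage_bytes` and `plan_shrink` are hypothetical, and it assumes that `spark.memory.storage.preferSpillNative=true` means the native file cache is shrunk before the RDD cache, which the issue does not spell out.

```python
def native_storage_bytes(offheap_bytes, storage_fraction, native_storage_fraction):
    """Capacity for the native file cache, following the formula in the issue:
    offheap.memory * spark.memory.storageFraction * spark.memory.native.storageFraction
    """
    for name, value in (("storage_fraction", storage_fraction),
                        ("native_storage_fraction", native_storage_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return int(offheap_bytes * storage_fraction * native_storage_fraction)


def plan_shrink(bytes_needed, rdd_cache_bytes, native_cache_bytes, prefer_spill_native):
    """Decide how many bytes to release from each cache.

    Returns a tuple (from_rdd, from_native). The preferred victim is drained
    first; the other cache only covers whatever is still missing.
    Assumption: prefer_spill_native=True means the native file cache goes first.
    """
    if prefer_spill_native:
        first, second = native_cache_bytes, rdd_cache_bytes
    else:
        first, second = rdd_cache_bytes, native_cache_bytes

    from_first = min(bytes_needed, first)
    from_second = min(bytes_needed - from_first, second)

    if prefer_spill_native:
        return from_second, from_first  # (from_rdd, from_native)
    return from_first, from_second      # (from_rdd, from_native)


if __name__ == "__main__":
    gib = 1024 ** 3
    # 8 GiB off-heap, half of it for storage, a quarter of that for the native cache.
    print(native_storage_bytes(8 * gib, 0.5, 0.25))
    # Queries hit the same data sources, so keep the native file cache if possible.
    print(plan_shrink(300, rdd_cache_bytes=200, native_cache_bytes=500,
                      prefer_spill_native=False))
```

In a real implementation the byte counts would presumably come from `AsyncDataCache::usedBytes()` and the release would go through `AsyncDataCache::shrink()`; the sketch only shows the arithmetic and the ordering decision.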
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

