yma11 opened a new issue, #5884: URL: https://github.com/apache/incubator-gluten/issues/5884
### Description

The Velox backend provides a two-level file cache (`AsyncDataCache` and `SsdCache`), which we enabled in this [PR](https://github.com/apache/incubator-gluten/pull/1076/files) using a dedicated `MMapAllocator` initialized with a configured capacity. This memory is counted as neither execution memory nor storage memory, and it is not managed by Spark's `UnifiedMemoryManager`. In this ticket, we would like to close this gap with the following design:

- Add a `NativeStorageMemory` segment to vanilla `StorageMemory`. A new configuration, `spark.memory.native.storageFraction`, defines its size. We then use the size `offheap.memory * spark.memory.storageFraction * spark.memory.native.storageFraction` to initialize `AsyncDataCache`.
- Add a configuration, `spark.memory.storage.preferSpillNative`, that determines whether to spill the RDD cache or the native file cache first when storage memory must be shrunk. For example, when queries mostly run against the same data sources, we prefer to keep the native file cache.
- Update the vanilla storage memory pool size everywhere it is used by collecting stats from `NativeStorageMemory`.
- Implement/update `AsyncDataCache::usedBytes()`/`AsyncDataCache::shrink()`/`AsyncDataCache::findOrCreate`.
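To make the proposed sizing rule concrete, here is a minimal sketch of the arithmetic for the `AsyncDataCache` capacity. The function name `native_storage_bytes` is hypothetical, `offheap.memory` is assumed to mean the executor's configured off-heap size in bytes, and the example value of 0.5 for `spark.memory.native.storageFraction` is illustrative only, since the issue does not propose a default.

```python
def native_storage_bytes(
    offheap_bytes: int,
    storage_fraction: float,
    native_storage_fraction: float,
) -> int:
    """Return offheap.memory * storageFraction * native.storageFraction.

    Hypothetical helper: it mirrors the formula in the issue text and is
    not part of Gluten or Spark.
    """
    if offheap_bytes < 0:
        raise ValueError("offheap_bytes must be non-negative")
    for name, value in (
        ("storage_fraction", storage_fraction),
        ("native_storage_fraction", native_storage_fraction),
    ):
        # Both settings are fractions, so they must lie in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return int(offheap_bytes * storage_fraction * native_storage_fraction)


GIB = 1024 ** 3

# Illustrative numbers: 20 GiB off-heap, storageFraction = 0.5,
# and an assumed native.storageFraction of 0.5.
capacity = native_storage_bytes(20 * GIB, 0.5, 0.5)
print(capacity // GIB)  # cache capacity in GiB
```

With these illustrative inputs, a quarter of the off-heap memory would be reserved for the native file cache.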

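The spill preference could be modeled as follows. This is only a sketch under stated assumptions: `ShrinkPlan` and `plan_storage_shrink` are hypothetical names, and `spark.memory.storage.preferSpillNative = true` is assumed to mean "evict the native file cache first", which the issue does not spell out. The actual eviction would go through `AsyncDataCache::shrink()` and Spark's storage pool, neither of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShrinkPlan:
    """Bytes to reclaim from each cache (hypothetical type)."""
    from_native_cache: int
    from_rdd_cache: int


def plan_storage_shrink(
    bytes_to_free: int,
    native_used: int,
    rdd_used: int,
    prefer_spill_native: bool,
) -> ShrinkPlan:
    """Split a shrink request between the native file cache and the RDD cache.

    The preferred cache is drained first; the other cache covers any
    remainder. The plan never exceeds what each cache currently holds.
    """
    if min(bytes_to_free, native_used, rdd_used) < 0:
        raise ValueError("sizes must be non-negative")

    if prefer_spill_native:
        from_native = min(bytes_to_free, native_used)
        from_rdd = min(bytes_to_free - from_native, rdd_used)
    else:
        from_rdd = min(bytes_to_free, rdd_used)
        from_native = min(bytes_to_free - from_rdd, native_used)
    return ShrinkPlan(from_native_cache=from_native, from_rdd_cache=from_rdd)


# Queries repeatedly hit the same data sources, so keep the file cache:
keep_file_cache = plan_storage_shrink(
    bytes_to_free=300, native_used=200, rdd_used=500, prefer_spill_native=False
)
print(keep_file_cache)
```

In this example the RDD cache alone can satisfy the request, so the native file cache is left untouched.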