yma11 commented on issue #5884: URL: https://github.com/apache/incubator-gluten/issues/5884#issuecomment-2191679510
Some updates:

- Our previous implementation added a listener to `MmapMemoryAllocator` that triggered `acquireStorageMemory`/`releaseStorageMemory` from `allocateNonContiguousWithoutRetry`/`freeNonContiguous`. It has a problem we cannot resolve: these two methods are not always called when memory usage in `AsyncDataCache` changes. For example, on a `shrink(bytes)`, the size actually released is not always equal to the bytes returned by `freeNonContiguous`, as in this log: `freed size by freeAllocations: 1753088 largeEvicted is: 4374528 tinyEvicted is:0`. Velox memory management uses several regions (e.g. small data, page data). So if we decrease the storage memory pool by the size returned from shrink, while updating `memoryUsed` through `freeNonContiguous`/`allocateNonContiguousWithoutRetry`, the two values will eventually diverge.
- Instead, we now add a listener to `AsyncDataCache` itself. When `shrink` happens, we release the corresponding amount of storage memory. When a new entry is created, we acquire memory of the corresponding size, but may give some of it back: if the cache shrinks while the new entry is being created, cache usage does not actually grow by the full amount. Another advantage is that Velox needs no special changes; only the `findOrCreate()` method has to become virtual.

We have created a [PR](https://github.com/apache/spark/pull/47067) for upstream Spark. For Velox, there is only a one-line change:

![image](https://github.com/apache/incubator-gluten/assets/11849056/5671f423-49ff-4b24-8017-45ecdb34a326)

The latest Gluten code change is tracked by this [PR](https://github.com/apache/incubator-gluten/pull/6239).

@zhli1142015, would you like to try this implementation on your workload?
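The accounting scheme in the second point can be sketched roughly as follows. This is a minimal, hypothetical model written in Python for brevity; the actual code lives in Gluten and Spark, and none of the names below (`StoragePoolSketch`, `ToyCache`, `CacheListenerSketch`, `create_entry`, `evicted_during_create`) are Velox, Spark, or Gluten APIs. It assumes the listener can read the cache's usage before and after an entry is created. What it illustrates is that every acquire/release is derived from the cache's own usage rather than from allocator calls, so the storage pool stays equal to the cache size even when a shrink happens during entry creation.

```python
class StoragePoolSketch:
    """Hypothetical stand-in for Spark's storage memory accounting.

    Only counts bytes; the real acquireStorageMemory/releaseStorageMemory
    live in Spark's MemoryManager on the JVM side.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0

    def acquire(self, nbytes: int) -> bool:
        if self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        return True

    def release(self, nbytes: int) -> None:
        if nbytes > self.used:
            raise ValueError("releasing more than was acquired")
        self.used -= nbytes


class ToyCache:
    """Hypothetical cache that only tracks the total bytes it holds."""

    def __init__(self) -> None:
        self.usage = 0

    def create_entry(self, nbytes: int, evicted_during_create: int = 0) -> None:
        # Creating an entry may first evict older data (a shrink during creation).
        self.usage -= min(evicted_during_create, self.usage)
        self.usage += nbytes

    def shrink(self, nbytes: int) -> int:
        freed = min(nbytes, self.usage)
        self.usage -= freed
        return freed


class CacheListenerSketch:
    """Hypothetical listener driven by cache-level events, not allocator calls."""

    def __init__(self, cache: ToyCache, pool: StoragePoolSketch) -> None:
        self.cache = cache
        self.pool = pool

    def create_entry(self, nbytes: int, evicted_during_create: int = 0) -> bool:
        # Reserve the full entry size up front; refuse if the pool is full.
        if not self.pool.acquire(nbytes):
            return False
        before = self.cache.usage
        self.cache.create_entry(nbytes, evicted_during_create)
        growth = self.cache.usage - before
        # If the cache shrank during creation it grew by less than nbytes,
        # so hand the unused part of the reservation back to the pool.
        if growth < nbytes:
            self.pool.release(nbytes - growth)
        return True

    def shrink(self, nbytes: int) -> int:
        # Release exactly what the cache reports as freed.
        freed = self.cache.shrink(nbytes)
        self.pool.release(freed)
        return freed


if __name__ == "__main__":
    cache = ToyCache()
    pool = StoragePoolSketch(capacity=1 << 30)
    listener = CacheListenerSketch(cache, pool)
    listener.create_entry(4096)
    listener.create_entry(8192, evicted_during_create=1024)  # shrink during creation
    listener.shrink(2048)
    print(cache.usage, pool.used)  # 9216 9216
```

In this sketch the pool reserves the full entry size before the cache grows, so a request can be refused up front; the price is a temporary over-reservation that is handed back when eviction happens during creation. Whether the real listener makes the same trade-off is an assumption here.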
