yma11 commented on issue #5884:
URL: 
https://github.com/apache/incubator-gluten/issues/5884#issuecomment-2191679510

   Some updates:
   
   - The previous implementation added a listener to `MmapMemoryAllocator` 
and triggered `acquireStorageMemory`/`releaseStorageMemory` from 
`allocateNonContiguousWithoutRetry`/`freeNonContiguous`. This has a problem 
that can't be resolved: these two methods are not always called when memory 
usage changes in AsyncDataCache. For example, on a `shrink(bytes)`, the 
released size is not always equal to the bytes returned by `freeNonContiguous`, 
as in this log: `freed size by freeAllocations: 1753088 largeEvicted is: 4374528 
tinyEvicted is:0`. Velox memory management has several regions, such as 
small data and page data. So if we use the shrunk size returned by `shrink` to 
decrease the storage memory pool while using 
`freeNonContiguous`/`allocateNonContiguousWithoutRetry` to change the 
`memoryUsed` value, the two will eventually mismatch.
   - Instead, we now add a listener to `AsyncDataCache` itself. When a 
`shrink` happens, we release the corresponding amount of storage memory. When a 
new entry is created, we acquire memory of the corresponding size, but may 
return some of it, because the cache may not actually grow by that many bytes 
if a shrink happens during entry creation. This approach also has the advantage 
that no special change is needed on the Velox side: only the `findOrCreate()` 
method has to become virtual.
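
The listener idea in the second bullet can be sketched roughly as below. This is only a minimal, single-threaded illustration, not the actual Velox or Gluten code: `StoragePool`, `SimpleCache` and `ListenableCache` are hypothetical names, the pool stands in for Spark's storage memory pool, and the real `AsyncDataCache::findOrCreate()` has a different signature. It assumes the cache reports its own size, so that the wrapper can return any bytes that were acquired but not actually added.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the storage memory pool (not the Spark API).
class StoragePool {
 public:
  explicit StoragePool(int64_t capacity) : capacity_(capacity) {}
  bool acquire(int64_t bytes) {
    if (used_ + bytes > capacity_) return false;
    used_ += bytes;
    return true;
  }
  void release(int64_t bytes) {
    assert(bytes <= used_);
    used_ -= bytes;
  }
  int64_t used() const { return used_; }

 private:
  int64_t capacity_;
  int64_t used_{0};
};

// Simplified stand-in for the cache: entries are just key -> size.
class SimpleCache {
 public:
  virtual ~SimpleCache() = default;
  // Virtual, so a subclass can wrap entry creation with accounting.
  virtual bool findOrCreate(const std::string& key, int64_t size) {
    if (entries_.count(key)) return true;  // cache hit, nothing added
    entries_[key] = size;
    cachedBytes_ += size;
    return true;
  }
  // Evicts entries until at least targetBytes are freed or the cache is empty.
  virtual int64_t shrink(int64_t targetBytes) {
    int64_t freed = 0;
    for (auto it = entries_.begin();
         it != entries_.end() && freed < targetBytes;) {
      freed += it->second;
      it = entries_.erase(it);
    }
    cachedBytes_ -= freed;
    return freed;
  }
  int64_t cachedBytes() const { return cachedBytes_; }

 private:
  std::unordered_map<std::string, int64_t> entries_;
  int64_t cachedBytes_{0};
};

// Keeps the pool in step with the cache by hooking the cache itself.
class ListenableCache : public SimpleCache {
 public:
  explicit ListenableCache(StoragePool& pool) : pool_(pool) {}

  bool findOrCreate(const std::string& key, int64_t size) override {
    if (!pool_.acquire(size)) return false;  // storage pool is full
    const int64_t before = cachedBytes();
    SimpleCache::findOrCreate(key, size);
    const int64_t grown = cachedBytes() - before;
    // Give back what was acquired but not added. In this sketch that only
    // happens on a cache hit (grown == 0).
    if (grown < size) pool_.release(size - grown);
    return true;
  }

  int64_t shrink(int64_t targetBytes) override {
    const int64_t freed = SimpleCache::shrink(targetBytes);
    pool_.release(freed);  // release exactly what the cache reports as freed
    return freed;
  }

 private:
  StoragePool& pool_;
};
```

Because both the acquire and the release are derived from the cache's own size, the pool and the cache cannot drift apart the way they can when one side is driven by allocator callbacks and the other by the value returned from `shrink`.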
   
   We have created a [PR](https://github.com/apache/spark/pull/47067) for 
upstream Spark. 
   For Velox, there is only a one-line change:
   <img width="490" alt="image" 
src="https://github.com/apache/incubator-gluten/assets/11849056/5671f423-49ff-4b24-8017-45ecdb34a326">
   The latest code change for Gluten is tracked by this 
[PR](https://github.com/apache/incubator-gluten/pull/6239).
   @zhli1142015 would you like to try this implementation in your workload?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

