yma11 commented on issue #5884: URL: https://github.com/apache/incubator-gluten/issues/5884#issuecomment-2153715084
@zhli1142015 @FelixYBW @zhouyuan @zhztheplayer The code changes are available in the following PRs: [Spark](https://github.com/yma11/spark/pull/4/files), [Gluten](https://github.com/yma11/gluten/pull/2), [Velox](https://github.com/yma11/velox/pull/1); please take a look. As a next step I will test it end to end (E2E) and add some docs. Some notes on the code changes:

1) New files in `shims/common`: existing memory allocator listeners such as `ManagedAllocationListener` live under the `gluten-data` package, and the native JNIs live under `backends-velox`. Because I need to call these classes/APIs from the injects, I placed the new files in `shims/common`.
2) Late initialization of the file cache: we use `GlutenMemStoreInjects` to read the cache configuration and then perform initialization after the Velox backend has been initialized, which ensures the native libraries are loaded.
3) Cache size setting: we need to pass a cache size to `setAsyncDataCache`; using the default `int64_t` max causes a `std::bad_alloc`. The size is sensitive because Velox's data cache uses this value to control memory allocation. If it is too small, allocations fail on the native side even when Spark reports no problem on the Java side. Since we leverage the Spark memory manager to control the memory logic, we resolve this conflict by giving the AsyncDataCache a large fake size, perhaps equal to the off-heap size.
4) The SSD cache does not work well in my tests: file cache entries easily exceed `8M`, which triggers a check failure. [Issue](https://github.com/facebookincubator/velox/issues/10098) has been filed to track this.
