supermem613 opened a new issue, #5438: URL: https://github.com/apache/incubator-gluten/issues/5438
### Description

When using Gluten with Velox and Spark, we currently specify the off-heap memory size and adjust the on-heap memory size accordingly. In practice, this means that memory set aside for on-heap use cannot be used off-heap, and vice versa. This can leave the machine's memory underused, because a workload typically does most of its processing either on-heap or off-heap and rarely uses large amounts of both at the same time. This is particularly painful, for example, when execution falls back to "vanilla" Spark.

For example, on a 64GB machine where we want to give 56GB of memory to Spark, we would set on-heap memory (via the `spark.executor.memory` setting) to, say, 14GB and off-heap memory (via `spark.memory.offHeap.size`) to 42GB. In this case, if execution falls back to Spark, it is constrained to the 14GB of on-heap memory. If it does not fall back, it uses up to 42GB off-heap, while much of the on-heap allocation sits idle even though it could be put to use.

We propose to leverage Gluten's existing off-heap allocation tracking, paired with JDK APIs that report on-heap utilization (`Runtime.getRuntime().totalMemory()` and `freeMemory()`), to provide unified control over managed memory utilization. Note, however, that this approach does not actively control Java allocations, so in practice it can allow some memory oversubscription until a native allocation comes along and is failed accordingly.

From a configuration perspective, there will be a new Gluten Boolean configuration to turn on this feature, which in turn makes any off-heap configuration unnecessary: the settings for enabling and sizing off-heap memory will no longer be used. Instead, we will continue to configure the executor memory (the on-heap sizing) to use as much memory as possible, as is done today with "vanilla" Spark.
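The proposed unified accounting can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Gluten's implementation: the class `UnifiedMemoryBudget` and its methods are hypothetical, and using the JVM's `Runtime.maxMemory()` (i.e. the executor heap size) as the single shared limit is an assumption. On-heap usage is estimated as `totalMemory() - freeMemory()`, and a native (off-heap) reservation is admitted only if that usage plus all outstanding off-heap reservations stays within the limit. Java heap allocations themselves are never gated, which mirrors the oversubscription caveat described above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a unified on-heap/off-heap budget (not Gluten's API).
 * Only off-heap (native) reservations are checked against the shared limit;
 * Java heap growth is not controlled, so the total can overshoot until the
 * next native reservation is refused.
 */
public final class UnifiedMemoryBudget {
    private final long limitBytes;
    private final LongSupplier onHeapUsedBytes;
    private final AtomicLong offHeapReservedBytes = new AtomicLong();

    public UnifiedMemoryBudget(long limitBytes, LongSupplier onHeapUsedBytes) {
        if (limitBytes <= 0) throw new IllegalArgumentException("limitBytes must be positive");
        this.limitBytes = limitBytes;
        this.onHeapUsedBytes = onHeapUsedBytes;
    }

    /** Assumption: the shared limit is the JVM max heap; on-heap usage comes from the Runtime APIs. */
    public static UnifiedMemoryBudget fromRuntime() {
        Runtime rt = Runtime.getRuntime();
        return new UnifiedMemoryBudget(rt.maxMemory(), () -> rt.totalMemory() - rt.freeMemory());
    }

    /** Returns false (the native allocation should fail) if the reservation would exceed the limit. */
    public boolean tryReserveOffHeap(long bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must be non-negative");
        long onHeap = onHeapUsedBytes.getAsLong(); // snapshot; the heap may keep growing afterwards
        while (true) {
            long reserved = offHeapReservedBytes.get();
            // Written as a subtraction to avoid overflow when summing large values.
            if (bytes > limitBytes - onHeap - reserved) {
                return false;
            }
            if (offHeapReservedBytes.compareAndSet(reserved, reserved + bytes)) {
                return true;
            }
        }
    }

    /** Returns previously reserved native memory to the shared budget. */
    public void releaseOffHeap(long bytes) {
        offHeapReservedBytes.addAndGet(-bytes);
    }

    public long offHeapReserved() {
        return offHeapReservedBytes.get();
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Simulated heap usage instead of the real JVM, so the output is deterministic.
        AtomicLong heapUsed = new AtomicLong(10 * gb);
        UnifiedMemoryBudget budget = new UnifiedMemoryBudget(56 * gb, heapUsed::get);

        System.out.println(budget.tryReserveOffHeap(42 * gb)); // true: 10 + 42 <= 56
        System.out.println(budget.tryReserveOffHeap(5 * gb));  // false: 10 + 42 + 5 > 56
        heapUsed.set(2 * gb);                                  // heap usage drops, e.g. after a GC
        System.out.println(budget.tryReserveOffHeap(5 * gb));  // true: 2 + 42 + 5 <= 56
    }
}
```

With a single 56GB budget, the same pool serves a mostly-native workload (large off-heap reservations) or a fallback workload (large heap usage), instead of splitting it 14GB/42GB up front. The compare-and-set loop keeps concurrent reservations from jointly exceeding the limit, but because the heap reading is only a snapshot, heap growth after the check can still cause the temporary oversubscription the proposal acknowledges.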
