This is an automated email from the ASF dual-hosted git repository.
yuanzhou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new 6eb0c41ea3 [GLUTEN-10915][VL] Fix dynamic offheap sizing feature by
setting default offheap size (#10916)
6eb0c41ea3 is described below
commit 6eb0c41ea37e797341a254e6e0c8ad3ffeb1a37f
Author: Yuan <[email protected]>
AuthorDate: Thu Oct 23 08:17:35 2025 +0100
[GLUTEN-10915][VL] Fix dynamic offheap sizing feature by setting default
offheap size (#10916)
This patch sets the default off-heap size (0) for the dynamic off-heap sizing
feature, because Gluten looks for these settings internally.
---------
Signed-off-by: Yuan <[email protected]>
---
.../execution/DynamicOffHeapSizingSuite.scala | 36 ++++++++++++++++++++--
docs/developers/VeloxDynamicSizingOffheap.md | 11 +++++--
.../scala/org/apache/gluten/GlutenPlugin.scala | 14 +++++++++
3 files changed, 57 insertions(+), 4 deletions(-)
diff --git a/backends-velox/src/test/scala/org/apache/gluten/execution/DynamicOffHeapSizingSuite.scala b/backends-velox/src/test/scala/org/apache/gluten/execution/DynamicOffHeapSizingSuite.scala
index aa71c867b2..0afbc2fa19 100644
--- a/backends-velox/src/test/scala/org/apache/gluten/execution/DynamicOffHeapSizingSuite.scala
+++ b/backends-velox/src/test/scala/org/apache/gluten/execution/DynamicOffHeapSizingSuite.scala
@@ -35,12 +35,11 @@ class DynamicOffHeapSizingSuite extends VeloxWholeStageTransformerSuite {
.set("spark.shuffle.manager",
"org.apache.spark.shuffle.sort.ColumnarShuffleManager")
.set("spark.executor.memory", "2GB")
.set("spark.memory.offHeap.enabled", "false")
- .set("spark.memory.offHeap.size", "0")
.set(GlutenCoreConfig.DYNAMIC_OFFHEAP_SIZING_MEMORY_FRACTION.key, "0.95")
.set(GlutenCoreConfig.DYNAMIC_OFFHEAP_SIZING_ENABLED.key, "true")
}
- test("Dynamic off-heap sizing") {
+ test("Dynamic off-heap sizing without setting offheap") {
if (DynamicOffHeapSizingMemoryTarget.isJava9OrLater()) {
val query =
"""
@@ -70,4 +69,37 @@ class DynamicOffHeapSizingSuite extends VeloxWholeStageTransformerSuite {
runAndCompare(query)
}
}
+
+ test("Dynamic off-heap sizing with setting offheap") {
+ withSQLConf(GlutenCoreConfig.SPARK_OFFHEAP_SIZE_KEY -> "1GB") {
+ if (DynamicOffHeapSizingMemoryTarget.isJava9OrLater()) {
+ val query =
+ """
+ | select l_quantity, c_acctbal, o_orderdate, p_type, n_name, s_suppkey
+ | from customer, orders, lineitem, part, supplier, nation
+ | where c_custkey = o_custkey and o_orderkey = l_orderkey and l_partkey = p_partkey
+ | and l_suppkey = s_suppkey and s_nationkey = n_nationkey
+ | order by c_acctbal desc, o_orderdate, s_suppkey, n_name, p_type, l_quantity
+ | limit 1
+ """.stripMargin
+ var totalMemory = Runtime.getRuntime().totalMemory()
+ var freeMemory = Runtime.getRuntime().freeMemory()
+ // Ensure that the JVM memory is not too small to trigger dynamic off-heap sizing.
+ while (!DynamicOffHeapSizingMemoryTarget.canShrinkJVMMemory(totalMemory, freeMemory)) {
+ withSQLConf(("spark.gluten.enabled", "false")) {
+ spark.sql(query).collect()
+ }
+ totalMemory = Runtime.getRuntime().totalMemory()
+ freeMemory = Runtime.getRuntime().freeMemory()
+ }
+ val newTotalMemory =
+ DynamicOffHeapSizingMemoryTarget.shrinkOnHeapMemory(totalMemory, freeMemory, false)
+ assert(DynamicOffHeapSizingMemoryTarget.getTotalExplicitGCCount() > 0)
+ // Verify that the total memory is reduced after shrink.
+ assert(newTotalMemory < totalMemory)
+ // Verify that the query can run with dynamic off-heap sizing enabled.
+ runAndCompare(query)
+ }
+ }
+ }
}
diff --git a/docs/developers/VeloxDynamicSizingOffheap.md b/docs/developers/VeloxDynamicSizingOffheap.md
index e6ff3b399d..5f195ead45 100644
--- a/docs/developers/VeloxDynamicSizingOffheap.md
+++ b/docs/developers/VeloxDynamicSizingOffheap.md
@@ -8,13 +8,19 @@ parent: Developer Overview
## Dynamic Off-heap Sizing
Gluten requires setting both on-heap and off-heap memory sizes, which initializes different memory layouts. Improper configuration of these settings can lead to lower performance.
-To fix this issue, dynamic off-heap sizing is an experimental feature designed to simplify this process. When enabled, off-heap settings are ignored, and Velox uses the on-heap size as the memory size.
+To fix this issue, dynamic off-heap sizing is an experimental feature designed to simplify this process. Please note when enabled, user defined spark off-heap settings(`spark.memory.offHeap.enabled`, `spark.memory.offHeap.size`) will not be effective, and Velox uses the on-heap size as the memory size.
+To enable this feature, users need to add below entry in Spark conf:
+```
+--conf spark.gluten.memory.dynamic.offHeap.sizing.enabled=true
+```
+
## Detail implementations
To understand the details, it's essential to learn the basics of JVM memory management. There are many resources discussing JVM internals:
- https://exia.dev/blog/2019-12-10/JVM-Memory-Model/
- https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/geninfo/diagnos/garbage_collect.html
- https://www.scaler.com/topics/memory-management-in-java/
- https://developers.redhat.com/articles/2021/09/09/how-jvm-uses-and-allocates-memory#
+- https://docs.oracle.com/en/java/javase/11/gctuning/factors-affecting-garbage-collection-performance.html
In general, the feature works as follows:
@@ -23,10 +29,11 @@ In general, the feature works as follows:
- If there is sufficient memory, allocations proceed normally.
- If memory is insufficient, Spark performs garbage collection (GC) to free on-heap memory, allowing Velox to allocate memory.
- If memory remains insufficient after GC, Spark reports an out-of-memory (OOM) issue.
+- The `MaxHeapFreeRatio` and `MinHeapFreeRatio` parameters are used to configure the max/min heap size for Spark JVM process. Note these two parameters are avaiable starts from JDK-11.
We then enforce a total memory quota, calculated as the sum of committed and in-use memory in the Java heap (using `Runtime.getRuntime().totalMemory()`) plus tracked off-heap memory in `TreeMemoryConsumer`. If an allocation exceeds this total committed memory, the allocation fails and triggers an OOM.
-With this change, the "quota check" is performed when Gluten receives an memory allocation request. In practice, this means the Java codebase can oversubscribe memory within the on-heap quota, even if off-heap usage is sufficient to fail the allocation.
+With this change, the "quota check" is performed when Gluten receives a memory allocation request. In practice, this means the Java codebase can oversubscribe memory within the on-heap quota, even if off-heap usage is sufficient to fail the allocation.
## Limitations
diff --git a/gluten-core/src/main/scala/org/apache/gluten/GlutenPlugin.scala b/gluten-core/src/main/scala/org/apache/gluten/GlutenPlugin.scala
index 4af5880afb..e508570eeb 100644
--- a/gluten-core/src/main/scala/org/apache/gluten/GlutenPlugin.scala
+++ b/gluten-core/src/main/scala/org/apache/gluten/GlutenPlugin.scala
@@ -118,6 +118,20 @@ private object GlutenDriverPlugin extends Logging {
// 1GB default
1024 * 1024 * 1024
}
+
+ if (conf.contains(GlutenCoreConfig.SPARK_OFFHEAP_ENABLED_KEY)) {
+ logWarning(
+ s"Dynamic off-heap sizing is enabled. Ignoring user-defined " +
+ s"'${GlutenCoreConfig.SPARK_OFFHEAP_SIZE_KEY}' setting.")
+ }
+ if (conf.contains(GlutenCoreConfig.SPARK_OFFHEAP_SIZE_KEY)) {
+ logWarning(
+ s"Dynamic off-heap sizing is enabled. Ignoring user-defined " +
+ s"'${GlutenCoreConfig.SPARK_OFFHEAP_SIZE_KEY}' setting.")
+ }
+ conf.set(GlutenCoreConfig.SPARK_OFFHEAP_SIZE_KEY, "0")
+ conf.set(GlutenCoreConfig.SPARK_OFFHEAP_ENABLED_KEY, "false")
+
((onHeapSize - (300 * 1024 * 1024)) *
conf.getDouble(GlutenCoreConfig.DYNAMIC_OFFHEAP_SIZING_MEMORY_FRACTION.key, 0.6d)).toLong
} else {
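
The GlutenPlugin.scala hunk above can be read as three steps: warn about any user-supplied off-heap settings, force them to `0` / `false`, and derive a memory budget from the on-heap size. The following is a minimal, self-contained sketch of that logic, not Gluten's code. It uses a plain `Map` in place of `SparkConf`; the object name, the `resolve` helper and `FractionKey` are hypothetical, and each warning names the key that was actually found. The 300 MB reservation and the 0.6 default fraction are taken from the hunk.

```scala
// Illustrative sketch only; names below are assumptions, not Gluten APIs.
object DynamicOffHeapSizingSketch {
  // Spark keys as quoted in the docs hunk of this commit.
  val OffHeapEnabledKey = "spark.memory.offHeap.enabled"
  val OffHeapSizeKey = "spark.memory.offHeap.size"
  // Hypothetical stand-in for GlutenCoreConfig.DYNAMIC_OFFHEAP_SIZING_MEMORY_FRACTION.key.
  val FractionKey = "example.dynamic.offHeap.sizing.memory.fraction"

  // 300 MB subtracted from the on-heap size, as in the hunk.
  private val ReservedBytes = 300L * 1024 * 1024

  /** Returns the adjusted configuration and the derived memory budget in bytes. */
  def resolve(conf: Map[String, String], onHeapSize: Long): (Map[String, String], Long) = {
    // Warn once per user-defined off-heap setting that is about to be overridden.
    Seq(OffHeapEnabledKey, OffHeapSizeKey).filter(conf.contains).foreach { key =>
      println(s"Dynamic off-heap sizing is enabled. Ignoring user-defined '$key' setting.")
    }
    // Force the defaults that the feature expects to find.
    val updated = conf + (OffHeapSizeKey -> "0") + (OffHeapEnabledKey -> "false")
    // Budget = (on-heap size - reservation) * fraction, with 0.6 as the default fraction.
    val fraction = conf.get(FractionKey).map(_.toDouble).getOrElse(0.6d)
    val budget = ((onHeapSize - ReservedBytes) * fraction).toLong
    (updated, budget)
  }
}
```

For example, `DynamicOffHeapSizingSketch.resolve(Map("spark.memory.offHeap.size" -> "1GB"), 2L * 1024 * 1024 * 1024)` prints one warning and returns a configuration in which `spark.memory.offHeap.size` is `"0"`, which mirrors the case exercised by the new "with setting offheap" test.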