This is an automated email from the ASF dual-hosted git repository.
philo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new 26c51e5919 [VL][DOC] Move irrelevant content from VeloxGlutenUI.md (#8786)
26c51e5919 is described below
commit 26c51e5919b0e9117ad6138e7384e79d15d0160f
Author: PHILO-HE <[email protected]>
AuthorDate: Thu Feb 20 15:38:33 2025 +0800
[VL][DOC] Move irrelevant content from VeloxGlutenUI.md (#8786)
---
docs/Configuration.md | 6 +-
docs/get-started/Velox.md | 31 ++++++++-
docs/get-started/VeloxGlutenUI.md | 78 ----------------------
.../org/apache/gluten/config/GlutenConfig.scala | 3 +-
4 files changed, 34 insertions(+), 84 deletions(-)
diff --git a/docs/Configuration.md b/docs/Configuration.md
index d577f9de24..86ee06bfb8 100644
--- a/docs/Configuration.md
+++ b/docs/Configuration.md
@@ -21,7 +21,7 @@ You can add these configurations into spark-defaults.conf to enable or disable t
| spark.sql.join.preferSortMergeJoin | When true, prefer sort merge join over shuffled hash join. <br /> Note: Please turn off preferSortMergeJoin. [...]
| spark.plugins | To load Gluten's components by Spark's plug-in loader [...]
| spark.shuffle.manager | To turn on Gluten Columnar Shuffle Plugin [...]
-| spark.gluten.enabled | Enable Gluten at runtime, default is true. It fallbacks to vanilla Spark for all query plans if set to false. Recommend to enable/disable Gluten through the setting for `spark.plugins`. [...]
+| spark.gluten.enabled | Enable Gluten at runtime, default is true. Fall back to vanilla Spark for all query plans if set to false. It is recommended to enable or disable Gluten through the `spark.plugins` setting. [...]
| spark.gluten.memory.isolation | (Experimental) Enable isolated memory mode. If true, Gluten controls the maximum off-heap memory can be used by each task to X, X = executor memory / max task slots. It's recommended to set true if Gluten serves concurrent queries within a single session, since not all memory Gluten allocated is guaranteed to be spillable. In the case, the feature should be enabled to avoid OOM. Note when true, setting spark.memory.storageF [...]
| spark.gluten.ras.enabled | Enables RAS (relation algebra selector) during physical planning to generate more efficient query plan. Note, this feature doesn't bring performance profits by default. Try exploring option `spark.gluten.ras.costModel` for advanced usage. [...]
| spark.gluten.sql.columnar.maxBatchSize | Number of rows to be processed in each batch. Default value is 4096. [...]
@@ -72,7 +72,7 @@ You can add these configurations into spark-defaults.conf to enable or disable t
| spark.gluten.sql.cartesianProductTransformerEnabled | Config to enable CartesianProductExecTransformer. [...]
| spark.gluten.sql.broadcastNestedLoopJoinTransformerEnabled | Config to enable BroadcastNestedLoopJoinExecTransformer. [...]
| spark.gluten.sql.cacheWholeStageTransformerContext | When true, `WholeStageTransformer` will cache the `WholeStageTransformerContext` when executing. It is used to get substrait plan node and native plan string. [...]
-| spark.gluten.sql.injectNativePlanStringToExplain | When true, Gluten will inject native plan tree to explain string inside `WholeStageTransformerContext`. [...]
+| spark.gluten.sql.injectNativePlanStringToExplain | When true, Gluten will inject the native plan tree into Spark's explain output. [...]
| spark.gluten.sql.fallbackRegexpExpressions | When true, Gluten will fall back all regexp expressions to avoid any incompatibility risk. [...]
## Velox Parameters
@@ -89,7 +89,7 @@ The following configurations are related to Velox settings.
| spark.gluten.sql.columnar.backend.velox.filePreloadThreshold | Set the file preload threshold for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.prefetchRowGroups | Set the prefetch row groups for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.loadQuantum | Set the load quantum for velox file scan. | |
-| spark.gluten.sql.columnar.backend.velox.maxCoalescedDistance | Set the max coalesced distance for velox file scan. | |
+| spark.gluten.sql.columnar.backend.velox.maxCoalescedDistance | Set the max coalesced distance for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.maxCoalescedBytes | Set the max coalesced bytes for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.cachePrefetchMinPct | Set prefetch cache min pct for velox file scan. | |
| spark.gluten.velox.awsSdkLogLevel | Log granularity of AWS C++ SDK in velox. | FATAL |
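The table rows above are truncated by the archiver, so as a quick reference, a minimal spark-defaults.conf sketch combining the core options this patch touches might look like the following. The plugin and shuffle-manager class names follow Gluten's getting-started docs and should be treated as assumptions to verify against your Gluten version; the values shown are illustrative defaults, not tuning advice.

```
# Load Gluten via Spark's plugin loader (recommended on/off switch)
spark.plugins                                     org.apache.gluten.GlutenPlugin
# Enable Gluten's columnar shuffle
spark.shuffle.manager                             org.apache.spark.shuffle.sort.ColumnarShuffleManager
# Runtime toggle; false falls back to vanilla Spark for all query plans
spark.gluten.enabled                              true
# Rows per columnar batch (default 4096)
spark.gluten.sql.columnar.maxBatchSize            4096
# Show the native plan tree in Spark's formatted explain output
spark.gluten.sql.injectNativePlanStringToExplain  true
```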
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index 1989922a93..e649709278 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -502,8 +502,9 @@ Both Parquet and ORC datasets are sf1024.
Please refer [Gluten UI](VeloxGlutenUI.md)
-# Gluten Implicits
+# Gluten Native Plan Summary
+## Gluten Implicits
Gluten provides a helper class to get the fallback summary from a Spark Dataset.
```
@@ -516,6 +517,34 @@ Note that, if AQE is enabled, but the query is not materialized, then it will re
the query execution with disabled AQE. It is a workaround to get the final plan, and it may
cause the inconsistent results with a materialized query. However, we have no choice.
+## Native Plan in Spark's Explain Output
+
+Gluten supports injecting the native plan string into Spark's formatted explain output by setting `--conf spark.gluten.sql.injectNativePlanStringToExplain=true`.
+Here is an example of how Gluten shows the native plan string.
+
+```
+(9) WholeStageCodegenTransformer (2)
+Input [6]: [c1#0L, c2#1L, c3#2L, c1#3L, c2#4L, c3#5L]
+Arguments: false
+Native Plan:
+-- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
+ -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT, n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+ -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull, deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
+ -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+```
+
+## Native Plan with Stats
+
+Gluten supports printing the native plan with stats to the executor's system output stream by setting `--conf spark.gluten.sql.debug=true`.
+Note that the plan string with stats is printed per task, which may significantly increase the executor log size. Here is an example of how Gluten shows the native plan string with stats.
+
+```
+I20231121 10:19:42.348845 90094332 WholeStageResultIterator.cc:220] Native Plan with stats for: [Stage: 1 TID: 16]
+-- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
+ Output: 27 rows (3.56KB, 3 batches), Cpu time: 10.58us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
+ queuedWallNanos sum: 2.00us, count: 1, min: 2.00us, max: 2.00us
+```
+
# Accelerators
Please refer [HBM](VeloxHBM.md) [QAT](VeloxQAT.md) [IAA](VeloxIAA.md) for details
diff --git a/docs/get-started/VeloxGlutenUI.md b/docs/get-started/VeloxGlutenUI.md
index 6f40e25b9e..53453af352 100644
--- a/docs/get-started/VeloxGlutenUI.md
+++ b/docs/get-started/VeloxGlutenUI.md
@@ -36,81 +36,3 @@ If you want to disable Gluten UI, add a config when submitting `--conf spark.glu
## History server
Gluten UI also supports Spark history server. Add gluten-ui jar into the history server classpath, e.g., $SPARK_HOME/jars, then restart history server.
-
-## Native plan string
-
-Gluten supports inject native plan string into Spark explain with formatted mode by setting `--conf spark.gluten.sql.injectNativePlanStringToExplain=true`.
-Here is an example, how Gluten show the native plan string.
-
-```
-(9) WholeStageCodegenTransformer (2)
-Input [6]: [c1#0L, c2#1L, c3#2L, c1#3L, c2#4L, c3#5L]
-Arguments: false
-Native Plan:
--- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
- -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT, n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull, deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
- -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
-```
-
-## Native plan with stats
-
-Gluten supports print native plan with stats to executor system output stream by setting `--conf spark.gluten.sql.debug=true`.
-Note that, the plan string with stats is task level which may cause executor log size big. Here is an example, how Gluten show the native plan string with stats.
-
-```
-I20231121 10:19:42.348845 90094332 WholeStageResultIterator.cc:220] Native Plan with stats for: [Stage: 1 TID: 16]
--- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
- Output: 27 rows (3.56KB, 3 batches), Cpu time: 10.58us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
- queuedWallNanos sum: 2.00us, count: 1, min: 2.00us, max: 2.00us
- runningAddInputWallNanos sum: 626ns, count: 1, min: 626ns, max: 626ns
- runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningGetOutputWallNanos sum: 5.54us, count: 1, min: 5.54us, max: 5.54us
- -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT, n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- Output: 27 rows (3.56KB, 3 batches), Cpu time: 223.00us, Blocked wall time: 0ns, Peak memory: 93.12KB, Memory allocations: 15
- HashBuild: Input: 10 rows (960B, 10 batches), Output: 0 rows (0B, 0 batches), Cpu time: 185.67us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, Threads: 1
- distinctKey0 sum: 4, count: 1, min: 4, max: 4
- hashtable.capacity sum: 4, count: 1, min: 4, max: 4
- hashtable.numDistinct sum: 10, count: 1, min: 10, max: 10
- hashtable.numRehashes sum: 1, count: 1, min: 1, max: 1
- queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- rangeKey0 sum: 4, count: 1, min: 4, max: 4
- runningAddInputWallNanos sum: 1.27ms, count: 1, min: 1.27ms, max: 1.27ms
- runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningGetOutputWallNanos sum: 1.29us, count: 1, min: 1.29us, max: 1.29us
- H23/11/21 10:19:42 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 13) in 335 ms on 10.221.97.35 (executor driver) (1/10)
-ashProbe: Input: 9 rows (864B, 3 batches), Output: 27 rows (3.56KB, 3 batches), Cpu time: 37.33us, Blocked wall time: 0ns, Peak memory: 25.12KB, Memory allocations: 13, Threads: 1
- dynamicFiltersProduced sum: 1, count: 1, min: 1, max: 1
- queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningAddInputWallNanos sum: 4.54us, count: 1, min: 4.54us, max: 4.54us
- runningFinishWallNanos sum: 83ns, count: 1, min: 83ns, max: 83ns
- runningGetOutputWallNanos sum: 29.08us, count: 1, min: 29.08us, max: 29.08us
- -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull, deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
- Input: 9 rows (864B, 3 batches), Output: 9 rows (864B, 3 batches), Cpu time: 630.75us, Blocked wall time: 0ns, Peak memory: 2.44KB, Memory allocations: 63, Threads: 1, Splits: 3
- dataSourceWallNanos sum: 102.00us, count: 1, min: 102.00us, max: 102.00us
- dynamicFiltersAccepted sum: 1, count: 1, min: 1, max: 1
- flattenStringDictionaryValues sum: 0, count: 1, min: 0, max: 0
- ioWaitNanos sum: 312.00us, count: 1, min: 312.00us, max: 312.00us
- localReadBytes sum: 0B, count: 1, min: 0B, max: 0B
- numLocalRead sum: 0, count: 1, min: 0, max: 0
- numPrefetch sum: 0, count: 1, min: 0, max: 0
- numRamRead sum: 0, count: 1, min: 0, max: 0
- numStorageRead sum: 6, count: 1, min: 6, max: 6
- overreadBytes sum: 0B, count: 1, min: 0B, max: 0B
- prefetchBytes sum: 0B, count: 1, min: 0B, max: 0B
- queryThreadIoLatency sum: 12, count: 1, min: 12, max: 12
- ramReadBytes sum: 0B, count: 1, min: 0B, max: 0B
- runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningFinishWallNanos sum: 125ns, count: 1, min: 125ns, max: 125ns
- runningGetOutputWallNanos sum: 1.07ms, count: 1, min: 1.07ms, max: 1.07ms
- skippedSplitBytes sum: 0B, count: 1, min: 0B, max: 0B
- skippedSplits sum: 0, count: 1, min: 0, max: 0
- skippedStrides sum: 0, count: 1, min: 0, max: 0
- storageReadBytes sum: 3.44KB, count: 1, min: 3.44KB, max: 3.44KB
- totalScanTime sum: 0ns, count: 1, min: 0ns, max: 0ns
- -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- Input: 0 rows (0B, 0 batches), Output: 10 rows (960B, 10 batches), Cpu time: 1.03ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
- runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningFinishWallNanos sum: 54.62us, count: 1, min: 54.62us, max: 54.62us
- runningGetOutputWallNanos sum: 1.10ms, count: 1, min: 1.10ms, max: 1.10ms
-```
diff --git a/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala b/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
index 3b86959631..31d397d7ba 100644
--- a/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
+++ b/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
@@ -1563,8 +1563,7 @@ object GlutenConfig {
val INJECT_NATIVE_PLAN_STRING_TO_EXPLAIN =
buildConf("spark.gluten.sql.injectNativePlanStringToExplain")
.internal()
- .doc("When true, Gluten will inject native plan tree to explain string inside " +
- "`WholeStageTransformerContext`.")
+ .doc("When true, Gluten will inject the native plan tree into Spark's explain output.")
.booleanConf
.createWithDefault(false)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]