This is an automated email from the ASF dual-hosted git repository.
philo pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new 26c51e5919 [VL][DOC] Move irrelevant content from VeloxGlutenUI.md (#8786)
26c51e5919 is described below
commit 26c51e5919b0e9117ad6138e7384e79d15d0160f
Author: PHILO-HE <[email protected]>
AuthorDate: Thu Feb 20 15:38:33 2025 +0800
[VL][DOC] Move irrelevant content from VeloxGlutenUI.md (#8786)
---
docs/Configuration.md | 6 +-
docs/get-started/Velox.md | 31 ++++++++-
docs/get-started/VeloxGlutenUI.md | 78 ----------------------
.../org/apache/gluten/config/GlutenConfig.scala | 3 +-
4 files changed, 34 insertions(+), 84 deletions(-)
diff --git a/docs/Configuration.md b/docs/Configuration.md
index d577f9de24..86ee06bfb8 100644
--- a/docs/Configuration.md
+++ b/docs/Configuration.md
@@ -21,7 +21,7 @@ You can add these configurations into spark-defaults.conf to enable or disable t
| spark.sql.join.preferSortMergeJoin | When true, prefer sort merge join over shuffled hash join. <br /> Note: Please turn off preferSortMergeJoin. [...]
| spark.plugins | To load Gluten's components by Spark's plug-in loader [...]
| spark.shuffle.manager | To turn on Gluten Columnar Shuffle Plugin [...]
-| spark.gluten.enabled | Enable Gluten at runtime, default is true. It fallbacks to vanilla Spark for all query plans if set to false. Recommend to enable/disable Gluten through the setting for `spark.plugins`. [...]
+| spark.gluten.enabled | Enable Gluten at runtime, default is true. Fall back to vanilla Spark for all query plans if set to false. It is recommended to enable or disable Gluten through the `spark.plugins` setting. [...]
| spark.gluten.memory.isolation | (Experimental) Enable isolated memory mode. If true, Gluten controls the maximum off-heap memory can be used by each task to X, X = executor memory / max task slots. It's recommended to set true if Gluten serves concurrent queries within a single session, since not all memory Gluten allocated is guaranteed to be spillable. In the case, the feature should be enabled to avoid OOM. Note when true, setting spark.memory.storageF [...]
| spark.gluten.ras.enabled | Enables RAS (relation algebra selector) during physical planning to generate more efficient query plan. Note, this feature doesn't bring performance profits by default. Try exploring option `spark.gluten.ras.costModel` for advanced usage. [...]
| spark.gluten.sql.columnar.maxBatchSize | Number of rows to be processed in each batch. Default value is 4096. [...]
@@ -72,7 +72,7 @@ You can add these configurations into spark-defaults.conf to enable or disable t
| spark.gluten.sql.cartesianProductTransformerEnabled | Config to enable CartesianProductExecTransformer. [...]
| spark.gluten.sql.broadcastNestedLoopJoinTransformerEnabled | Config to enable BroadcastNestedLoopJoinExecTransformer. [...]
| spark.gluten.sql.cacheWholeStageTransformerContext | When true, `WholeStageTransformer` will cache the `WholeStageTransformerContext` when executing. It is used to get substrait plan node and native plan string. [...]
-| spark.gluten.sql.injectNativePlanStringToExplain | When true, Gluten will inject native plan tree to explain string inside `WholeStageTransformerContext`. [...]
+| spark.gluten.sql.injectNativePlanStringToExplain | When true, Gluten will inject the native plan tree into Spark's explain output. [...]
| spark.gluten.sql.fallbackRegexpExpressions | When true, Gluten will fall back all regexp expressions to avoid any incompatibility risk. [...]
## Velox Parameters
@@ -89,7 +89,7 @@ The following configurations are related to Velox settings.
| spark.gluten.sql.columnar.backend.velox.filePreloadThreshold | Set the file preload threshold for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.prefetchRowGroups | Set the prefetch row groups for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.loadQuantum | Set the load quantum for velox file scan. | |
-| spark.gluten.sql.columnar.backend.velox.maxCoalescedDistance | Set the max coalesced distance for velox file scan. | |
+| spark.gluten.sql.columnar.backend.velox.maxCoalescedDistance | Set the max coalesced distance for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.maxCoalescedBytes | Set the max coalesced bytes for velox file scan. | |
| spark.gluten.sql.columnar.backend.velox.cachePrefetchMinPct | Set prefetch cache min pct for velox file scan. | |
| spark.gluten.velox.awsSdkLogLevel | Log granularity of AWS C++ SDK in velox. | FATAL |
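The table rows above are truncated by the archiver, so as a quick reference, a minimal spark-defaults.conf sketch combining the core options this patch touches might look like the following. The plugin and shuffle-manager class names follow Gluten's getting-started docs and should be treated as assumptions to verify against your Gluten version; the values shown are illustrative defaults, not tuning advice.

```
# Load Gluten via Spark's plugin loader (recommended on/off switch)
spark.plugins                                     org.apache.gluten.GlutenPlugin
# Enable Gluten's columnar shuffle
spark.shuffle.manager                             org.apache.spark.shuffle.sort.ColumnarShuffleManager
# Runtime toggle; false falls back to vanilla Spark for all query plans
spark.gluten.enabled                              true
# Rows per columnar batch (default 4096)
spark.gluten.sql.columnar.maxBatchSize            4096
# Show the native plan tree in Spark's formatted explain output
spark.gluten.sql.injectNativePlanStringToExplain  true
```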
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index 1989922a93..e649709278 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -502,8 +502,9 @@ Both Parquet and ORC datasets are sf1024.
Please refer [Gluten UI](VeloxGlutenUI.md)
-# Gluten Implicits
+# Gluten Native Plan Summary
+## Gluten Implicits
Gluten provides a helper class to get the fallback summary from a Spark Dataset.
```
@@ -516,6 +517,34 @@ Note that, if AQE is enabled, but the query is not materialized, then it will re
the query execution with disabled AQE. It is a workaround to get the final plan, and it may
cause the inconsistent results with a materialized query. However, we have no choice.
+## Native Plan in Spark's Explain Output
+
+Gluten supports injecting the native plan string into Spark's formatted explain output by setting `--conf spark.gluten.sql.injectNativePlanStringToExplain=true`.
+Here is an example of how Gluten shows the native plan string.
+
+```
+(9) WholeStageCodegenTransformer (2)
+Input [6]: [c1#0L, c2#1L, c3#2L, c1#3L, c2#4L, c3#5L]
+Arguments: false
+Native Plan:
+-- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
+ -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT, n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+ -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull, deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
+ -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
+```
+
+## Native Plan with Stats
+
+Gluten supports printing the native plan with stats to the executor's system output stream by setting `--conf spark.gluten.sql.debug=true`.
+Note that the plan string with stats is printed per task, which may significantly increase the executor log size. Here is an example of how Gluten shows the native plan string with stats.
+
+```
+I20231121 10:19:42.348845 90094332 WholeStageResultIterator.cc:220] Native Plan with stats for: [Stage: 1 TID: 16]
+-- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
+ Output: 27 rows (3.56KB, 3 batches), Cpu time: 10.58us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
+ queuedWallNanos sum: 2.00us, count: 1, min: 2.00us, max: 2.00us
+```
+
# Accelerators
Please refer [HBM](VeloxHBM.md) [QAT](VeloxQAT.md) [IAA](VeloxIAA.md) for details
diff --git a/docs/get-started/VeloxGlutenUI.md b/docs/get-started/VeloxGlutenUI.md
index 6f40e25b9e..53453af352 100644
--- a/docs/get-started/VeloxGlutenUI.md
+++ b/docs/get-started/VeloxGlutenUI.md
@@ -36,81 +36,3 @@ If you want to disable Gluten UI, add a config when submitting `--conf spark.glu
## History server
Gluten UI also supports Spark history server. Add gluten-ui jar into the history server classpath, e.g., $SPARK_HOME/jars, then restart history server.
-
-## Native plan string
-
-Gluten supports inject native plan string into Spark explain with formatted mode by setting `--conf spark.gluten.sql.injectNativePlanStringToExplain=true`.
-Here is an example, how Gluten show the native plan string.
-
-```
-(9) WholeStageCodegenTransformer (2)
-Input [6]: [c1#0L, c2#1L, c3#2L, c1#3L, c2#4L, c3#5L]
-Arguments: false
-Native Plan:
--- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
- -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT, n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull, deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
- -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
-```
-
-## Native plan with stats
-
-Gluten supports print native plan with stats to executor system output stream by setting `--conf spark.gluten.sql.debug=true`.
-Note that, the plan string with stats is task level which may cause executor log size big. Here is an example, how Gluten show the native plan string with stats.
-
-```
-I20231121 10:19:42.348845 90094332 WholeStageResultIterator.cc:220] Native Plan with stats for: [Stage: 1 TID: 16]
--- Project[expressions: (n3_6:BIGINT, "n0_0"), (n3_7:BIGINT, "n0_1"), (n3_8:BIGINT, "n0_2"), (n3_9:BIGINT, "n1_0"), (n3_10:BIGINT, "n1_1"), (n3_11:BIGINT, "n1_2")] -> n3_6:BIGINT, n3_7:BIGINT, n3_8:BIGINT, n3_9:BIGINT, n3_10:BIGINT, n3_11:BIGINT
- Output: 27 rows (3.56KB, 3 batches), Cpu time: 10.58us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
- queuedWallNanos sum: 2.00us, count: 1, min: 2.00us, max: 2.00us
- runningAddInputWallNanos sum: 626ns, count: 1, min: 626ns, max: 626ns
- runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningGetOutputWallNanos sum: 5.54us, count: 1, min: 5.54us, max: 5.54us
- -- HashJoin[INNER n1_1=n0_1] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT, n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- Output: 27 rows (3.56KB, 3 batches), Cpu time: 223.00us, Blocked wall time: 0ns, Peak memory: 93.12KB, Memory allocations: 15
- HashBuild: Input: 10 rows (960B, 10 batches), Output: 0 rows (0B, 0 batches), Cpu time: 185.67us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, Threads: 1
- distinctKey0 sum: 4, count: 1, min: 4, max: 4
- hashtable.capacity sum: 4, count: 1, min: 4, max: 4
- hashtable.numDistinct sum: 10, count: 1, min: 10, max: 10
- hashtable.numRehashes sum: 1, count: 1, min: 1, max: 1
- queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- rangeKey0 sum: 4, count: 1, min: 4, max: 4
- runningAddInputWallNanos sum: 1.27ms, count: 1, min: 1.27ms, max: 1.27ms
- runningFinishWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningGetOutputWallNanos sum: 1.29us, count: 1, min: 1.29us, max: 1.29us
- H23/11/21 10:19:42 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 13) in 335 ms on 10.221.97.35 (executor driver) (1/10)
-ashProbe: Input: 9 rows (864B, 3 batches), Output: 27 rows (3.56KB, 3 batches), Cpu time: 37.33us, Blocked wall time: 0ns, Peak memory: 25.12KB, Memory allocations: 13, Threads: 1
- dynamicFiltersProduced sum: 1, count: 1, min: 1, max: 1
- queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningAddInputWallNanos sum: 4.54us, count: 1, min: 4.54us, max: 4.54us
- runningFinishWallNanos sum: 83ns, count: 1, min: 83ns, max: 83ns
- runningGetOutputWallNanos sum: 29.08us, count: 1, min: 29.08us, max: 29.08us
- -- TableScan[table: hive_table, range filters: [(c2, Filter(IsNotNull, deterministic, null not allowed))]] -> n1_0:BIGINT, n1_1:BIGINT, n1_2:BIGINT
- Input: 9 rows (864B, 3 batches), Output: 9 rows (864B, 3 batches), Cpu time: 630.75us, Blocked wall time: 0ns, Peak memory: 2.44KB, Memory allocations: 63, Threads: 1, Splits: 3
- dataSourceWallNanos sum: 102.00us, count: 1, min: 102.00us, max: 102.00us
- dynamicFiltersAccepted sum: 1, count: 1, min: 1, max: 1
- flattenStringDictionaryValues sum: 0, count: 1, min: 0, max: 0
- ioWaitNanos sum: 312.00us, count: 1, min: 312.00us, max: 312.00us
- localReadBytes sum: 0B, count: 1, min: 0B, max: 0B
- numLocalRead sum: 0, count: 1, min: 0, max: 0
- numPrefetch sum: 0, count: 1, min: 0, max: 0
- numRamRead sum: 0, count: 1, min: 0, max: 0
- numStorageRead sum: 6, count: 1, min: 6, max: 6
- overreadBytes sum: 0B, count: 1, min: 0B, max: 0B
- prefetchBytes sum: 0B, count: 1, min: 0B, max: 0B
- queryThreadIoLatency sum: 12, count: 1, min: 12, max: 12
- ramReadBytes sum: 0B, count: 1, min: 0B, max: 0B
- runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningFinishWallNanos sum: 125ns, count: 1, min: 125ns, max: 125ns
- runningGetOutputWallNanos sum: 1.07ms, count: 1, min: 1.07ms, max: 1.07ms
- skippedSplitBytes sum: 0B, count: 1, min: 0B, max: 0B
- skippedSplits sum: 0, count: 1, min: 0, max: 0
- skippedStrides sum: 0, count: 1, min: 0, max: 0
- storageReadBytes sum: 3.44KB, count: 1, min: 3.44KB, max: 3.44KB
- totalScanTime sum: 0ns, count: 1, min: 0ns, max: 0ns
- -- ValueStream[] -> n0_0:BIGINT, n0_1:BIGINT, n0_2:BIGINT
- Input: 0 rows (0B, 0 batches), Output: 10 rows (960B, 10 batches), Cpu time: 1.03ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
- runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
- runningFinishWallNanos sum: 54.62us, count: 1, min: 54.62us, max: 54.62us
- runningGetOutputWallNanos sum: 1.10ms, count: 1, min: 1.10ms, max: 1.10ms
-```
diff --git a/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala b/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
index 3b86959631..31d397d7ba 100644
--- a/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
+++ b/shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
@@ -1563,8 +1563,7 @@ object GlutenConfig {
val INJECT_NATIVE_PLAN_STRING_TO_EXPLAIN =
buildConf("spark.gluten.sql.injectNativePlanStringToExplain")
.internal()
- .doc("When true, Gluten will inject native plan tree to explain string inside " +
- "`WholeStageTransformerContext`.")
+ .doc("When true, Gluten will inject the native plan tree into Spark's explain output.")
.booleanConf
.createWithDefault(false)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]