This is an automated email from the ASF dual-hosted git repository.
chengchengjin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new f5f84ff848 [VL] Fix parquet write document (#11536)
f5f84ff848 is described below
commit f5f84ff848d247c23eb39e76a6550cbb753797ed
Author: BInwei Yang <[email protected]>
AuthorDate: Wed Feb 4 19:50:49 2026 -0800
[VL] Fix parquet write document (#11536)
---
docs/velox-backend-limitations.md | 16 +++-------------
docs/velox-parquet-write-configuration.md | 31 +++++++++++++++++++++----------
2 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/docs/velox-backend-limitations.md
b/docs/velox-backend-limitations.md
index a2b76456d5..6d7ef44575 100644
--- a/docs/velox-backend-limitations.md
+++ b/docs/velox-backend-limitations.md
@@ -123,25 +123,15 @@ spark.range(100).toDF("id")
Gluten supports writes of HiveFileFormat only when the output file type is
`parquet`
#### NaN support
+
Velox does NOT support NaN, so unexpected results can be obtained in a few
cases, e.g., comparing a number with NaN.
#### Configuration
-Parquet write only support three configs, other will not take effect.
-
-- compression code:
- - sql conf: `spark.sql.parquet.compression.codec`
- - option: `compression.codec`
-- block size
- - sql conf: `spark.gluten.sql.columnar.parquet.write.blockSize`
- - option: `parquet.block.size`
-- block rows
- - sql conf: `spark.gluten.sql.native.parquet.write.blockRows`
- - option: `parquet.block.rows`
-
-
+Not all parquet configurations are honored by Gluten. Check
docs/velox-parquet-write-configuration.md for details.
### Fatal error caused by Spark's columnar reading
+
If the user enables Spark's columnar reading, errors can occur because Spark's
columnar vector is not compatible with Gluten's.
diff --git a/docs/velox-parquet-write-configuration.md
b/docs/velox-parquet-write-configuration.md
index cb82c5adc7..69a5d21c19 100644
--- a/docs/velox-parquet-write-configuration.md
+++ b/docs/velox-parquet-write-configuration.md
@@ -3,6 +3,17 @@ title: Parquet write configuration
nav_order: 17
## Parquet write configurations in Spark/Velox/Gluten
+
+Gluten accepts two kinds of configuration: Parquet options and Spark configs.
The two settings below have the same effect; one applies to the whole Spark
session, the other to a single query.
+
+```
+// Session-level (the value "100000" is illustrative):
+spark.conf.set("spark.gluten.sql.native.parquet.write.blockRows", "100000")
+
+// Per-query (the output path is illustrative):
+df.write.option("parquet.block.rows", "100000").parquet("/tmp/output")
+```
+
+
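For context, the equivalent pair of settings can be sketched end-to-end in PySpark. This is a minimal sketch assuming a Gluten-enabled Spark session; the row count `100000` and the output path are illustrative, not defaults:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-wide: applies to every native parquet write in this session.
spark.conf.set("spark.gluten.sql.native.parquet.write.blockRows", "100000")

df = spark.range(1000).toDF("id")

# Per-query: the writer option applies to this write only.
df.write.option("parquet.block.rows", "100000") \
    .mode("overwrite").parquet("/tmp/gluten_out")
```

Either form limits the number of rows per parquet row group; the writer option is handy when a single job needs a value different from the session default.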
<table class="spark-config">
<thead>
<tr>
@@ -10,7 +21,7 @@ nav_order: 17
<th>parquet-mr default</th>
<th>Spark default</th>
<th>Velox Default</th>
- <th>Gluten Support</th>
+ <th>Gluten Config</th>
</tr>
</thead>
<tbody>
@@ -31,11 +42,11 @@ nav_order: 17
</tr>
<tr>
<td><code>write_batch_size</code></td>
- <td></td><td></td><td>1024</td><td>Y (batch size)</td>
+
<td></td><td></td><td>1024</td><td>spark.gluten.sql.columnar.maxBatchSize</td>
</tr>
<tr>
<td><code>rowgroup_length</code></td>
- <td></td><td></td><td>1M</td><td></td>
+
<td></td><td></td><td>1M</td><td>parquet.block.rows<br>spark.gluten.sql.native.parquet.write.blockRows</td>
</tr>
<tr>
<td><code>compression_level</code></td>
@@ -66,23 +77,23 @@ nav_order: 17
</tr>
<tr>
<td><code>parquet.block.size</code></td>
- <td>128m</td><td></td><td></td><td>Y</td>
+
<td>128m</td><td></td><td></td><td>parquet.block.size<br>spark.gluten.sql.columnar.parquet.write.blockSize</td>
</tr>
<tr>
<td><code>parquet.page.size</code></td>
- <td>1m</td><td></td><td>1M</td><td>Y</td>
+ <td>1m</td><td></td><td>1M</td><td>parquet.page.size</td>
</tr>
<tr>
<td><code>parquet.compression</code></td>
-
<td>uncompressed</td><td>snappy</td><td>uncompressed</td><td>Y</td>
+
<td>uncompressed</td><td>snappy</td><td>uncompressed</td><td>parquet.compression<br>spark.sql.parquet.compression.codec</td>
</tr>
<tr>
<td><code>parquet.write.support.class</code></td>
-
<td>org.apache.parquet.hadoop.api.WriteSupport</td><td></td><td></td><td></td>
+
<td>org.apache.parquet<br>.hadoop.api.WriteSupport</td><td></td><td></td><td></td>
</tr>
<tr>
<td><code>parquet.enable.dictionary</code></td>
- <td>true</td><td></td><td>true</td><td>Y</td>
+
<td>true</td><td></td><td>true</td><td>parquet.enable.dictionary</td>
</tr>
<tr>
<td><code>parquet.dictionary.page.size</code></td>
@@ -94,7 +105,7 @@ nav_order: 17
</tr>
<tr>
<td><code>parquet.writer.version</code></td>
- <td>PARQUET_1_0</td><td></td><td>PARQUET_2_6</td><td>Y</td>
+
<td>PARQUET_1_0</td><td></td><td>PARQUET_2_6</td><td>parquet.writer.version</td>
</tr>
<tr>
<td><code>parquet.memory.pool.ratio</code></td>
@@ -178,7 +189,7 @@ nav_order: 17
</tr>
<tr>
<td><code>parquet.compression.codec.zstd.level</code></td>
- <td>3</td><td></td><td>0</td><td>Y</td>
+
<td>3</td><td></td><td>0</td><td>parquet.compression.codec.zstd.level</td>
</tr>
<tr>
<td><code>parquet.compression.codec.zstd.workers</code></td>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]