This is an automated email from the ASF dual-hosted git repository.
chengchengjin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new f5f84ff848 [VL] Fix parquet write document (#11536)
f5f84ff848 is described below
commit f5f84ff848d247c23eb39e76a6550cbb753797ed
Author: BInwei Yang <[email protected]>
AuthorDate: Wed Feb 4 19:50:49 2026 -0800
[VL] Fix parquet write document (#11536)
---
docs/velox-backend-limitations.md | 16 +++-------------
docs/velox-parquet-write-configuration.md | 31 +++++++++++++++++++++----------
2 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/docs/velox-backend-limitations.md
b/docs/velox-backend-limitations.md
index a2b76456d5..6d7ef44575 100644
--- a/docs/velox-backend-limitations.md
+++ b/docs/velox-backend-limitations.md
@@ -123,25 +123,15 @@ spark.range(100).toDF("id")
Gluten supports writes of HiveFileFormat only when the output file type is
`parquet`
#### NaN support
+
Velox does NOT support NaN, so unexpected results can be obtained in a few
cases, e.g., comparing a number with NaN.
#### Configuration
-Parquet write only support three configs, other will not take effect.
-
-- compression code:
- - sql conf: `spark.sql.parquet.compression.codec`
- - option: `compression.codec`
-- block size
- - sql conf: `spark.gluten.sql.columnar.parquet.write.blockSize`
- - option: `parquet.block.size`
-- block rows
- - sql conf: `spark.gluten.sql.native.parquet.write.blockRows`
- - option: `parquet.block.rows`
-
-
+Not all parquet configurations are honored by Gluten. Check
docs/velox-parquet-write-configuration.md for details.
### Fatal error caused by Spark's columnar reading
+
If the user enables Spark's columnar reading, errors can occur because Spark's
columnar vector is not compatible with Gluten's.
diff --git a/docs/velox-parquet-write-configuration.md
b/docs/velox-parquet-write-configuration.md
index cb82c5adc7..69a5d21c19 100644
--- a/docs/velox-parquet-write-configuration.md
+++ b/docs/velox-parquet-write-configuration.md
@@ -3,6 +3,17 @@ title: Parquet write configuration
nav_order: 17
## Parquet write configurations in Spark/Velox/Gluten
+
+Gluten accepts two kinds of configuration: Parquet options and Spark configs.
The two settings below have the same effect; one applies to the whole Spark
session, the other to a single query.
+
+```
+// Session-level (the value "100000" is illustrative):
+spark.conf.set("spark.gluten.sql.native.parquet.write.blockRows", "100000")
+
+// Per-query (the output path is illustrative):
+df.write.option("parquet.block.rows", "100000").parquet("/tmp/output")
+```
+
+
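For context, the equivalent pair of settings can be sketched end-to-end in PySpark. This is a minimal sketch assuming a Gluten-enabled Spark session; the row count `100000` and the output path are illustrative, not defaults:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-wide: applies to every native parquet write in this session.
spark.conf.set("spark.gluten.sql.native.parquet.write.blockRows", "100000")

df = spark.range(1000).toDF("id")

# Per-query: the writer option applies to this write only.
df.write.option("parquet.block.rows", "100000") \
    .mode("overwrite").parquet("/tmp/gluten_out")
```

Either form limits the number of rows per parquet row group; the writer option is handy when a single job needs a value different from the session default.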
<table class="spark-config">
<thead>
<tr>
@@ -10,7 +21,7 @@ nav_order: 17
<th>parquet-mr default</th>
<th>Spark default</th>
<th>Velox Default</th>
- <th>Gluten Support</th>
+ <th>Gluten Config</th>
</tr>
</thead>
<tbody>
@@ -31,11 +42,11 @@ nav_order: 17
</tr>
<tr>
<td><code>write_batch_size</code></td>
- <td></td><td></td><td>1024</td><td>Y (batch size)</td>
+
<td></td><td></td><td>1024</td><td>spark.gluten.sql.columnar.maxBatchSize</td>
</tr>
<tr>
<td><code>rowgroup_length</code></td>
- <td></td><td></td><td>1M</td><td></td>
+
<td></td><td></td><td>1M</td><td>parquet.block.rows<br>spark.gluten.sql.native.parquet.write.blockRows</td>
</tr>
<tr>
<td><code>compression_level</code></td>
@@ -66,23 +77,23 @@ nav_order: 17
</tr>
<tr>
<td><code>parquet.block.size</code></td>
- <td>128m</td><td></td><td></td><td>Y</td>
+
<td>128m</td><td></td><td></td><td>parquet.block.size<br>spark.gluten.sql.columnar.parquet.write.blockSize</td>
</tr>
<tr>
<td><code>parquet.page.size</code></td>
- <td>1m</td><td></td><td>1M</td><td>Y</td>
+ <td>1m</td><td></td><td>1M</td><td>parquet.page.size</td>
</tr>
<tr>
<td><code>parquet.compression</code></td>
-
<td>uncompressed</td><td>snappy</td><td>uncompressed</td><td>Y</td>
+
<td>uncompressed</td><td>snappy</td><td>uncompressed</td><td>parquet.compression<br>spark.sql.parquet.compression.codec</td>
</tr>
<tr>
<td><code>parquet.write.support.class</code></td>
-
<td>org.apache.parquet.hadoop.api.WriteSupport</td><td></td><td></td><td></td>
+
<td>org.apache.parquet<br>.hadoop.api.WriteSupport</td><td></td><td></td><td></td>
</tr>
<tr>
<td><code>parquet.enable.dictionary</code></td>
- <td>true</td><td></td><td>true</td><td>Y</td>
+
<td>true</td><td></td><td>true</td><td>parquet.enable.dictionary</td>
</tr>
<tr>
<td><code>parquet.dictionary.page.size</code></td>
@@ -94,7 +105,7 @@ nav_order: 17
</tr>
<tr>
<td><code>parquet.writer.version</code></td>
- <td>PARQUET_1_0</td><td></td><td>PARQUET_2_6</td><td>Y</td>
+
<td>PARQUET_1_0</td><td></td><td>PARQUET_2_6</td><td>parquet.writer.version</td>
</tr>
<tr>
<td><code>parquet.memory.pool.ratio</code></td>
@@ -178,7 +189,7 @@ nav_order: 17
</tr>
<tr>
<td><code>parquet.compression.codec.zstd.level</code></td>
- <td>3</td><td></td><td>0</td><td>Y</td>
+
<td>3</td><td></td><td>0</td><td>parquet.compression.codec.zstd.level</td>
</tr>
<tr>
<td><code>parquet.compression.codec.zstd.workers</code></td>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]