This is an automated email from the ASF dual-hosted git repository.
JingsongLi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 3ff34a34df [format] Pass through parquet statistics and
page-size-check config in RowDataParquetBuilder (#7956)
3ff34a34df is described below
commit 3ff34a34dfc244aaee7bce5a4fa76658a497d96b
Author: yugan <[email protected]>
AuthorDate: Mon May 25 19:05:53 2026 +0800
[format] Pass through parquet statistics and page-size-check config in
RowDataParquetBuilder (#7956)
Pass through parquet statistics and page-size-check configuration in
`RowDataParquetBuilder`.
Currently `RowDataParquetBuilder` does not forward the following Parquet
config keys to the writer:
- `parquet.statistics.truncate.length`
- `parquet.columnindex.truncate.length`
- `parquet.page.size.row.check.min`
- `parquet.page.size.row.check.max`
Without these, users cannot tune Parquet page-size checking behavior or
control the truncation length of statistics and column indexes. This is
especially relevant for tables with large records, where the default
page-size check thresholds can lead to oversized pages.
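
The pass-through pattern described above can be sketched without the Parquet
dependency. This is only an illustration, not the actual Paimon or Parquet
code: `PageCheckConfigSketch`, `WriterSettings`, `getInt` and `fromConf` are
hypothetical names, a plain `Map` stands in for the Hadoop configuration, and
the default values are placeholders assumed to resemble Parquet's defaults
(check `ParquetProperties` for the real ones). Only the four key strings come
from this commit message.

```java
import java.util.HashMap;
import java.util.Map;

public class PageCheckConfigSketch {

    // Config keys as listed in the commit message.
    static final String MIN_ROW_CHECK = "parquet.page.size.row.check.min";
    static final String MAX_ROW_CHECK = "parquet.page.size.row.check.max";
    static final String STATS_TRUNCATE = "parquet.statistics.truncate.length";
    static final String COLUMN_INDEX_TRUNCATE = "parquet.columnindex.truncate.length";

    // Placeholder defaults (assumption): the real values live in ParquetProperties.
    static final int DEFAULT_MIN_ROW_CHECK = 100;
    static final int DEFAULT_MAX_ROW_CHECK = 10000;
    static final int DEFAULT_STATS_TRUNCATE = Integer.MAX_VALUE;
    static final int DEFAULT_COLUMN_INDEX_TRUNCATE = 64;

    /** Hypothetical stand-in for the settings the writer builder receives. */
    static final class WriterSettings {
        final int minRowCountForPageSizeCheck;
        final int maxRowCountForPageSizeCheck;
        final int statisticsTruncateLength;
        final int columnIndexTruncateLength;

        WriterSettings(int minRows, int maxRows, int statsLen, int indexLen) {
            this.minRowCountForPageSizeCheck = minRows;
            this.maxRowCountForPageSizeCheck = maxRows;
            this.statisticsTruncateLength = statsLen;
            this.columnIndexTruncateLength = indexLen;
        }
    }

    /** Returns the configured integer, or the default when the key is absent. */
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Forwards each key to the settings, falling back to its default. */
    static WriterSettings fromConf(Map<String, String> conf) {
        return new WriterSettings(
                getInt(conf, MIN_ROW_CHECK, DEFAULT_MIN_ROW_CHECK),
                getInt(conf, MAX_ROW_CHECK, DEFAULT_MAX_ROW_CHECK),
                getInt(conf, STATS_TRUNCATE, DEFAULT_STATS_TRUNCATE),
                getInt(conf, COLUMN_INDEX_TRUNCATE, DEFAULT_COLUMN_INDEX_TRUNCATE));
    }

    public static void main(String[] args) {
        // A table with large records might check the page size after every row.
        Map<String, String> conf = new HashMap<>();
        conf.put(MIN_ROW_CHECK, "1");
        conf.put(MAX_ROW_CHECK, "10");

        WriterSettings settings = fromConf(conf);
        System.out.println("min rows before check: " + settings.minRowCountForPageSizeCheck);
        System.out.println("max rows before check: " + settings.maxRowCountForPageSizeCheck);
        System.out.println("column index truncate: " + settings.columnIndexTruncateLength);
    }
}
```

Keys that the user leaves unset keep their defaults, so in this sketch the
pass-through only changes behavior for tables that explicitly set one of the
four keys.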
Split out from #7621 per reviewer feedback, as this is an independent
enhancement.
---
.../format/parquet/writer/RowDataParquetBuilder.java | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java b/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java
index 1a66a55130..7d72c8a1a9 100644
--- a/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java
+++ b/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java
@@ -96,7 +96,23 @@ public class RowDataParquetBuilder implements ParquetBuilder<InternalRow> {
                 .withBloomFilterEnabled(
                         conf.getBoolean(
                                 ParquetOutputFormat.BLOOM_FILTER_ENABLED,
-                                ParquetProperties.DEFAULT_BLOOM_FILTER_ENABLED));
+                                ParquetProperties.DEFAULT_BLOOM_FILTER_ENABLED))
+                .withMinRowCountForPageSizeCheck(
+                        conf.getInt(
+                                ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK,
+                                ParquetProperties.DEFAULT_MINIMUM_RECORD_COUNT_FOR_CHECK))
+                .withMaxRowCountForPageSizeCheck(
+                        conf.getInt(
+                                ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK,
+                                ParquetProperties.DEFAULT_MAXIMUM_RECORD_COUNT_FOR_CHECK))
+                .withStatisticsTruncateLength(
+                        conf.getInt(
+                                ParquetOutputFormat.STATISTICS_TRUNCATE_LENGTH,
+                                ParquetProperties.DEFAULT_STATISTICS_TRUNCATE_LENGTH))
+                .withColumnIndexTruncateLength(
+                        conf.getInt(
+                                ParquetOutputFormat.COLUMN_INDEX_TRUNCATE_LENGTH,
+                                ParquetProperties.DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH));
         new ColumnConfigParser()
                 .withColumnConfig(
                         ParquetOutputFormat.ENABLE_DICTIONARY,