This is an automated email from the ASF dual-hosted git repository.

JingsongLi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git


The following commit(s) were added to refs/heads/master by this push:
     new 3ff34a34df [format] Pass through parquet statistics and page-size-check config in RowDataParquetBuilder (#7956)
3ff34a34df is described below

commit 3ff34a34dfc244aaee7bce5a4fa76658a497d96b
Author: yugan <[email protected]>
AuthorDate: Mon May 25 19:05:53 2026 +0800

    [format] Pass through parquet statistics and page-size-check config in RowDataParquetBuilder (#7956)
    
    Pass through parquet statistics and page-size-check configuration in
    `RowDataParquetBuilder`.
    
    Currently `RowDataParquetBuilder` does not forward the following Parquet
    config keys to the writer:
      - `parquet.statistics.truncate.length`
      - `parquet.columnindex.truncate.length`
      - `parquet.page.size.row.check.min`
      - `parquet.page.size.row.check.max`
    
    Without these, users cannot tune Parquet page-size checking behavior or
    control the truncation length of statistics and column indexes. This is
    especially relevant for tables with large records, where the default
    page-size check thresholds can lead to oversized pages.
    
    Split out from #7621 per reviewer feedback, as this is an independent
    enhancement.
---
 .../format/parquet/writer/RowDataParquetBuilder.java   | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
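
As an illustration of the pass-through pattern in this change, here is a minimal, stdlib-only sketch. It does not use the Parquet or Paimon APIs: `ParquetWriterConfigSketch`, `resolve`, and `bytesBeforeFirstCheck` are hypothetical names, a plain `Map` stands in for the Hadoop `Configuration`, and the fallback values are assumed to mirror the `ParquetProperties` defaults in parquet-java (verify them against the version you build with). The helper at the end gives a rough estimate of how much data can accumulate before the first page-size check, assuming the writer does not check the page size until the minimum row count has been written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParquetWriterConfigSketch {
    // Key names as listed in the commit message.
    static final String STATISTICS_TRUNCATE_LENGTH = "parquet.statistics.truncate.length";
    static final String COLUMN_INDEX_TRUNCATE_LENGTH = "parquet.columnindex.truncate.length";
    static final String MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK = "parquet.page.size.row.check.min";
    static final String MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK = "parquet.page.size.row.check.max";

    // ASSUMPTION: these fallbacks are believed to match parquet-java's
    // ParquetProperties defaults; they are not taken from the commit itself.
    static final int DEFAULT_STATISTICS_TRUNCATE_LENGTH = Integer.MAX_VALUE;
    static final int DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH = 64;
    static final int DEFAULT_MIN_ROW_COUNT_FOR_CHECK = 100;
    static final int DEFAULT_MAX_ROW_COUNT_FOR_CHECK = 10000;

    // Reads an int option, falling back to the default when the key is unset.
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Resolves the four options the same way the patched builder chain does:
    // user-supplied value if present, library default otherwise.
    static Map<String, Integer> resolve(Map<String, String> conf) {
        Map<String, Integer> resolved = new LinkedHashMap<>();
        resolved.put(
                MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK,
                getInt(conf, MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK, DEFAULT_MIN_ROW_COUNT_FOR_CHECK));
        resolved.put(
                MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK,
                getInt(conf, MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK, DEFAULT_MAX_ROW_COUNT_FOR_CHECK));
        resolved.put(
                STATISTICS_TRUNCATE_LENGTH,
                getInt(conf, STATISTICS_TRUNCATE_LENGTH, DEFAULT_STATISTICS_TRUNCATE_LENGTH));
        resolved.put(
                COLUMN_INDEX_TRUNCATE_LENGTH,
                getInt(conf, COLUMN_INDEX_TRUNCATE_LENGTH, DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH));
        return resolved;
    }

    // Rough upper bound on bytes buffered before the first page-size check,
    // ASSUMING the first check happens only after minRowCount rows.
    static long bytesBeforeFirstCheck(int minRowCount, long avgRecordBytes) {
        return (long) minRowCount * avgRecordBytes;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Hypothetical tuning for a table with roughly 1 MiB records.
        conf.put(MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK, "10");
        conf.put(MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK, "100");

        resolve(conf).forEach((key, value) -> System.out.println(key + " = " + value));

        long oneMiB = 1L << 20;
        System.out.println(
                "default min check: " + bytesBeforeFirstCheck(DEFAULT_MIN_ROW_COUNT_FOR_CHECK, oneMiB));
        System.out.println("tuned min check:   " + bytesBeforeFirstCheck(10, oneMiB));
    }
}
```

Under these assumptions, 1 MiB records with a minimum of 100 rows would let about 100 MiB accumulate before the first size check, which is the oversized-page case the commit message describes; lowering the minimum to 10 would bring that to about 10 MiB.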

diff --git a/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java b/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java
index 1a66a55130..7d72c8a1a9 100644
--- a/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java
+++ b/paimon-format/src/main/java/org/apache/paimon/format/parquet/writer/RowDataParquetBuilder.java
@@ -96,7 +96,23 @@ public class RowDataParquetBuilder implements ParquetBuilder<InternalRow> {
                         .withBloomFilterEnabled(
                                 conf.getBoolean(
                                         ParquetOutputFormat.BLOOM_FILTER_ENABLED,
-                                        ParquetProperties.DEFAULT_BLOOM_FILTER_ENABLED));
+                                        ParquetProperties.DEFAULT_BLOOM_FILTER_ENABLED))
+                        .withMinRowCountForPageSizeCheck(
+                                conf.getInt(
+                                        ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK,
+                                        ParquetProperties.DEFAULT_MINIMUM_RECORD_COUNT_FOR_CHECK))
+                        .withMaxRowCountForPageSizeCheck(
+                                conf.getInt(
+                                        ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK,
+                                        ParquetProperties.DEFAULT_MAXIMUM_RECORD_COUNT_FOR_CHECK))
+                        .withStatisticsTruncateLength(
+                                conf.getInt(
+                                        ParquetOutputFormat.STATISTICS_TRUNCATE_LENGTH,
+                                        ParquetProperties.DEFAULT_STATISTICS_TRUNCATE_LENGTH))
+                        .withColumnIndexTruncateLength(
+                                conf.getInt(
+                                        ParquetOutputFormat.COLUMN_INDEX_TRUNCATE_LENGTH,
+                                        ParquetProperties.DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH));
         new ColumnConfigParser()
                 .withColumnConfig(
                         ParquetOutputFormat.ENABLE_DICTIONARY,
