yugan95 opened a new pull request, #7956:
URL: https://github.com/apache/paimon/pull/7956

     ### Purpose
   
     Pass Parquet statistics and page-size check configuration through to the
     writer in `RowDataParquetBuilder`.
   
     Currently, `RowDataParquetBuilder` does not forward the following Parquet
     config keys to the writer:
     - `parquet.statistics.truncate.length`
     - `parquet.columnindex.truncate.length`
     - `parquet.page.size.row.check.min`
     - `parquet.page.size.row.check.max`
   
     Without these keys, users cannot tune Parquet's page-size check behavior or
     control the truncation length of statistics and column indexes. This matters
     most for tables with large records, where the default page-size check
     thresholds can let pages grow well beyond the configured page size.
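To make the large-record case concrete, here is a back-of-the-envelope model, not Parquet's actual algorithm. It assumes the writer does not check the buffered page size until `parquet.page.size.row.check.min` rows have been written, a 1 MiB target page size, and 100 KiB rows; it ignores encoding, compression, and the adaptive estimate Parquet uses to schedule later checks. `PageSizeCheckModel` and its methods are hypothetical names.

```java
/**
 * Hypothetical back-of-the-envelope model of how many bytes a page can buffer
 * before the writer first checks its size. Not Parquet's actual algorithm.
 */
final class PageSizeCheckModel {

    // Bytes buffered when the first size check runs, assuming the check happens
    // only after `minRowsBeforeCheck` rows and every row adds `avgRowBytes`.
    static long bufferedBytesAtFirstCheck(int minRowsBeforeCheck, long avgRowBytes) {
        return (long) minRowsBeforeCheck * avgRowBytes;
    }

    // Largest min-row-count value (at least 1) whose first check still lands
    // at or below the target page size under the same assumptions.
    static int maxSafeMinRowCount(long targetPageBytes, long avgRowBytes) {
        return (int) Math.max(1L, targetPageBytes / avgRowBytes);
    }

    public static void main(String[] args) {
        long targetPageBytes = 1024L * 1024L; // assumed 1 MiB target page size
        long avgRowBytes = 100L * 1024L;      // assumed 100 KiB average row

        // 100 rows before the first check: about 10 MB buffered, ~10x the target.
        System.out.println(bufferedBytesAtFirstCheck(100, avgRowBytes));     // 10240000
        // At most 10 rows before the first check keeps it near 1 MiB.
        System.out.println(maxSafeMinRowCount(targetPageBytes, avgRowBytes)); // 10
    }
}
```

Under these assumptions, a minimum of 100 rows (assumed to be Parquet's default) lets the first page buffer roughly ten times the target size, while lowering `parquet.page.size.row.check.min` to about 10 keeps the first check close to the target.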
   
     Split out from #7621 per reviewer feedback, since this is an independent
     enhancement.
   
     #### Changes
   
     - **`RowDataParquetBuilder`**: add `.withMinRowCountForPageSizeCheck()`,
       `.withMaxRowCountForPageSizeCheck()`, `.withStatisticsTruncateLength()`, and
       `.withColumnIndexTruncateLength()` to the builder chain, reading each value
       from the existing Hadoop `Configuration` and falling back to Parquet's
       default when the key is unset.
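A minimal sketch of the fallback logic described in the Changes list, assuming the default values below match Parquet's `ParquetProperties` constants for the bundled Parquet version (worth double-checking). A plain `Map` stands in for Hadoop's `Configuration` so the example runs without Hadoop on the classpath; `ParquetWriteTuning`, `fromConf`, and `getInt` are hypothetical names, not Paimon code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolves the four pass-through keys, falling back to
 * defaults when a key is absent. A Map stands in for Hadoop's Configuration.
 */
final class ParquetWriteTuning {

    // Assumed to equal Parquet's ParquetProperties defaults; verify per version.
    static final int DEFAULT_MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK = 100;
    static final int DEFAULT_MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK = 10_000;
    static final int DEFAULT_STATISTICS_TRUNCATE_LENGTH = Integer.MAX_VALUE;
    static final int DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH = 64;

    final int minRowCountForPageSizeCheck;
    final int maxRowCountForPageSizeCheck;
    final int statisticsTruncateLength;
    final int columnIndexTruncateLength;

    private ParquetWriteTuning(int minRows, int maxRows, int statsLength, int indexLength) {
        this.minRowCountForPageSizeCheck = minRows;
        this.maxRowCountForPageSizeCheck = maxRows;
        this.statisticsTruncateLength = statsLength;
        this.columnIndexTruncateLength = indexLength;
    }

    // Roughly like Configuration.getInt(key, defaultValue): unset key -> default.
    private static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    static ParquetWriteTuning fromConf(Map<String, String> conf) {
        return new ParquetWriteTuning(
                getInt(conf, "parquet.page.size.row.check.min", DEFAULT_MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK),
                getInt(conf, "parquet.page.size.row.check.max", DEFAULT_MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK),
                getInt(conf, "parquet.statistics.truncate.length", DEFAULT_STATISTICS_TRUNCATE_LENGTH),
                getInt(conf, "parquet.columnindex.truncate.length", DEFAULT_COLUMN_INDEX_TRUNCATE_LENGTH));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("parquet.page.size.row.check.min", "10"); // user override
        ParquetWriteTuning t = fromConf(conf);
        // In the builder chain these values would feed, e.g.,
        //   .withMinRowCountForPageSizeCheck(t.minRowCountForPageSizeCheck)
        System.out.println(t.minRowCountForPageSizeCheck); // 10 (overridden)
        System.out.println(t.maxRowCountForPageSizeCheck); // 10000 (default)
    }
}
```

In the actual builder, each resolved value would be passed to the corresponding `with...()` call listed above; keeping the lookup and the default side by side makes the fallback easy to audit.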
   
     ### Tests
   
     Existing Parquet write tests cover the default config path. The new keys follow
     the same pattern as other builder options (e.g., `withDictionaryPageSize`,
     `withPageSize`) and fall back to Parquet's built-in defaults when not set.
   
     ### API and Format
   
     N/A. There are no public API or format changes. Users can now set these keys
     via table properties, through the same mechanism as other Parquet config keys.
   
     ### Documentation
   
     N/A

