sqd opened a new pull request, #16317: URL: https://github.com/apache/iceberg/pull/16317
parquet-java (parquet-mr) supports a row-count limit per row group (config key parquet.block.row.count.limit), but Iceberg does not yet expose a way to set it. This commit wires it up.

Parquet.WriteBuilder has two terminal paths:

- When callers supply a createWriterFunc (Iceberg's ParquetValueWriter, the path every production engine integration uses, such as Spark rewrite_data_files), the build returns Iceberg's own ParquetWriter, which manages the row-group lifecycle itself and ignores parquet-mr's auto-roll.
- When callers supply a WriteSupport instead, the build delegates to parquet-mr's ParquetWriter, which enforces row-group limits internally.

The new property has to be wired into both paths. On the Iceberg path it is carried via ParquetProperties and consumed by an explicit recordCount check in ParquetWriter.checkSize(). On the parquet-mr path it is passed through ParquetWriteBuilder.withRowGroupRowCountLimit() and enforced by parquet-mr.
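For reviewers who want a concrete picture of the Iceberg-path behavior described above, here is a minimal, self-contained sketch of a writer that rolls a row group when either a byte target or a row-count limit is reached. It is not the code in this PR: the class RowGroupLimitSketch, its fields, and the per-record size argument are hypothetical stand-ins, and it checks the limit on every record for simplicity, which the real ParquetWriter.checkSize() may not do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a writer that manages its own row groups. */
public class RowGroupLimitSketch {
  // Assumed to be populated from parquet.block.row.count.limit.
  private final long rowCountLimit;
  // Assumed byte target for a row group.
  private final long targetSizeBytes;

  private long recordCount = 0;
  private long bufferedBytes = 0;
  // Row counts of the row groups flushed so far, kept only for inspection.
  private final List<Long> flushedRowCounts = new ArrayList<>();

  public RowGroupLimitSketch(long rowCountLimit, long targetSizeBytes) {
    this.rowCountLimit = rowCountLimit;
    this.targetSizeBytes = targetSizeBytes;
  }

  /** Buffers one record of the given (illustrative) size, then checks limits. */
  public void add(long recordSizeBytes) {
    recordCount++;
    bufferedBytes += recordSizeBytes;
    checkSize();
  }

  /** Rolls the row group when the row-count limit or the byte target is hit. */
  private void checkSize() {
    if (recordCount >= rowCountLimit || bufferedBytes >= targetSizeBytes) {
      flushRowGroup();
    }
  }

  private void flushRowGroup() {
    if (recordCount > 0) {
      flushedRowCounts.add(recordCount);
      recordCount = 0;
      bufferedBytes = 0;
    }
  }

  /** Flushes any remaining buffered records as a final row group. */
  public void close() {
    flushRowGroup();
  }

  public List<Long> flushedRowCounts() {
    return flushedRowCounts;
  }

  public static void main(String[] args) {
    // 2500 small records with a 1000-row limit and a byte target that is never reached.
    RowGroupLimitSketch writer = new RowGroupLimitSketch(1000, 128L * 1024 * 1024);
    for (int i = 0; i < 2500; i++) {
      writer.add(100);
    }
    writer.close();
    System.out.println(writer.flushedRowCounts()); // prints [1000, 1000, 500]
  }
}
```

With small records the byte target alone would never trigger a roll, so the row-count check is what bounds the row-group size in this sketch.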
