[
https://issues.apache.org/jira/browse/PARQUET-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459331#comment-16459331
]
ASF GitHub Bot commented on PARQUET-869:
----------------------------------------
rdblue opened a new pull request #470: PARQUET-869: Configurable record counts
for block size checks
URL: https://github.com/apache/parquet-mr/pull/470
This PR builds on #447 and updates the properties to use "row group" instead
of "block", because "block" is confusing. It also addresses the outstanding
review comments so that this can be merged.
Closes #447.
> Min/Max record counts for block size checks are not configurable
> ----------------------------------------------------------------
>
> Key: PARQUET-869
> URL: https://issues.apache.org/jira/browse/PARQUET-869
> Project: Parquet
> Issue Type: Improvement
> Reporter: Pradeep Gollakota
> Priority: Major
>
> While the min/max record counts for the page size check are configurable via
> the ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK and
> ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK configs and via
> ParquetProperties directly, the min/max record counts for the block size
> check are hard-coded inside InternalParquetRecordWriter.
> These two settings should also be configurable.
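
To illustrate what making these bounds configurable could look like, here is a
minimal, self-contained sketch. It is not the code from InternalParquetRecordWriter
or from the PR: the class name RowGroupSizeCheckSketch, the method nextCheckCount,
and the numeric values are all hypothetical. It assumes the writer estimates how
many records are needed to fill a row group from the average record size seen so
far, schedules the next size check about halfway there, and clamps that value
with the minimum and maximum record counts passed in as parameters instead of
constants.

```java
// Hypothetical sketch: schedules the next row group ("block") size check.
// None of these names come from parquet-mr; the bounds are passed in as
// parameters to show the effect of making them configurable.
final class RowGroupSizeCheckSketch {

    /**
     * Returns the record count at which the buffered size should be checked next.
     *
     * @param recordCount       records written to the current row group so far
     * @param bufferedBytes     bytes currently buffered for those records
     * @param rowGroupSizeBytes target row group size in bytes
     * @param minRecordCount    lower bound for the next check (assumed configurable)
     * @param maxRecordCount    upper bound on records written between two checks
     *                          (assumed configurable)
     */
    static long nextCheckCount(long recordCount, long bufferedBytes,
                               long rowGroupSizeBytes,
                               long minRecordCount, long maxRecordCount) {
        if (recordCount <= 0) {
            // Nothing written yet: no average size to extrapolate from.
            return minRecordCount;
        }
        // Average record size so far, at least 1 byte to avoid division by zero.
        long avgRecordSize = Math.max(1, bufferedBytes / recordCount);
        // Estimated number of records that would fill the row group.
        long estimatedRecordsToFill = rowGroupSizeBytes / avgRecordSize;
        // Aim for the midpoint between the current count and that estimate.
        long halfway = (recordCount + estimatedRecordsToFill) / 2;
        // Clamp: never below the minimum, never more than maxRecordCount ahead.
        return Math.min(Math.max(minRecordCount, halfway),
                        recordCount + maxRecordCount);
    }

    public static void main(String[] args) {
        long rowGroupSize = 128L * 1024 * 1024; // illustrative target size
        // Same writer state, two different (illustrative) pairs of bounds.
        System.out.println(nextCheckCount(100, 100_000, rowGroupSize, 100, 10_000));
        System.out.println(nextCheckCount(100, 100_000, rowGroupSize, 100, 1_000));
    }
}
```

With small records and a large row group, the upper bound decides how often the
size is checked, so hard-coding it fixes the trade-off between check overhead and
overshooting the target size. Passing both bounds in, as sketched here, lets the
caller tune that trade-off.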