[ https://issues.apache.org/jira/browse/PARQUET-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192753#comment-17192753 ]

ASF GitHub Bot commented on PARQUET-869:
----------------------------------------

panthony edited a comment on pull request #470:
URL: https://github.com/apache/parquet-mr/pull/470#issuecomment-682457362


   Same here: we have 1 or 2 columns that can vary widely in size (a few KB up to 10 MB), and we often hit an OutOfMemoryError because the writer didn't check the buffered rows in time.
   
   Being able to adjust the check frequency would be a huge help 👍
   
   I have a [rebased branch](https://github.com/cogniteev/parquet-mr/tree/PARQUET-869-configurable-row-group-min-max-record-check) against master if anyone is interested.
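
For readers skimming the thread, here is a minimal, hypothetical sketch of the behaviour described above: the writer only measures its buffered size every so many records, and it projects the next check point from the average record size seen so far, so a sudden burst of unusually large values can overshoot the row group target (and the heap) before the next check fires. All names and numbers below are illustrative, not the actual InternalParquetRecordWriter code.

    public class GatedSizeCheckSketch {
        private static final long ROW_GROUP_SIZE = 128L * 1024 * 1024; // target row group size in bytes
        private static final long MIN_RECORDS_BETWEEN_CHECKS = 100;    // stand-in for the hard-coded minimum
        private static final long MAX_RECORDS_BETWEEN_CHECKS = 10_000; // stand-in for the hard-coded maximum

        private long bufferedBytes = 0;
        private long recordCount = 0;
        private long nextCheckAt = MIN_RECORDS_BETWEEN_CHECKS;

        public void write(byte[] record) {
            bufferedBytes += record.length;
            recordCount++;
            // The size check only runs every so many records; between checks a burst
            // of very large records can blow well past ROW_GROUP_SIZE.
            if (recordCount >= nextCheckAt) {
                checkBlockSize();
            }
        }

        private void checkBlockSize() {
            if (bufferedBytes >= ROW_GROUP_SIZE) {
                flushRowGroup();
                nextCheckAt = recordCount + MIN_RECORDS_BETWEEN_CHECKS;
            } else {
                // Project how many more "average" records fit, then clamp that estimate
                // between the hard-coded min/max record counts.
                long avgRecordSize = Math.max(1, bufferedBytes / recordCount);
                long estimate = (ROW_GROUP_SIZE - bufferedBytes) / avgRecordSize;
                long step = Math.min(Math.max(estimate / 2, MIN_RECORDS_BETWEEN_CHECKS),
                        MAX_RECORDS_BETWEEN_CHECKS);
                nextCheckAt = recordCount + step;
            }
        }

        private void flushRowGroup() {
            bufferedBytes = 0; // the real writer flushes the buffered row group to storage here
        }
    }

With records whose size varies this widely, being able to lower the upper bound on records between checks is exactly the kind of tuning the proposed configuration would allow.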


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Min/Max record counts for block size checks are not configurable
> ----------------------------------------------------------------
>
>                 Key: PARQUET-869
>                 URL: https://issues.apache.org/jira/browse/PARQUET-869
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Pradeep Gollakota
>            Priority: Major
>
> While the min/max record counts for the page size check are configurable via 
> the ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK and 
> ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK configs and via 
> ParquetProperties directly, the min/max record counts for the block size check 
> are hard-coded inside InternalParquetRecordWriter.
> These two settings should also be configurable.
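
As a point of comparison, below is a minimal sketch of how the existing page-size-check knobs can be set today, assuming a parquet-mr version whose ParquetProperties.Builder exposes withMinRowCountForPageSizeCheck/withMaxRowCountForPageSizeCheck; the values 50 and 5000 are arbitrary examples. There are no equivalent hooks for the block size check counts, which is what this issue asks for.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.parquet.column.ParquetProperties;
    import org.apache.parquet.hadoop.ParquetOutputFormat;

    public class PageSizeCheckConfigExample {
        public static void main(String[] args) {
            // Via a Hadoop Configuration, using the constants named in the issue
            // (picked up by ParquetOutputFormat when the writer is created).
            Configuration conf = new Configuration();
            conf.setInt(ParquetOutputFormat.MIN_ROW_COUNT_FOR_PAGE_SIZE_CHECK, 50);
            conf.setInt(ParquetOutputFormat.MAX_ROW_COUNT_FOR_PAGE_SIZE_CHECK, 5_000);

            // Or directly through ParquetProperties.
            ParquetProperties props = ParquetProperties.builder()
                    .withMinRowCountForPageSizeCheck(50)
                    .withMaxRowCountForPageSizeCheck(5_000)
                    .build();
            System.out.println(props);

            // No corresponding builder methods or config keys exist for the
            // block (row group) size check; those counts stay hard-coded in
            // InternalParquetRecordWriter until this issue is addressed.
        }
    }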



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
