[ https://issues.apache.org/jira/browse/PARQUET-409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515482#comment-16515482 ]

ASF GitHub Bot commented on PARQUET-409:
----------------------------------------

gszadovszky commented on a change in pull request #495: PARQUET-409: Add a 
configuration key that controls min/max row count for block size check
URL: https://github.com/apache/parquet-mr/pull/495#discussion_r195994804
 
 

 ##########
 File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordWriter.java
 ##########
 @@ -147,12 +147,12 @@ private void checkBlockSizeReached() throws IOException {
         LOG.info("mem size {} > {}: flushing {} records to disk.", memSize, nextRowGroupSize, recordCount);
         flushRowGroupToStore();
         initStore();
-        recordCountForNextMemCheck = min(max(MINIMUM_RECORD_COUNT_FOR_CHECK, recordCount / 2), MAXIMUM_RECORD_COUNT_FOR_CHECK);
+        recordCountForNextMemCheck = min(max(props.getMinRowCountForBlockSizeCheck(), recordCount / 2), props.getMaxRowCountForBlockSizeCheck());
 
 Review comment:
   Now, it seems that the local constants MINIMUM_RECORD_COUNT_FOR_CHECK and 
MAXIMUM_RECORD_COUNT_FOR_CHECK are not needed anymore. Could you please remove 
them? (recordCountForNextMemCheck should be initialized by using the 
corresponding value from ParquetProperties.)
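A minimal sketch of what the review asks for, assuming a simplified writer: the class constants are gone, recordCountForNextMemCheck starts at the configured minimum (assuming that is the "corresponding value" the reviewer means), and each flush reschedules the next check with the same min(max(...)) clamp as the + line in the diff. `Props` is a hypothetical stand-in for ParquetProperties that exposes only the two getters appearing in the diff; the class and method names are illustrative, not parquet-mr code.

```java
import static java.lang.Math.max;
import static java.lang.Math.min;

public class BlockSizeCheckSketch {

  /** Hypothetical stand-in for ParquetProperties: only the two getters used in the diff. */
  public static final class Props {
    private final long minRowCount;
    private final long maxRowCount;

    public Props(long minRowCount, long maxRowCount) {
      this.minRowCount = minRowCount;
      this.maxRowCount = maxRowCount;
    }

    public long getMinRowCountForBlockSizeCheck() { return minRowCount; }
    public long getMaxRowCountForBlockSizeCheck() { return maxRowCount; }
  }

  private final Props props;
  // Initialized from the properties instead of a MINIMUM_RECORD_COUNT_FOR_CHECK constant.
  private long recordCountForNextMemCheck;

  public BlockSizeCheckSketch(Props props) {
    this.props = props;
    this.recordCountForNextMemCheck = props.getMinRowCountForBlockSizeCheck();
  }

  /** After a flush of recordCount rows, check again after half as many rows, clamped to [min, max]. */
  public long nextCheckAfterFlush(long recordCount) {
    recordCountForNextMemCheck = min(
        max(props.getMinRowCountForBlockSizeCheck(), recordCount / 2),
        props.getMaxRowCountForBlockSizeCheck());
    return recordCountForNextMemCheck;
  }

  public long recordCountForNextMemCheck() { return recordCountForNextMemCheck; }

  public static void main(String[] args) {
    BlockSizeCheckSketch w = new BlockSizeCheckSketch(new Props(100, 10_000));
    System.out.println(w.recordCountForNextMemCheck());   // 100 (starts at the minimum)
    System.out.println(w.nextCheckAfterFlush(50));        // 100 (clamped up to min)
    System.out.println(w.nextCheckAfterFlush(5_000));     // 2500 (half of the flushed rows)
    System.out.println(w.nextCheckAfterFlush(1_000_000)); // 10000 (clamped down to max)
  }
}
```

Reading both bounds from the properties object in one place means a single configuration change affects the initial check and every rescheduled one, which is the reason for dropping the constants.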

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 


> InternalParquetRecordWriter doesn't use min/max row counts
> ----------------------------------------------------------
>
>                 Key: PARQUET-409
>                 URL: https://issues.apache.org/jira/browse/PARQUET-409
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.8.1
>            Reporter: Ryan Blue
>            Priority: Major
>             Fix For: 1.9.0
>
>
> PARQUET-99 added settings to control the min and max number of rows between 
> size checks when flushing pages, and a setting to control whether to always 
> use a static size (the min). The [InternalParquetRecordWriter has similar 
> checks|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordWriter.java#L143]
> that don't use those settings. We should determine whether it should be 
> updated to use those settings or similar ones.
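The two kinds of setting described above (a min/max bound on rows between size checks, plus a switch that always uses the static minimum) can be sketched as a single scheduling function. This is a hypothetical illustration, not the parquet-mr implementation: the method name, its parameters, and the "check halfway to the estimated target" rule are assumptions chosen to mirror the clamp in the diff.

```java
public final class SizeCheckScheduleSketch {

  /**
   * Returns the row count at which the next size check should run.
   * Hypothetical helper: names and the estimation rule are illustrative only.
   *
   * @param staticInterval if true, always wait exactly minRows more rows
   * @param minRows        minimum rows between checks
   * @param maxRows        maximum rows between checks
   * @param rowCount       rows buffered so far
   * @param bufferedBytes  current in-memory size of those rows
   * @param targetBytes    size at which a flush should happen
   */
  public static long nextCheckAt(boolean staticInterval, long minRows, long maxRows,
                                 long rowCount, long bufferedBytes, long targetBytes) {
    // Static mode (or no data to estimate from): use the fixed minimum interval.
    if (staticInterval || rowCount == 0 || bufferedBytes == 0) {
      return rowCount + minRows;
    }
    // Estimate how many more rows fit before the target, using the average row size.
    double avgRowBytes = (double) bufferedBytes / rowCount;
    long rowsUntilTarget = (long) ((targetBytes - bufferedBytes) / avgRowBytes);
    // Check halfway to the estimated target, with the gap clamped to [minRows, maxRows].
    long gap = Math.min(Math.max(minRows, rowsUntilTarget / 2), maxRows);
    return rowCount + gap;
  }

  public static void main(String[] args) {
    // 1,000 rows of ~100 bytes each, flushing at 1,000,000 bytes.
    System.out.println(nextCheckAt(false, 100, 10_000, 1_000, 100_000, 1_000_000)); // 5500
    System.out.println(nextCheckAt(true, 100, 10_000, 1_000, 100_000, 1_000_000));  // 1100
  }
}
```

In adaptive mode the 9,000 estimated remaining rows give a gap of 4,500; in static mode the gap is always the configured minimum of 100.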



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
