[
https://issues.apache.org/jira/browse/DRILL-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518450#comment-16518450
]
ASF GitHub Bot commented on DRILL-6147:
---------------------------------------
vrozov commented on a change in pull request #1330: DRILL-6147: Adding Columnar
Parquet Batch Sizing functionality
URL: https://github.com/apache/drill/pull/1330#discussion_r196894542
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
##########
@@ -315,6 +315,13 @@ private ExecConstants() {
public static final String PARQUET_FLAT_READER_BULK = "store.parquet.flat.reader.bulk";
public static final OptionValidator PARQUET_FLAT_READER_BULK_VALIDATOR = new BooleanValidator(PARQUET_FLAT_READER_BULK);
+ // Controls the flat parquet reader batching constraints (number of record and memory limit)
+ public static final String PARQUET_FLAT_BATCH_NUM_RECORDS = "store.parquet.flat.batch.num_records";
+ public static final OptionValidator PARQUET_FLAT_BATCH_NUM_RECORDS_VALIDATOR = new RangeLongValidator(PARQUET_FLAT_BATCH_NUM_RECORDS, 1, Integer.MAX_VALUE);
+ public static final String PARQUET_FLAT_BATCH_MEMORY_SZ = "store.parquet.flat.batch.memory_sz";
Review comment:
Please be consistent with other Drill settings: do not abbreviate "size" to "sz".
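For illustration, the requested rename might look like the sketch below. The constant name `PARQUET_FLAT_BATCH_MEMORY_SIZE`, the key `store.parquet.flat.batch.memory_size`, and the holder class `FlatBatchOptions` are assumptions chosen to match the spelled-out style of the neighboring options; they are not taken from the pull request. The Drill validator is omitted so the snippet stays self-contained.

```java
// Hypothetical holder class; in the pull request these constants live in ExecConstants.
public final class FlatBatchOptions {
    private FlatBatchOptions() {}

    // Option key copied from the diff under review.
    public static final String PARQUET_FLAT_BATCH_NUM_RECORDS =
        "store.parquet.flat.batch.num_records";

    // Assumed replacement for PARQUET_FLAT_BATCH_MEMORY_SZ with "size" spelled out.
    public static final String PARQUET_FLAT_BATCH_MEMORY_SIZE =
        "store.parquet.flat.batch.memory_size";
}
```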
> Limit batch size for Flat Parquet Reader
> ----------------------------------------
>
> Key: DRILL-6147
> URL: https://issues.apache.org/jira/browse/DRILL-6147
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
>
> The Parquet reader currently uses a hard-coded batch size limit (32k rows)
> when creating scan batches; there is neither a parameter nor any logic for
> controlling the amount of memory used. This enhancement will allow Drill to
> take an extra input parameter to control direct memory usage.
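
The idea in the description, capping a batch by both a record count and a memory budget, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: the class `FlatBatchSizer`, the method `rowsPerBatch`, and the per-row size estimate are all hypothetical, and "32k rows" is taken to mean 32768.

```java
// Hypothetical helper; it does not correspond to any class in Drill.
public final class FlatBatchSizer {
    private FlatBatchSizer() {}

    /**
     * Returns the number of rows to place in one batch when both a
     * record-count cap and a memory cap apply.
     *
     * @param maxRecords        upper bound on rows per batch (assumed option value)
     * @param maxBatchBytes     memory budget for one batch, in bytes (assumed option value)
     * @param estimatedRowBytes estimated size of a single row, in bytes (assumed input)
     */
    public static int rowsPerBatch(int maxRecords, long maxBatchBytes, long estimatedRowBytes) {
        if (maxRecords < 1 || maxBatchBytes < 1 || estimatedRowBytes < 1) {
            throw new IllegalArgumentException("all limits must be positive");
        }
        // Rows that fit in the memory budget, rounded down.
        long rowsByMemory = maxBatchBytes / estimatedRowBytes;
        // Take the tighter of the two limits, but always allow at least one row
        // so that a single oversized row can still make progress.
        long rows = Math.max(1L, Math.min((long) maxRecords, rowsByMemory));
        return (int) rows;
    }
}
```

With a 16 MiB budget and rows estimated at 1 KiB, the memory cap wins and `rowsPerBatch(32768, 16L * 1024 * 1024, 1024)` yields 16384; with 64-byte rows the record cap wins and the result is 32768.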