Github user jinfengni commented on a diff in the pull request:
https://github.com/apache/drill/pull/597#discussion_r80970307
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java
---
@@ -115,6 +115,8 @@
private List<RowGroupInfo> rowGroupInfos;
private Metadata.ParquetTableMetadataBase parquetTableMetadata = null;
private String cacheFileRoot = null;
+ private int batchSize;
+ private static final int DEFAULT_BATCH_LENGTH = 256 * 1024;
--- End diff --
Now I think I understand what you mean. You want to cap
store.parquet.record_batch_size at 256K and set DEFAULT_BATCH_LENGTH = 256K in
ParquetGroupScan. At execution time, you pick the min of batch_size and the
option value, so the result is never greater than the option value.
If that's correct, can we remove DEFAULT_BATCH_LENGTH from ParquetGroupScan?
Instead, use the batch_size specified in the new option you added for the
NON-LIMIT case?
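
To make the suggestion concrete, here is a minimal sketch of the selection logic as I understand it. It is not the code in this PR: the class name, the method name, the MAX_BATCH_SIZE constant, and the convention that a non-positive requested size means "no LIMIT" are all assumptions for illustration. Only the 256K cap and the "pick the min" rule come from the discussion above.

```java
// Hypothetical sketch; none of these names exist in Drill.
class BatchSizeSketch {
    // Assumed upper bound for store.parquet.record_batch_size (256K, as discussed).
    static final int MAX_BATCH_SIZE = 256 * 1024;

    /**
     * @param optionValue        value read from the session/system option
     *                           (assumed to be positive)
     * @param requestedBatchSize batch size requested by the scan, e.g. derived
     *                           from a LIMIT; a value <= 0 is treated here as
     *                           "no LIMIT" (an assumption of this sketch)
     */
    static int effectiveBatchSize(long optionValue, int requestedBatchSize) {
        // Cap the option itself so no separate default is needed in the group scan.
        int cappedOption = (int) Math.min(optionValue, (long) MAX_BATCH_SIZE);
        if (requestedBatchSize <= 0) {
            // NON-LIMIT case: use the (capped) option value directly.
            return cappedOption;
        }
        // LIMIT case: never exceed the option value.
        return Math.min(requestedBatchSize, cappedOption);
    }

    public static void main(String[] args) {
        System.out.println(effectiveBatchSize(4000L, 0));      // non-LIMIT, option below cap
        System.out.println(effectiveBatchSize(1_000_000L, 0)); // non-LIMIT, option above cap
        System.out.println(effectiveBatchSize(4000L, 100));    // LIMIT smaller than option
    }
}
```

With the cap applied where the option is read, the group scan would not need its own DEFAULT_BATCH_LENGTH constant.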
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---