[GitHub] drill pull request #597: DRILL-4905: Push down the LIMIT to the parquet read...

jinfengni Wed, 28 Sep 2016 10:18:21 -0700

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/597#discussion_r80970307
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
    @@ -115,6 +115,8 @@
       private List<RowGroupInfo> rowGroupInfos;
       private Metadata.ParquetTableMetadataBase parquetTableMetadata = null;
       private String cacheFileRoot = null;
    +  private int batchSize;
    +  private static final int DEFAULT_BATCH_LENGTH = 256 * 1024;
    --- End diff --
    
    Now I think I understand what you mean. You want to cap
    store.parquet.record_batch_size at 256K and set DEFAULT_BATCH_LENGTH = 256K
    in ParquetGroupScan. At execution time, you pick the minimum of batch_size
    and the option value, so the result is never greater than the option value.
    
    If that's correct, can we remove DEFAULT_BATCH_LENGTH from ParquetGroupScan?
    Instead, use the batch size specified in the new option you added for the
    non-LIMIT case.
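
    To make the suggestion concrete, here is a minimal sketch of the selection
    logic as I understand it. It is not code from the patch: the class name,
    the method name, and the use of -1 to mean "no LIMIT" are all assumptions
    made for illustration, and the option value is simply passed in as an int.

```java
// Hypothetical sketch; none of these names come from the Drill code base.
final class BatchSizeSketch {

    // Assumed upper bound for the batch size option (256K records),
    // matching the cap discussed in this thread.
    static final int MAX_BATCH_SIZE = 256 * 1024;

    /**
     * Picks the batch size the reader would use.
     *
     * @param optionBatchSize value of the record batch size option
     * @param limit           row count from a pushed-down LIMIT,
     *                        or -1 when the query has no LIMIT (assumption)
     */
    static int effectiveBatchSize(int optionBatchSize, long limit) {
        // The option itself is capped, so it can never exceed 256K.
        int capped = Math.min(optionBatchSize, MAX_BATCH_SIZE);
        if (limit < 0) {
            // Non-LIMIT case: use the option value directly,
            // with no separate default constant in the group scan.
            return capped;
        }
        // LIMIT case: never read more rows per batch than the limit asks for.
        return (int) Math.min((long) capped, limit);
    }
}
```

    With this shape, the group scan only needs to carry the limit-derived
    value; the default comes from the option rather than from a second
    constant that could drift out of sync with it.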




