Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/371#discussion_r52560352
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
    @@ -791,6 +799,43 @@ public FileGroupScan clone(FileSelection selection) throws IOException {
       }
     
       @Override
    +  public GroupScan applyLimit(long maxRecords) {
    --- End diff ---
    
I was thinking about that as well. Theoretically, it would be best to sort
the row groups by record count and then binary search for the row group whose
count is the closest value greater than the requested amount (too small means
reading multiple files; larger files require more metadata reading/parsing).
However, it seems like premature optimization to me. Are you seeing lots of
people with many small Parquet files? That generally seems counter to the
Parquet design.

