Hao Zhu created DRILL-3735:
------------------------------

             Summary: Directory pruning is not happening when number of files 
is larger than 64k
                 Key: DRILL-3735
                 URL: https://issues.apache.org/jira/browse/DRILL-3735
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.1.0
            Reporter: Hao Zhu
            Assignee: Jinfeng Ni


When the number of files exceeds the 64k limit, directory pruning does not 
happen. 
We need to raise this limit so that most use cases are covered.

My proposal is to separate the code for directory pruning from the code for 
partition pruning. 
Say a parent directory contains 100 directories and 1 million files.
If a query only touches files in one directory, Drill should first read the 100 
directory names and narrow them down to the matching directory, and only then 
read the file paths in that directory into memory and do the remaining work.

The current behavior is that Drill first reads the paths of all 1 million 
files into memory, and only then does directory pruning or partition pruning. 
This is neither performance-efficient nor memory-efficient, and it does not 
scale.
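
To make the difference concrete, here is a minimal sketch in Python of the two 
listing strategies. It is not Drill's code: it walks a local file system, the 
function names (prune_after_full_listing, prune_directories_first, 
build_demo_tree) are made up for illustration, and the "predicate" is just an 
exact match on one directory name. Each function also returns how many 
directory entries it had to hold in memory.

```python
import os
import tempfile


def build_demo_tree(num_dirs, files_per_dir):
    """Create a throwaway tree: root/dir0..dirN, each holding a few files."""
    root = tempfile.mkdtemp()
    for d in range(num_dirs):
        dir_path = os.path.join(root, "dir%d" % d)
        os.makedirs(dir_path)
        for f in range(files_per_dir):
            open(os.path.join(dir_path, "part%d.dat" % f), "w").close()
    return root


def prune_after_full_listing(root, wanted_dir):
    """Behavior described above: list every file first, filter afterwards."""
    all_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            all_paths.append(os.path.join(dirpath, name))
    prefix = os.path.join(root, wanted_dir) + os.sep
    selected = [p for p in all_paths if p.startswith(prefix)]
    # Every file path in the tree was materialized before pruning.
    return selected, len(all_paths)


def prune_directories_first(root, wanted_dir):
    """Proposed order: list directories, prune, then list surviving files."""
    with os.scandir(root) as entries:
        subdirs = [e.name for e in entries if e.is_dir()]
    matching = [d for d in subdirs if d == wanted_dir]
    selected = []
    for d in matching:
        with os.scandir(os.path.join(root, d)) as entries:
            selected.extend(e.path for e in entries if e.is_file())
    # Only the directory names plus the files of the surviving directory.
    return selected, len(subdirs) + len(selected)
```

With the numbers from the example above (100 directories, 1 million files, 
evenly spread), the first function would hold 1 million paths before pruning, 
while the second would hold 100 directory names plus roughly 10,000 file paths.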





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
