Drill should try to prune directories as early as possible; ideally the first time it reads from the filesystem during the planning phase. Could we take advantage of o.a.hadoop.fs.PathFilter to only read directories that match a pattern? Currently, Drill uses PathFilter to skip certain types of files, such as dot files. For directory filters, we could build the path patterns and pass them to the fs.listStatus() method. Conceptually this would work, but there are implementation details - especially the fact that filters currently need to be evaluated first through interpreter-based evaluation. Perhaps we could do this only for simple directory filters (e.g. dir0 = 2014) that don't involve functions or expressions.
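To make the idea concrete, here is a rough sketch of what such a filter could look like. This is not Drill code; it uses java.nio's DirectoryStream.Filter (whose accept() contract mirrors o.a.hadoop.fs.PathFilter's) so the example is self-contained, and the class/method names are made up for illustration. In Drill the equivalent would be a PathFilter passed to fs.listStatus(path, filter):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class PartitionPruneSketch {
    // Accept only directories whose name equals the partition value from a
    // simple equality filter like dir0 = 2014. With Hadoop's API this would
    // be an o.a.hadoop.fs.PathFilter handed to fs.listStatus().
    static DirectoryStream.Filter<Path> partitionFilter(String value) {
        return p -> Files.isDirectory(p)
                && p.getFileName().toString().equals(value);
    }

    public static void main(String[] args) throws IOException {
        // Fake partitioned layout: <root>/2013, <root>/2014, <root>/2015
        Path root = Files.createTempDirectory("drill-prune");
        for (String year : new String[] {"2013", "2014", "2015"}) {
            Files.createDirectory(root.resolve(year));
        }
        // Only the matching partition directory is returned by the listing,
        // so the other subtrees are never scanned.
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(root, partitionFilter("2014"))) {
            for (Path d : dirs) {
                System.out.println(d.getFileName()); // prints 2014
            }
        }
    }
}
```

The key point the sketch illustrates is that the pruning happens inside the directory listing itself, not as a post-filter over an already-materialized list of all partitions.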
Any thoughts? Note that we have talked about other optimizations to partition pruning (there are JIRAs open), but I don't recall anything specifically about limiting the initial data read from the filesystem.

Aman
