Drill should try to prune directories as early as possible; ideally the first time it reads from the filesystem during the planning phase. Could we take advantage of o.a.hadoop.fs.PathFilter to only read directories that match a pattern? Currently, Drill uses PathFilter to skip certain types of files, such as dot files. For directory filters, we could build the path patterns and pass them to the fs.listStatus() method. Conceptually this would work, but there are implementation details - especially the fact that filters currently need to be evaluated first through interpreter-based evaluation. Perhaps we could do this only for simple directory filters (e.g. dir0 = 2014) that don't involve functions or expressions.
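To make the idea concrete, here is a rough sketch of what such a filter could look like. This is not Drill code; it uses java.nio's DirectoryStream.Filter (whose accept() contract mirrors o.a.hadoop.fs.PathFilter's) so the example is self-contained, and the class/method names are made up for illustration. In Drill the equivalent would be a PathFilter passed to fs.listStatus(path, filter):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class PartitionPruneSketch {
    // Accept only directories whose name equals the partition value from a
    // simple equality filter like dir0 = 2014. With Hadoop's API this would
    // be an o.a.hadoop.fs.PathFilter handed to fs.listStatus().
    static DirectoryStream.Filter<Path> partitionFilter(String value) {
        return p -> Files.isDirectory(p)
                && p.getFileName().toString().equals(value);
    }

    public static void main(String[] args) throws IOException {
        // Fake partitioned layout: <root>/2013, <root>/2014, <root>/2015
        Path root = Files.createTempDirectory("drill-prune");
        for (String year : new String[] {"2013", "2014", "2015"}) {
            Files.createDirectory(root.resolve(year));
        }
        // Only the matching partition directory is returned by the listing,
        // so the other subtrees are never scanned.
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(root, partitionFilter("2014"))) {
            for (Path d : dirs) {
                System.out.println(d.getFileName()); // prints 2014
            }
        }
    }
}
```

The key point the sketch illustrates is that the pruning happens inside the directory listing itself, not as a post-filter over an already-materialized list of all partitions.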
Any thoughts? Note that we have talked about other optimizations to partition pruning (there are JIRAs open), but I don't recall anything specifically about limiting the initial data read from the filesystem.

Aman
