I think people usually add these directories as multiple partitions of the same table instead of unioning separate tables. Partitioning also lets us prune whole directories at read time, on top of the standard column pruning.
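To make the difference concrete, here is a toy model in Python. It is not Spark's API: the `PartitionedTable` and `Partition` classes, the `/data/events/day=...` paths, and the `day` partition column are all hypothetical. When the directories are registered as partitions of one table, a filter on the partition column selects directories from metadata alone, and column pruning then applies to the files that remain.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Partition:
    day: str       # value of the hypothetical partition column "day"
    location: str  # directory holding that day's Parquet files


@dataclass
class PartitionedTable:
    columns: List[str]
    partitions: List[Partition]

    def plan_scan(
        self, day_predicate: Callable[[str], bool], wanted: Set[str]
    ) -> Tuple[List[str], List[str]]:
        # Directory pruning: the predicate is checked against partition
        # metadata only, so non-matching directories are never opened.
        dirs = [p.location for p in self.partitions if day_predicate(p.day)]
        # Column pruning: only the requested columns are read from what remains.
        cols = [c for c in self.columns if c in wanted]
        return dirs, cols


events = PartitionedTable(
    columns=["user_id", "url", "duration"],
    partitions=[
        Partition(d, f"/data/events/day={d}")
        for d in ["2014-09-07", "2014-09-08", "2014-09-09"]
    ],
)

# Roughly: SELECT url FROM events WHERE day >= '2014-09-08'
dirs, cols = events.plan_scan(lambda day: day >= "2014-09-08", {"url"})
print(dirs)  # ['/data/events/day=2014-09-08', '/data/events/day=2014-09-09']
print(cols)  # ['url']
```

The point of the sketch is that the table itself knows which `day` value each directory holds. With a union of separately registered tables, that mapping is not visible to the planner, so skipping work depends on pushing the filter and projection into each child scan, which is the case discussed in the quoted thread.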
On Tue, Sep 9, 2014 at 11:26 AM, Gary Malouf <malouf.g...@gmail.com> wrote:

> I'm kind of surprised this was not run into before. Do people not
> segregate their data by day/week in the HDFS directory structure?
>
>
> On Tue, Sep 9, 2014 at 2:08 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Thanks!
>>
>> On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>> > Opened
>> >
>> > https://issues.apache.org/jira/browse/SPARK-3462
>> >
>> > I'll take a look at ColumnPruning and see what I can do
>> >
>> > On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust <mich...@databricks.com>
>> > wrote:
>> >
>> >> On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger <c...@koeninger.org>
>> >> wrote:
>> >>>
>> >>> Is there a reason in general not to push projections and predicates down
>> >>> into the individual ParquetTableScans in a union?
>> >>>
>> >>
>> >> This would be a great case to add to ColumnPruning. Would be awesome if
>> >> you could open a JIRA or even a PR :)
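The question about pushing projections and predicates into the individual scans of a union can be pictured with a toy logical plan. This is a minimal sketch, not Catalyst: the node classes (`Scan`, `UnionAll`, `Project`, `Filter`) and the `push_down` function are hypothetical, and predicates are simplified to `(column, value)` equality pairs. The rewrite distributes a projection or filter over every branch of the union and folds it into each scan.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scan:
    path: str
    columns: Tuple[str, ...]
    predicates: Tuple[Tuple[str, object], ...] = ()  # (column, value) equality filters


@dataclass(frozen=True)
class UnionAll:
    children: tuple


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object


@dataclass(frozen=True)
class Filter:
    predicate: Tuple[str, object]
    child: object


def push_down(plan):
    """Toy rewrite: push Project and Filter through UnionAll into each Scan."""
    if isinstance(plan, UnionAll):
        return UnionAll(tuple(push_down(c) for c in plan.children))
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, UnionAll):
            # Predicate pushdown: apply the same filter to every branch.
            return UnionAll(tuple(push_down(Filter(plan.predicate, c)) for c in child.children))
        if isinstance(child, Scan):
            # The scan evaluates pushed predicates while reading.
            return Scan(child.path, child.columns, child.predicates + (plan.predicate,))
        return Filter(plan.predicate, child)
    if isinstance(plan, Project):
        child = push_down(plan.child)
        if isinstance(child, UnionAll):
            # Column pruning: apply the same projection to every branch.
            return UnionAll(tuple(push_down(Project(plan.columns, c)) for c in child.children))
        if isinstance(child, Scan):
            # Toy assumption: pushed predicates run before projection,
            # so a filter column does not need to stay in the output.
            kept = tuple(c for c in child.columns if c in plan.columns)
            return Scan(child.path, kept, child.predicates)
        return Project(plan.columns, child)
    return plan


ALL = ("user_id", "url", "duration")
plan = Project(
    ("url",),
    Filter(
        ("user_id", 42),
        UnionAll((
            Scan("/data/events/2014-09-08", ALL),
            Scan("/data/events/2014-09-09", ALL),
        )),
    ),
)
optimized = push_down(plan)
print(optimized)
```

A real optimizer rule would presumably also need to remap attribute references for each branch, since each child of a union produces its own output attributes; the toy version sidesteps that by matching plain column names.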