I think people usually add these directories as multiple partitions of the same table instead of unioning them. This lets us efficiently prune whole directories at read time, in addition to the standard column pruning.
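
To make the pruning point concrete, here is a rough sketch in plain Python (not Spark code). The directory layout, the `day=` naming, and both helper functions are made up for illustration; the sketch only models which directories would have to be read. It also assumes the simplified case where a filter over a union is not pushed into the individual scans.

```python
from datetime import date

# Hypothetical layout: one directory per day, keyed by the partition value.
PARTITION_DIRS = {
    date(2014, 9, 7): "/data/events/day=2014-09-07",
    date(2014, 9, 8): "/data/events/day=2014-09-08",
    date(2014, 9, 9): "/data/events/day=2014-09-09",
}


def prune_partitions(partitions, predicate):
    """Keep only the directories whose partition key satisfies the predicate."""
    return [path for key, path in sorted(partitions.items()) if predicate(key)]


def dirs_for_union(partitions):
    """Simplified model of a union of per-directory tables: every directory is
    scanned, because the filter is applied only after the union."""
    return [path for _, path in sorted(partitions.items())]


# A query restricted to a single day only needs to touch one directory.
to_read = prune_partitions(PARTITION_DIRS, lambda day: day >= date(2014, 9, 9))
```

With the partitioned layout the filter on the day picks out one directory before any file is opened, while the union model still lists all three.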

On Tue, Sep 9, 2014 at 11:26 AM, Gary Malouf <[email protected]> wrote:

> I'm kind of surprised this was not run into before. Do people not
> segregate their data by day/week in the HDFS directory structure?
>
> On Tue, Sep 9, 2014 at 2:08 PM, Michael Armbrust <[email protected]>
> wrote:
>
>> Thanks!
>>
>> On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger <[email protected]>
>> wrote:
>>
>>> Opened
>>>
>>> https://issues.apache.org/jira/browse/SPARK-3462
>>>
>>> I'll take a look at ColumnPruning and see what I can do
>>>
>>> On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust <[email protected]>
>>> wrote:
>>>
>>>> On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger <[email protected]>
>>>> wrote:
>>>>>
>>>>> Is there a reason in general not to push projections and predicates
>>>>> down into the individual ParquetTableScans in a union?
>>>>>
>>>>
>>>> This would be a great case to add to ColumnPruning. Would be awesome
>>>> if you could open a JIRA or even a PR :)
