Why would joins prevent early projection? If anything, a join has the greatest need for it.
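
For illustration, here is a rough sketch of the manual workaround mentioned in the quoted message below: load without types, project only the needed fields before the join, and cast only what is actually used. The relation names, paths, and field names are hypothetical and not taken from any real script.

```pig
-- Hypothetical inputs; no types declared, so fields default to bytearray
-- and Pig has nothing to cast at load time.
users  = LOAD 'users'  USING PigStorage('\t') AS (id, name, email, city);
orders = LOAD 'orders' USING PigStorage('\t') AS (order_id, user_id, amount, note);

-- Drop unneeded fields by hand before the join, casting only what is used.
u = FOREACH users  GENERATE id, city;
o = FOREACH orders GENERATE user_id, (double) amount AS amount;

-- The join now carries two fields per side instead of four.
j = JOIN u BY id, o BY user_id;
result = FOREACH j GENERATE u::city, o::amount;

STORE result INTO 'out';
```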

Jie

On Fri, Dec 2, 2011 at 7:33 PM, Jonathan Coveney <[email protected]> wrote:

> In what context? I always thought that it generally could, but that if you
> do joins it doesn't. Would be curious to know more from someone who
> knows...
>
> 2011/12/2 Jie Li <[email protected]>
>
> > Hi all,
> >
> > We just figured out Pig 0.9.1 doesn't drop those non-necessary fields asap,
> > which really affects the performance. Though
> > http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown
> > said that "As part of its optimizations Pig analyzes Pig Latin scripts and
> > determines what fields in an input it needs at each step in the script. It
> > uses this information to aggressively drop fields it no longer needs."
> >
> > We also found that Pig casts the data into the types defined in the schema,
> > which is usually unnecessary, as most of them will be soon dropped.
> >
> > To work around these, we have to manually drop those fields and remove the
> > types in the schema, which are really not interesting.
> >
> > Jie
