Hi all,

We just found that Pig 0.9.1 doesn't drop unnecessary fields as soon as possible, which really hurts performance. This is despite http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown saying: "As part of its optimizations Pig analyzes Pig Latin scripts and determines what fields in an input it needs at each step in the script. It uses this information to aggressively drop fields it no longer needs."
We also found that Pig casts the data to the types defined in the schema, which is usually unnecessary because most of the fields are dropped soon afterwards. To work around both issues, we have to manually drop those fields and remove the types from the schema, which is tedious work.

Jie
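
P.S. For anyone who wants to try the same workaround, here is a minimal sketch of what we mean. The input path, field names, and the final aggregation are hypothetical and only for illustration. The idea is to load without declared types (so fields stay bytearray), project down to the needed fields right after the LOAD, and cast only the fields that survive.

```pig
-- Hypothetical path and field names, for illustration only.
-- No types in the AS clause, so every field is left as bytearray at load time.
raw = LOAD 'input/events' USING PigStorage('\t')
      AS (user, url, ts, referrer, agent);

-- Manually drop the fields we do not need, immediately after the load.
slim = FOREACH raw GENERATE user, ts;

-- Cast only the remaining fields, at the point where the types matter.
typed = FOREACH slim GENERATE (chararray)user AS user, (long)ts AS ts;

grouped = GROUP typed BY user;
counts  = FOREACH grouped GENERATE group AS user, COUNT(typed) AS n;
STORE counts INTO 'output/counts';
```

Whether this helps in a given script probably depends on the loader and on how many fields are dropped, so it is worth comparing against the typed version on real data.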
