Yes, in theory the filter should be pushed above the foreach. I don't know what is happening here; the easiest way to find out is to run an explain and check the plan.
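Something along these lines should do it. This is only a rough sketch: the input path, the field names and the filter value are made up, and I'm assuming CustomLoader is already registered in the script.

```pig
-- Hypothetical script: the path, schema and filter value are placeholders.
A = LOAD '/data/events' USING CustomLoader() AS (id:chararray, dt:chararray);
B = FILTER A BY dt == '2013-03-14';

-- Print the logical, physical and MapReduce plans for B.
EXPLAIN B;
```

In the logical plan, check whether an LOForEach sits between LOLoad and LOFilter. If it does, the filter is not directly after the load, and that would explain why getPartitionKeys is never invoked.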
Daniel

On Fri, Mar 15, 2013 at 11:32 AM, Jeff Yuan <[email protected]> wrote:
> Yes, I do use AS in the load statement. I thought Filters are always
> pushed as close to the Load operators as possible? What kind of
> Foreach is added?
>
> Thanks,
> Jeff
>
> On Fri, Mar 15, 2013 at 10:57 AM, Daniel Dai <[email protected]> wrote:
>> getPartitionKeys should be called by default. Did you use "AS" clause
>> in load statement? That could add a foreach between Load and Filter,
>> and getPartitionKeys will only be invoked if filter is right after
>> load. Do an explain to check for it.
>>
>> Thanks,
>> Daniel
>>
>> On Thu, Mar 14, 2013 at 8:37 PM, Jeff Yuan <[email protected]> wrote:
>>> Hi all,
>>>
>>> For CustomLoader (a class I'm implementing), which extends LoadFunc
>>> and implements LoadMetadata, the "getPartitionKeys" function is supposed
>>> to be called by "PartitionFilterOptimizer", right? I put some debug
>>> statements in "getPartitionKeys", but this function doesn't seem like
>>> it's ever called.
>>>
>>> I've read through some Pig source: optimization rules can be disabled
>>> by properties, but by default the "PartitionFilterOptimizer" should be
>>> enabled. Also, in "PartitionFilterOptimizer", I saw some
>>> other checks, like the Filter operator cannot have another dependency
>>> other than load, which is true in my case. Anyway, can someone shed
>>> some light on this? Am I understanding this interface incorrectly?
>>>
>>> My script is very simple (line 1 is load, line 2 is filter, and line 3
>>> is store), so the Logical Plan should be very simple. Also, I'm
>>> testing this in Pig local mode, not sure if that matters.
>>>
>>> Greatly appreciate any hints!
