It doesn't seem like I'm able to call a UDF to determine the value of my partition filter condition. For example, I'd like to do this within a Pig MACRO:

    DEFINE load_recent_signals(num_days, end_timebucket) RETURNS RECENT_SIGNALS {
        signals = load 'signals' using org.apache.hcatalog.pig.HCatLoader();
        $RECENT_SIGNALS = foreach (
            filter signals by (
                datetime_partition >= TimebucketToDatePartition($end_timebucket - (86400000L*$num_days)) AND
                datetime_partition <= TimebucketToDatePartition($end_timebucket) AND
                relationship_id IS NOT NULL
            )
        ) { generate ...; };
    };

TimebucketToDatePartition is a UDF that computes the partition value (a STRING) from a timestamp (LONG). When I run this, I get an error saying the filter couldn't be "pushed" into the load, which makes partitioning worthless. I have big data, so partitioning is VERY important.

I also tried evaluating the UDFs at the point where I call the MACRO, but the Pig grammar doesn't recognize UDF calls as parameter values, i.e.

    signals_in = load_recent_signals(TimebucketToDatePartition(1351612800000L), TimebucketToDatePartition(1351785600000L));

This results in the error:

    ERROR 1200: <line 5, column 58> mismatched input '(' expecting RIGHT_PAREN

So I'm at a loss as to what I can do here. Evaluating a UDF for a partition filter seems like a sensible thing to do with HCatalog and Pig. I'm willing to crack open the code and fix this if someone can advise on which approach to take: should I fix the Pig grammar so that UDFs can be called when evaluating MACRO parameters, or fix the HCatalog side so that a UDF can be called to determine filter conditions?

<rant>So far, I've had nothing but trouble with HCatalog and filtering by partition keys in Pig. Isn't this one of the primary use cases of HCatalog?</rant>
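
For reference, here is a rough sketch of the arithmetic the filter is supposed to do, written in Python so that the partition bounds could be computed outside of Pig and handed to the script as plain string literals through `-param`. This is only an illustration under assumptions: the `%Y-%m-%d` UTC partition format, the function names, and the `START_PARTITION` / `END_PARTITION` parameter names are all hypothetical, and the real TimebucketToDatePartition UDF may format its output differently.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

MS_PER_DAY = 86_400_000  # same constant as 86400000L in the macro


def timebucket_to_date_partition(timebucket_ms: int, fmt: str = "%Y-%m-%d") -> str:
    """Hypothetical stand-in for the TimebucketToDatePartition UDF.

    Assumes the partition value is the UTC date of the timestamp,
    formatted with `fmt`. The real UDF may use another format or timezone.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (epoch + timedelta(milliseconds=timebucket_ms)).strftime(fmt)


def partition_bounds(end_timebucket_ms: int, num_days: int) -> Tuple[str, str]:
    """Return (start_partition, end_partition) for the last `num_days` days."""
    start_ms = end_timebucket_ms - MS_PER_DAY * num_days
    return (
        timebucket_to_date_partition(start_ms),
        timebucket_to_date_partition(end_timebucket_ms),
    )


def pig_param_args(end_timebucket_ms: int, num_days: int) -> List[str]:
    """Build `-param` arguments for the pig command line.

    START_PARTITION and END_PARTITION are made-up parameter names; the
    script would reference them as '$START_PARTITION' and '$END_PARTITION'.
    """
    start, end = partition_bounds(end_timebucket_ms, num_days)
    return [
        "-param", f"START_PARTITION={start}",
        "-param", f"END_PARTITION={end}",
    ]


if __name__ == "__main__":
    # Two days back from the end timestamp used in the macro call above.
    print(" ".join(pig_param_args(1351785600000, 2)))
```

With values substituted this way, the filter would compare datetime_partition against constants such as `'$START_PARTITION'` rather than against a UDF result. I have not confirmed that this gets the filter pushed into HCatLoader, and it moves the date logic out of the script, which is exactly what I was hoping to avoid.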