pig-user  

Re: Nested expressions in FOREACH vs FILTER

Alan Gates
Fri, 28 Mar 2008 10:07:52 -0700

I'm not clear on the semantics you're proposing for FILTER. I think what you're saying is that Pig cannot apply a relation-level conditional (as opposed to a record-level conditional) in a natural way.

To be clear, Pig can do a record-level conditional like:

c = foreach b generate ($0 > '1' ? $1 : $2);

But if you instead want to apply the conditional to the entire relation, we have to do something contorted (like the workaround you suggest). You'd like to be able to do something like:

c = b generate (any $0 > '1' ? $1 : $2);

where the 'any' operator is applied to all of $0 instead of being applied a row at a time.

Is that correct, or are you suggesting more than that? Or perhaps something altogether different?

Alan.

On Mar 27, 2008, at 3:37 PM, Mridul Muralidharan wrote:

Hi,

  FOREACH supports nested expressions of the form:
var1 = FOREACH var { <expr>'s; GENERATE <tuple> }

Similar functionality does not seem to be available with FILTER.
That is, slightly complex filter expressions, particularly ones that need to process the bags/tuples contained in the tuples of the relation in question, are not possible.

Mirroring FOREACH functionality, something like this would be great :

var1 = FILTER var {
  t1 = <expr>;
  t2 = <expr>;
  ...
  BY (conds);
}


The workaround for the immediate problem I am facing is to use FOREACH to generate something like $status, <tuple>, then FILTER on $status, followed by another FOREACH to remove the status.
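
A minimal sketch of this three-step workaround, written in present-day Pig Latin syntax (which may differ from the release discussed in this thread). The relation names, the field names id, items, v and status, the input path, and the threshold 10 are all hypothetical, chosen only for illustration:

```pig
-- Hypothetical input: each record carries an id and a bag of integer values.
a = LOAD 'input' AS (id:chararray, items:bag{t:tuple(v:int)});

-- Step 1: use a nested FOREACH to evaluate the complex condition per record
-- and emit the result as a status flag in front of the original fields.
b = FOREACH a {
    big = FILTER items BY v > 10;
    GENERATE (IsEmpty(big) ? 0 : 1) AS status, id, items;
};

-- Step 2: keep only the records whose flag is set.
c = FILTER b BY status == 1;

-- Step 3: drop the helper flag again.
d = FOREACH c GENERATE id, items;
```

The extra flag column and the trailing FOREACH are the overhead that a nested FILTER block, as proposed above, would avoid.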

Regards,
Mridul