[ 
https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359126#comment-14359126
 ] 

William Watson commented on PIG-4458:
-------------------------------------

I should explain a little. Right now, you can run a foreach that changes 
the placement of the join key. That also messes with the results you get, but 
this validation doesn't check for it. 

IMO, we should probably just document that one shouldn't run a UDF on a JOIN 
key or change the placement of a JOIN key, and let UDFs work here by removing 
the !containsUDFs requirement.
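
If we wanted a stricter check instead of only documenting it, something along these lines might work. This is only a rough sketch on a made-up, simplified model, not Pig's actual logical plan API: ForEachJoinKeyCheck, OutputColumn and isAcceptable are all hypothetical names, and a non-UDF column that reads exactly one input is assumed to be a plain projection. It refuses a foreach only when a UDF reads a join key column or when a join key no longer sits at its original position.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a FOREACH; none of these types exist in Pig.
final class ForEachJoinKeyCheck {

    /** One output column: the input columns it reads and whether a UDF is applied. */
    static final class OutputColumn {
        final boolean usesUdf;
        final Set<Integer> inputColumns;

        OutputColumn(boolean usesUdf, Integer... inputColumns) {
            this.usesUdf = usesUdf;
            this.inputColumns = new HashSet<>(Arrays.asList(inputColumns));
        }
    }

    /**
     * Returns true if the modelled FOREACH leaves the join key alone:
     * no UDF reads a join key column, and every join key column is
     * projected as-is to the same position it had in the input.
     */
    static boolean isAcceptable(List<OutputColumn> outputs, Set<Integer> joinKeyColumns) {
        // Rule 1: a UDF may transform other fields, but must not touch the join key.
        for (OutputColumn out : outputs) {
            if (!out.usesUdf) {
                continue;
            }
            for (Integer input : out.inputColumns) {
                if (joinKeyColumns.contains(input)) {
                    return false;
                }
            }
        }
        // Rule 2: each join key column must stay at its original position.
        // Assumption: a non-UDF column reading exactly one input is a plain projection.
        for (int key : joinKeyColumns) {
            if (key >= outputs.size()) {
                return false;
            }
            OutputColumn out = outputs.get(key);
            boolean plainProjectionOfKey = !out.usesUdf
                    && out.inputColumns.size() == 1
                    && out.inputColumns.contains(key);
            if (!plainProjectionOfKey) {
                return false;
            }
        }
        return true;
    }
}
```

With the join key in column 0, a foreach that applies a UDF only to column 1 would pass, while one that wraps column 0 in a UDF or swaps columns 0 and 1 would be refused.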

> Support UDFs in a FOREACH Before a Merge Join
> ---------------------------------------------
>
>                 Key: PIG-4458
>                 URL: https://issues.apache.org/jira/browse/PIG-4458
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: William Watson
>
> Right now, the MapSideMergeValidator outright rejects any foreach that has a 
> UDF in it:
> {code}
> private boolean isAcceptableForEachOp(Operator lo) throws LogicalToPhysicalTranslatorException {
>     if (lo instanceof LOForEach) {
>         OperatorPlan innerPlan = ((LOForEach) lo).getInnerPlan();
>         validateMapSideMerge(innerPlan.getSinks(), innerPlan);
>         return !containsUDFs((LOForEach) lo);
>     } else {
>         return false;
>     }
> }
> {code}
> There is a TODO for this later on in that same class (inside containsUDFs):
> {code}
> // TODO (dvryaboy): in the future we could relax this rule by tracing what fields
> // are being passed into the UDF, and only refusing if the UDF is working on the
> // join key. Transforms of other fields should be ok.
> {code}
> We should do the TODO and relax this requirement, or just remove it altogether.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
