[ https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359141#comment-14359141 ]

Brian Johnson commented on PIG-4458:
------------------------------------

I think it should be removed altogether. A FOREACH GENERATE that changes the 
position of the join key breaks the map-side merge, yet the validation doesn't 
reject it. Why strictly enforce one rule and not the other? isAcceptableSortOp 
also has an incomplete check, and it is permissive rather than restrictive like 
the UDF check, as its own TODO admits:

// TODO: really, we should check that the sort is on the join keys, in the same order!
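A minimal sketch of what that TODO asks for, assuming the sort keys and join keys have already been resolved to column positions. SortKeyCheck and sortMatchesJoinKeys are hypothetical names, not Pig's API, and sort direction is ignored:

```java
import java.util.List;

// Hypothetical helper, not part of Pig: compares column positions only
// and ignores sort direction.
final class SortKeyCheck {
    private SortKeyCheck() {}

    /**
     * Returns true if the join key columns form a prefix of the sort columns,
     * in the same order. Data sorted on (k1, k2, x) is also sorted on (k1, k2),
     * so extra trailing sort columns are allowed.
     */
    static boolean sortMatchesJoinKeys(List<Integer> sortColumns, List<Integer> joinColumns) {
        if (joinColumns.isEmpty() || sortColumns.size() < joinColumns.size()) {
            return false; // nothing to join on, or some join key is not sorted at all
        }
        // Same columns, same order, at the front of the sort key.
        return sortColumns.subList(0, joinColumns.size()).equals(joinColumns);
    }
}
```

For example, sortMatchesJoinKeys(List.of(0, 2, 5), List.of(0, 2)) is true, because data sorted on ($0, $2, $5) is also sorted on ($0, $2), while a sort on ($2, $0) against the same keys is rejected.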

I think the main check and the isAcceptableForEachOp check make sense, but 
isAcceptableSortOp and containsUDFs are either pointless or go too far.
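A rough sketch of the missing position check, assuming, as described above, that the merge relies on each join key column staying at its original position. The int[] model of a FOREACH GENERATE and the ForEachKeyCheck class are hypothetical simplifications, not Pig's LOForEach:

```java
// Hypothetical model, not Pig's LOForEach: sourceOf[i] is the input column that
// output column i projects unchanged, or COMPUTED if column i is produced by an
// expression (for example a UDF call).
final class ForEachKeyCheck {
    static final int COMPUTED = -1;

    private ForEachKeyCheck() {}

    /** True if every join key column is projected unchanged at its original position. */
    static boolean keepsJoinKeysInPlace(int[] sourceOf, int[] joinKeyColumns) {
        for (int key : joinKeyColumns) {
            // Reject if the key was dropped, moved, or replaced by a computed value.
            if (key < 0 || key >= sourceOf.length || sourceOf[key] != key) {
                return false;
            }
        }
        return true;
    }
}
```

Under this model, GENERATE $0, $1 ({0, 1}) keeps join key $0 in place, GENERATE $1, $0 ({1, 0}) does not, and GENERATE $0 plus a UDF applied to $1 ({0, COMPUTED}) passes even though it calls a UDF.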

> Support UDFs in a FOREACH Before a Merge Join
> ---------------------------------------------
>
>                 Key: PIG-4458
>                 URL: https://issues.apache.org/jira/browse/PIG-4458
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: William Watson
>
> Right now, the MapSideMergeValidator outright rejects any foreach that has a 
> UDF in it:
> {code}
> private boolean isAcceptableForEachOp(Operator lo)
>         throws LogicalToPhysicalTranslatorException {
>     if (lo instanceof LOForEach) {
>         OperatorPlan innerPlan = ((LOForEach) lo).getInnerPlan();
>         validateMapSideMerge(innerPlan.getSinks(), innerPlan);
>         return !containsUDFs((LOForEach) lo);
>     } else {
>         return false;
>     }
> }
> {code}
> There is a TODO for this later on in that same class (inside containsUDFs):
> {code}
> // TODO (dvryaboy): in the future we could relax this rule by tracing what fields
> // are being passed into the UDF, and only refusing if the UDF is working on the
> // join key. Transforms of other fields should be ok.
> {code}
> We should either implement the TODO and relax this requirement, or remove the check altogether.
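One way the TODO's relaxed rule could look, as a sketch: reduce each UDF call in the FOREACH to the set of input columns it reads, and refuse only when one of those sets contains a join key column. RelaxedUdfCheck and this set-based representation are hypothetical; real code would have to walk the inner plan's expressions to collect those columns:

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model of the TODO's relaxed rule, not Pig's containsUDFs:
// each UDF call in the FOREACH is reduced to the set of input columns it reads.
final class RelaxedUdfCheck {
    private RelaxedUdfCheck() {}

    /** True if no UDF reads a join key column, so the FOREACH could be accepted. */
    static boolean udfsAvoidJoinKeys(List<Set<Integer>> udfInputColumns, Set<Integer> joinKeyColumns) {
        for (Set<Integer> inputs : udfInputColumns) {
            if (!Collections.disjoint(inputs, joinKeyColumns)) {
                return false; // this UDF works on a join key: keep refusing
            }
        }
        return true; // UDFs only transform non-key fields
    }
}
```

With join key {0}, a FOREACH whose only UDF reads columns {1, 2} is accepted, while one with any UDF reading column 0 is still refused.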



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)