[
https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360988#comment-14360988
]
William Watson commented on PIG-4458:
-------------------------------------
Okay cool, just making sure we're on the same page about what I've done.
Let me make sure we're also on the same page about what I still need to do to
close this out.
I'm a little confused about where to add the documentation, and about what you
mean by "when an exception happens": I don't think there will be an exception
if a UDF changes the join key. I think the join will just return wrong records.
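To show what I mean by wrong records rather than an exception, here is a minimal sketch in plain Java. It is not Pig code: the class and method names (MergeJoinKeySketch, mergeJoin, applyToKeys) are made up for illustration, and the lambda stands in for a UDF. It assumes unique integer keys and a simple two-pointer merge that trusts both inputs to be sorted ascending, which is the general idea behind a merge join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names exist in Pig.
class MergeJoinKeySketch {

    // Two-pointer merge join on keys. Assumes both lists are sorted
    // ascending and keys are unique within each list. It never checks
    // the ordering, so unsorted input does not raise an error.
    static List<Integer> mergeJoin(List<Integer> left, List<Integer> right) {
        List<Integer> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = Integer.compare(left.get(i), right.get(j));
            if (cmp == 0) {
                matches.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return matches;
    }

    // Stand-in for a FOREACH that applies a UDF to the join key.
    static List<Integer> applyToKeys(List<Integer> keys, UnaryOperator<Integer> udf) {
        List<Integer> out = new ArrayList<>();
        for (Integer k : keys) {
            out.add(udf.apply(k));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, 2, 3, 4);
        List<Integer> right = Arrays.asList(-4, -3, -2, -1);

        // The "UDF" negates the key, so the left side becomes
        // [-1, -2, -3, -4], which is no longer sorted ascending.
        List<Integer> transformed = applyToKeys(left, k -> -k);

        // No exception, but only one of the four real matches is found.
        System.out.println(mergeJoin(transformed, right)); // prints [-1]

        // Re-sorting the transformed keys restores the full result.
        List<Integer> resorted = new ArrayList<>(transformed);
        Collections.sort(resorted);
        System.out.println(mergeJoin(resorted, right)); // prints [-4, -3, -2, -1]
    }
}
```

So in this sketch the failure mode is silently missing rows, which is why I'm not sure what exception the documentation would be describing.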
What I am clear on is that I'm adding a test to TestMergeJoin called
testMergeJoinWithUDF that makes sure a UDF can be used in a foreach before a
merge join and the join still returns correct results. That's what you want,
right?
> Support UDFs in a FOREACH Before a Merge Join
> ---------------------------------------------
>
> Key: PIG-4458
> URL: https://issues.apache.org/jira/browse/PIG-4458
> Project: Pig
> Issue Type: New Feature
> Reporter: William Watson
> Attachments: remove_merge_join_udf_restriction.patch
>
>
> Right now, the MapSideMergeValidator outright rejects any foreach that has a
> UDF in it:
> {code}
> private boolean isAcceptableForEachOp(Operator lo)
>         throws LogicalToPhysicalTranslatorException {
>     if (lo instanceof LOForEach) {
>         OperatorPlan innerPlan = ((LOForEach) lo).getInnerPlan();
>         validateMapSideMerge(innerPlan.getSinks(), innerPlan);
>         return !containsUDFs((LOForEach) lo);
>     } else {
>         return false;
>     }
> }
> {code}
> There is a TODO for this later on in that same class (inside containsUDFs):
> {code}
> // TODO (dvryaboy): in the future we could relax this rule by tracing what fields
> // are being passed into the UDF, and only refusing if the UDF is working on the
> // join key. Transforms of other fields should be ok.
> {code}
> We should address this TODO and relax the restriction, or just remove it
> altogether.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)