----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32036/#review78099 -----------------------------------------------------------
Thanks, Praveen! Looks good to me. I have some comments about error handling. src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java <https://reviews.apache.org/r/32036/#comment126549> you may want to leave it private, and add a public getter instead. Also, pig-core changes like this needs to go in a separate patch, right ? src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java <https://reviews.apache.org/r/32036/#comment126545> We should check the return status for lrOut. Plus, detaching the input after getNextTuple() call is a good practice: See POMergeJoin.extractKeysFromTuple(): lr.detachInput(); if(lrOut.returnStatus!=POStatus.STATUS_OK){ int errCode = 2167; String errMsg = "LocalRearrange used to extract keys from tuple isn't configured correctly"; throw new ExecException(...); } src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java <https://reviews.apache.org/r/32036/#comment126546> Let's not eat up these exceptions. Please log an error and re-throw the exception. LOG.error(msg + e, e) throw new ExecException(msg, e) src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java <https://reviews.apache.org/r/32036/#comment126548> Same here. Log and re-throw. src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java <https://reviews.apache.org/r/32036/#comment126550> i never quite understood what these error codes mean or a central place where these are defined... - Mohit Sabharwal On March 13, 2015, 11:18 a.m., Praveen Rachabattuni wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/32036/ > ----------------------------------------------------------- > > (Updated March 13, 2015, 11:18 a.m.) > > > Review request for pig, liyun zhang and Mohit Sabharwal. > > > Bugs: PIG-4422 > https://issues.apache.org/jira/browse/PIG-4422 > > > Repository: pig-git > > > Description > ------- > > POMergeJoin operator is added as parent to load operators and a regular join > is performed as part of the initial implementation and the MergeJoinConverter > should later be modified to achieve the specialized join. > > TODO: > - Perform join considering the input data is sorted. > - Fix failing test cases in TestMergeJoin > > > Diffs > ----- > > > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java > 87249e4c9d6c890e8ac864c3faea32e3d6aa872d > src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java > ca7a45f33320064e22628b40b34be7b9f7b07c36 > > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java > PRE-CREATION > > src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/32036/diff/ > > > Testing > ------- > > Tested TestMergeJoin and we now have all tests passing except the following: > - testMergeJoinWithCommaSeperatedFilePaths > - testMergeJoinEmptyIndex > - testMergeJoinOutPipeline > - testExpressionFail > > > Thanks, > > Praveen Rachabattuni > >
