himanshu-mishra commented on code in PR #5406:
URL: https://github.com/apache/hive/pull/5406#discussion_r1731647606
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java:
##########
@@ -870,6 +871,35 @@ private boolean checkConvertJoinSMBJoin(JoinOperator joinOp, OptimizeTezProcCont
       }
     }
 
+    /* As SMB replaces last RS op from the joining branches and the JOIN op with MERGEJOIN, we need to ensure
+     * the RS before these RS, in both branches, are partitioning using same hash generator. It
+     * differs depending on ReducerTraits.UNIFORM i.e. ReduceSinkOperator#computeMurmurHash or
+     * ReduceSinkOperator#computeHashCode, leading to different code for same value. Skip SMB join in such cases.
+     */
+    Boolean prevRsHasUniformTrait = null;
+    for (Operator<? extends OperatorDesc> parentOp : joinOp.getParentOperators()) {
+      // Assertion of mandatory single parent is already being done in bucket version check earlier
+      Operator<?> op = parentOp.getParentOperators().get(0);

Review Comment:
   Yes. For SMB, the `ReduceSinkOperator-2 -> JoinOperator` pair is replaced by a `MERGEJOIN` operator. We want to ensure that the RS operators before these, in both join branches, use the same partitioning logic.
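
To illustrate why the two branches must agree on the hash generator, here is a minimal standalone sketch. It is not Hive code: `PartitionHashCheck`, `HashMode`, `murmurStyleHash`, `legacyHash`, `partitionFor`, `sameHashMode` and `anyKeyRoutedDifferently` are hypothetical names, and the Murmur3 32-bit finalizer is only a stand-in for whatever `ReduceSinkOperator#computeMurmurHash` actually computes. It shows that the same key can be routed to different partitions under two hash functions, and that a single pass over the branches is enough to detect a mismatch.

```java
import java.util.List;

public class PartitionHashCheck {

    /** Hypothetical stand-in for "has the UNIFORM trait or not"; not a Hive type. */
    public enum HashMode { UNIFORM_MURMUR, LEGACY_HASHCODE }

    /** Murmur3 32-bit finalizer, used here only as an example of a mixing hash. */
    public static int murmurStyleHash(int key) {
        int h = key;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Plain Java hash code, standing in for the non-uniform code path. */
    public static int legacyHash(int key) {
        return Integer.hashCode(key);
    }

    /** Maps a key to a partition index in [0, numPartitions) using the given mode. */
    public static int partitionFor(int key, HashMode mode, int numPartitions) {
        int h = (mode == HashMode.UNIFORM_MURMUR) ? murmurStyleHash(key) : legacyHash(key);
        return Math.floorMod(h, numPartitions);
    }

    /** Returns true only if every branch uses the same hash mode. */
    public static boolean sameHashMode(List<HashMode> branchModes) {
        HashMode prev = null;
        for (HashMode mode : branchModes) {
            if (prev != null && prev != mode) {
                return false; // mismatch: equal keys may land in different partitions
            }
            prev = mode;
        }
        return true;
    }

    /** Returns true if some key in [0, upTo) is routed differently by the two modes. */
    public static boolean anyKeyRoutedDifferently(int numPartitions, int upTo) {
        for (int key = 0; key < upTo; key++) {
            if (partitionFor(key, HashMode.UNIFORM_MURMUR, numPartitions)
                    != partitionFor(key, HashMode.LEGACY_HASHCODE, numPartitions)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("modes agree: "
                + sameHashMode(List.of(HashMode.UNIFORM_MURMUR, HashMode.LEGACY_HASHCODE)));
        System.out.println("some key routed differently: "
                + anyKeyRoutedDifferently(4, 1000));
    }
}
```

In a merge join, rows with equal keys from both sides have to arrive at the same task. If the upstream RS operators in the two branches partition with different hash functions, that assumption can break, which is why the check in this PR skips the SMB conversion when the traits differ.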