shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r495719674
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplicationUtils.java
##########
@@ -181,6 +183,23 @@ public static boolean merge(HiveConf hiveConf, ReduceSinkOperator cRS, ReduceSin
         TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(new ArrayList<FieldSchema>(), pRS
             .getConf().getOrder(), pRS.getConf().getNullOrder());
         pRS.getConf().setKeySerializeInfo(keyTable);
+      } else if (cRS.getConf().getKeyCols() != null && cRS.getConf().getKeyCols().size() > 0) {
+        ArrayList<String> keyColNames = Lists.newArrayList();
+        for (ExprNodeDesc keyCol : pRS.getConf().getKeyCols()) {
+          String keyColName = keyCol.getExprString();
+          keyColNames.add(keyColName);
+        }
+        List<FieldSchema> fields = PlanUtils.getFieldSchemasFromColumnList(pRS.getConf().getKeyCols(),
+            keyColNames, 0, "");
+        TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(fields, pRS.getConf().getOrder(),
+            pRS.getConf().getNullOrder());
+        ArrayList<String> outputKeyCols = Lists.newArrayList();
+        for (int i = 0; i < fields.size(); i++) {
+          outputKeyCols.add(fields.get(i).getName());
+        }
+        pRS.getConf().setOutputKeyColumnNames(outputKeyCols);
+        pRS.getConf().setKeySerializeInfo(keyTable);
+        pRS.getConf().setNumDistributionKeys(cRS.getConf().getNumDistributionKeys());
       }
Review comment:
Just to add more context here: the number of distribution keys of cRS is
chosen only when numDistKeys of pRS is 0 or less. In all other cases, the
distribution of the keys is based on pRS, which is more generic than cRS. We
enter this "if" condition in only two cases:
1. pRS keyCol is empty and cRS keyCol is empty
2. pRS keyCol is empty and cRS keyCol is not empty
In case (1) we would like to keep the pRS properties intact, since pRS is
more generic. In case (2) we want to go with the cRS properties, hence I think
returning false is not required.
Does this make sense, or am I missing anything?
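
To make the two cases concrete, here is a minimal, self-contained sketch of
the decision described above. It is not Hive code: the class KeySourceSketch,
the enum KeySource, and both method names are hypothetical, and plain lists
and ints stand in for the ReduceSinkDesc fields. It only models the intent
stated in this comment, assuming pRS has already been found to have no key
columns.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the two cases; these are not Hive classes.
class KeySourceSketch {

    // Which ReduceSink's key properties end up on the merged operator.
    enum KeySource { PARENT, CHILD }

    // Assumption: the caller already knows the pRS key columns are empty.
    // Case 1: cRS key columns also empty  -> keep pRS properties.
    // Case 2: cRS key columns not empty   -> take cRS properties.
    static KeySource chooseKeySource(List<String> childKeyCols) {
        boolean childHasKeys = childKeyCols != null && !childKeyCols.isEmpty();
        return childHasKeys ? KeySource.CHILD : KeySource.PARENT;
    }

    // The cRS count is used only when the pRS count is 0 or less;
    // otherwise the pRS count wins because pRS is the more generic one.
    static int pickNumDistKeys(int parentNumDistKeys, int childNumDistKeys) {
        return parentNumDistKeys <= 0 ? childNumDistKeys : parentNumDistKeys;
    }

    public static void main(String[] args) {
        System.out.println(chooseKeySource(Collections.<String>emptyList())); // PARENT
        System.out.println(chooseKeySource(Arrays.asList("key")));            // CHILD
        System.out.println(pickNumDistKeys(0, 2));                            // 2
        System.out.println(pickNumDistKeys(3, 1));                            // 3
    }
}
```

In neither case does the sketch need a "cannot merge" outcome, which is the
reason given above for not returning false.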