kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r482123540



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java
##########
@@ -89,22 +89,23 @@ public RelOptCost getAggregateCost(HiveAggregate aggregate) {
     } else {
       final RelMetadataQuery mq = aggregate.getCluster().getMetadataQuery();
       // 1. Sum of input cardinalities
-      final Double rCount = mq.getRowCount(aggregate.getInput());
-      if (rCount == null) {
+      final Double inputRowCount = mq.getRowCount(aggregate.getInput());
+      final Double rowCount = mq.getRowCount(aggregate);
+      if (inputRowCount == null || rowCount == null) {
         return null;
       }
       // 2. CPU cost = sorting cost
-      final double cpuCost = algoUtils.computeSortCPUCost(rCount);
+      final double cpuCost = algoUtils.computeSortCPUCost(rowCount) + inputRowCount * algoUtils.getCpuUnitCost();
       // 3. IO cost = cost of writing intermediary results to local FS +
       //              cost of reading from local FS for transferring to GBy +
       //              cost of transferring map outputs to GBy operator
       final Double rAverageSize = mq.getAverageRowSize(aggregate.getInput());
       if (rAverageSize == null) {
         return null;
       }
-      final double ioCost = algoUtils.computeSortIOCost(new Pair<Double,Double>(rCount,rAverageSize));
+      final double ioCost = algoUtils.computeSortIOCost(new Pair<Double, Double>(rowCount, rAverageSize));

Review comment:
   If we do a two-phase group-by, every mapper performs some grouping before it starts emitting rows. When `iRC >> oRC` (input row count much larger than output row count), the mappers can eliminate a lot of rows, and they will most likely need only `O(oRC)` IO.
   
   Costing this as `O(oRC)` is an underestimate, because every mapper may emit up to `oRC` rows. I wanted to multiply it by the number of mappers, but I don't think that's known at this point. I can add a config key for a fixed multiplier instead.
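A minimal sketch of the estimate discussed in the comment, assuming each mapper fully pre-aggregates its input (no hash-table flushes). `MapSideAggEstimate`, `shuffledRows`, `shuffledBytes` and `mapperFactor` are hypothetical names, not Hive API; `mapperFactor` stands in for the fixed-multiplier config key proposed above, which does not exist in this diff. The result is capped at `iRC` because pre-aggregation never emits more rows than it reads.

```java
/**
 * Hypothetical sketch of the map-side output estimate for a two-phase
 * group-by. Names and the multiplier are illustrative, not Hive API.
 */
final class MapSideAggEstimate {

    private MapSideAggEstimate() {
    }

    /**
     * Estimated number of partial rows the mappers emit into the shuffle.
     *
     * @param inputRowCount  iRC: rows read from the aggregate's input
     * @param outputRowCount oRC: groups produced by the aggregate
     * @param mapperFactor   hypothetical fixed multiplier standing in for the
     *                       unknown number of mappers (1.0 = plain O(oRC) estimate)
     */
    static double shuffledRows(double inputRowCount, double outputRowCount, double mapperFactor) {
        if (mapperFactor < 1.0) {
            throw new IllegalArgumentException("mapperFactor must be >= 1.0");
        }
        // Assuming full pre-aggregation, each mapper emits at most one partial
        // row per group, so the total is bounded by mapperFactor * oRC.
        // Pre-aggregation never emits more rows than it reads, hence the cap at iRC.
        return Math.min(inputRowCount, mapperFactor * outputRowCount);
    }

    /** Shuffle volume in bytes, given an (assumed) average row size. */
    static double shuffledBytes(double inputRowCount, double outputRowCount,
                                double mapperFactor, double averageRowSize) {
        return shuffledRows(inputRowCount, outputRowCount, mapperFactor) * averageRowSize;
    }

    public static void main(String[] args) {
        double iRC = 1_000_000;
        double oRC = 1_000;
        System.out.println(shuffledRows(iRC, oRC, 1.0));     // 1000.0 (plain O(oRC))
        System.out.println(shuffledRows(iRC, oRC, 8.0));     // 8000.0
        System.out.println(shuffledRows(iRC, 900_000, 8.0)); // 1000000.0 (capped at iRC)
    }
}
```

With `mapperFactor = 1.0` this reduces to the plain `O(oRC)` estimate that the new hunk feeds into `computeSortIOCost`; as the factor grows, the estimate approaches `iRC`, which is what the old code (`rCount`, the input row count) used.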




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


