jcamachor commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r482301409



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java
##########
@@ -89,22 +89,23 @@ public RelOptCost getAggregateCost(HiveAggregate aggregate) {
     } else {
       final RelMetadataQuery mq = aggregate.getCluster().getMetadataQuery();
       // 1. Sum of input cardinalities
-      final Double rCount = mq.getRowCount(aggregate.getInput());
-      if (rCount == null) {
+      final Double inputRowCount = mq.getRowCount(aggregate.getInput());
+      final Double rowCount = mq.getRowCount(aggregate);
+      if (inputRowCount == null || rowCount == null) {
         return null;
       }
       // 2. CPU cost = sorting cost
-      final double cpuCost = algoUtils.computeSortCPUCost(rCount);
+      final double cpuCost = algoUtils.computeSortCPUCost(rowCount) + inputRowCount * algoUtils.getCpuUnitCost();

Review comment:
   I think the problem is that we are also trying to encapsulate the algorithm
selection here: the fact that we group in each node before sorting the data (I
think this is also reflected to some extent in the `isLe` discussion above).
However, the current model does not represent that precisely, since the output
row count is supposed to be the output of the final step of the aggregation.
   With respect to reads, there is also the IO part of the cost. I am trying to
understand whether some of the cost that you are describing is actually IO.
   There is more information about the original formulas used to compute this
here:
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive
   Can we split this into two patches and have the changes to the cost model on
their own? That should also make it easier to discuss them in more detail.
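
   To make the difference between the two formulas in the hunk concrete, here is a
minimal standalone sketch. It is not the Hive implementation: the class name,
the `CPU_UNIT_COST` constant, and the `n * log(n)` shape used for the sort cost
are assumptions made for illustration, standing in for `algoUtils.getCpuUnitCost()`
and `algoUtils.computeSortCPUCost(...)`.

```java
public final class AggregateCpuCostSketch {

    // Hypothetical per-row CPU unit cost; in Hive this value would come from algoUtils.
    static final double CPU_UNIT_COST = 1.0;

    // Assumed sort cost shape: n * log(n) * unit cost. The guard for tiny inputs
    // is part of this sketch only, to avoid log(0) and negative values.
    static double sortCpuCost(double rows) {
        return rows <= 1.0 ? 0.0 : rows * Math.log(rows) * CPU_UNIT_COST;
    }

    // Formula before the patch: sort every input row.
    static double cpuCostBefore(double inputRows) {
        return sortCpuCost(inputRows);
    }

    // Formula after the patch: sort only the aggregate's output rows,
    // plus one pass over the input rows for the per-node grouping.
    static double cpuCostAfter(double inputRows, double outputRows) {
        return sortCpuCost(outputRows) + inputRows * CPU_UNIT_COST;
    }

    public static void main(String[] args) {
        double inputRows = 1_000_000.0;  // illustrative cardinalities only
        double outputRows = 1_000.0;
        System.out.println("before: " + cpuCostBefore(inputRows));
        System.out.println("after:  " + cpuCostAfter(inputRows, outputRows));
    }
}
```

   Under these assumptions the patched formula is much cheaper whenever the
aggregation reduces the row count substantially, which is why it implicitly
assumes the "group first, then sort" algorithm.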



