jcamachor commented on a change in pull request #1439: URL: https://github.com/apache/hive/pull/1439#discussion_r482301409
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java
##########

@@ -89,22 +89,23 @@ public RelOptCost getAggregateCost(HiveAggregate aggregate) {
     } else {
       final RelMetadataQuery mq = aggregate.getCluster().getMetadataQuery();
       // 1. Sum of input cardinalities
-      final Double rCount = mq.getRowCount(aggregate.getInput());
-      if (rCount == null) {
+      final Double inputRowCount = mq.getRowCount(aggregate.getInput());
+      final Double rowCount = mq.getRowCount(aggregate);
+      if (inputRowCount == null || rowCount == null) {
         return null;
       }
       // 2. CPU cost = sorting cost
-      final double cpuCost = algoUtils.computeSortCPUCost(rCount);
+      final double cpuCost = algoUtils.computeSortCPUCost(rowCount) + inputRowCount * algoUtils.getCpuUnitCost();

Review comment:
I think the problem is that we are also trying to encapsulate the algorithm selection here: the fact that we group in each node before sorting the data (I think this is also somewhat reflected in the `isLe` discussion above). However, the current model does not represent that precisely, since the output row count is supposed to be the output of the final step of the aggregation.

With respect to the read, there is also the IO part of the cost; I am trying to understand whether some of the cost you are describing is actually IO. There is more information about the original formulas used to compute this here: https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive

Can we split this into two patches, with the changes to the cost model on their own? That should also make it easier to discuss them in more detail.
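To make the two CPU formulas in the hunk easier to compare, here is a minimal standalone sketch. It is not the Hive implementation: the class name, the `CPU_UNIT_COST` constant, and the `n * ln(n)` shape assumed for the sort cost are illustrative assumptions, and only the structure of the old and new expressions is taken from the diff.

```java
/**
 * Illustrative comparison of the old and new aggregate CPU cost expressions.
 * All names and constants here are hypothetical; only the shape of the two
 * formulas mirrors the diff under review.
 */
public final class AggregateCpuCostSketch {

    // Assumed per-row CPU unit cost (the real value is configurable in Hive).
    static final double CPU_UNIT_COST = 1.0;

    // Assumed sort cost model: n * ln(n) * unit cost, clamped to 0 for n <= 1.
    static double sortCpuCost(double rows) {
        if (rows <= 1.0) {
            return 0.0;
        }
        return rows * Math.log(rows) * CPU_UNIT_COST;
    }

    // Old expression: sort cost over the aggregate's input rows.
    static double oldCpuCost(double inputRows) {
        return sortCpuCost(inputRows);
    }

    // New expression: sort cost over the output rows plus one pass over the input.
    static double newCpuCost(double inputRows, double outputRows) {
        return sortCpuCost(outputRows) + inputRows * CPU_UNIT_COST;
    }

    public static void main(String[] args) {
        double inputRows = 1_000_000.0;
        double outputRows = 1_000.0; // strongly reducing aggregate
        System.out.println("old = " + oldCpuCost(inputRows));
        System.out.println("new = " + newCpuCost(inputRows, outputRows));
    }
}
```

Under these assumptions the new expression is much cheaper when the aggregate reduces its input heavily, and slightly more expensive than the old one when output and input row counts are equal. That gap is the implicit "group before sort" assumption the comment asks to discuss separately.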