kgyrtkirk commented on a change in pull request #1439: URL: https://github.com/apache/hive/pull/1439#discussion_r482123540
########## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveOnTezCostModel.java ########## @@ -89,22 +89,23 @@ public RelOptCost getAggregateCost(HiveAggregate aggregate) { } else { final RelMetadataQuery mq = aggregate.getCluster().getMetadataQuery(); // 1. Sum of input cardinalities - final Double rCount = mq.getRowCount(aggregate.getInput()); - if (rCount == null) { + final Double inputRowCount = mq.getRowCount(aggregate.getInput()); + final Double rowCount = mq.getRowCount(aggregate); + if (inputRowCount == null || rowCount == null) { return null; } // 2. CPU cost = sorting cost - final double cpuCost = algoUtils.computeSortCPUCost(rCount); + final double cpuCost = algoUtils.computeSortCPUCost(rowCount) + inputRowCount * algoUtils.getCpuUnitCost(); // 3. IO cost = cost of writing intermediary results to local FS + // cost of reading from local FS for transferring to GBy + // cost of transferring map outputs to GBy operator final Double rAverageSize = mq.getAverageRowSize(aggregate.getInput()); if (rAverageSize == null) { return null; } - final double ioCost = algoUtils.computeSortIOCost(new Pair<Double,Double>(rCount,rAverageSize)); + final double ioCost = algoUtils.computeSortIOCost(new Pair<Double, Double>(rowCount, rAverageSize)); Review comment: if we will be doing a 2 phase groupby: every mapper will do some grouping before it starts emitting; in case `iRC >> oRC` the mappers could eliminate a lot of rows ; and they will most likely utilize `O(oRC)` io this is an underestimation ; I wanted to multiply it with the number of mappers - but I don't think that's known at this point....I can add a config key for a fixed multiplier. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org