zabetak commented on code in PR #4755: URL: https://github.com/apache/hive/pull/4755#discussion_r1444696611
########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -129,9 +133,31 @@ public static void setPerfLogger(PerfLogger resetPerfLogger) { * @param method method or ID that identifies this perf log element. */ public void perfLogBegin(String callerName, String method) { + perfLogBegin(callerName, method, null); + } + + /** + * Call this function when you start to measure time spent by a piece of code. + * @param callerName the logging object to be used. + * @param method method or ID that identifies this perf log element. + * @param additionalInfo particular step within the method that is being measured + */ + public void perfLogBegin(String callerName, String method, String additionalInfo) { long startTime = System.currentTimeMillis(); - startTimes.put(method, Long.valueOf(startTime)); - LOG.debug("<PERFLOG method={} from={}>", method, callerName); + String key = method; + if (additionalInfo != null) { + key = key + " - " + additionalInfo; + } + startTimes.put(key, Long.valueOf(startTime)); + if (LOG.isDebugEnabled()) { + StringBuilder sb = new StringBuilder("<PERFLOG method=").append(method); + sb.append(" from=").append(callerName); + if (additionalInfo != null) { + sb.append(" ").append(additionalInfo); + } + sb.append(">"); + LOG.debug(sb.toString()); + } beginMetrics(method); Review Comment: It seems that the `PerfLogger` already tracks some metrics for each call on begin/end. Did you check if its possible to exploit the metrics to provide these kind of cumulative optimization times? Why do we keep these metrics are they useful in any way? `PerfLogger` is called in many places and also gets heavier and more involved every time. I want to ensure that we are not adding new information and logic that we don't already have somewhere. ########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -92,6 +94,8 @@ public class PerfLogger { protected final Map<String, Long> startTimes = new ConcurrentHashMap<>(); protected final Map<String, Long> endTimes = new ConcurrentHashMap<>(); + protected final Map<String, Long> durations = new ConcurrentHashMap<>(); Review Comment: I see your point. However, the additional of the new information comes in conflict with the current APIs. For instance, we have `PerfLogger#getDuration` that uses the `startTimes` and `endTimes` data structures. Moreover, the start/end times are exposed to the WebUI via public getters. Should we adapt some of the old APIs? Should we add new ones? ########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -152,10 +178,18 @@ public long perfLogEnd(String callerName, String method) { * @return long duration the difference between now and startTime, or -1 if startTime is null */ public long perfLogEnd(String callerName, String method, String additionalInfo) { - long startTime = startTimes.getOrDefault(method, -1L); + String key = method; + if (additionalInfo != null) { + key = key + " - " + additionalInfo; + } + long startTime = startTimes.getOrDefault(key, -1L); long endTime = System.currentTimeMillis(); - long duration = startTime < 0 ? -1 : endTime - startTime; - endTimes.put(method, Long.valueOf(endTime)); + long duration = startTime < 0 ? -1 : (endTime - startTime + durations.getOrDefault(key, 0L)); + endTimes.put(key, Long.valueOf(endTime)); + durations.put(key, duration); + synchronized(querySteps) { + querySteps.add(key); Review Comment: Isn't it possible to derive the order by looking at `startTimes`? As you mentioned elsewhere sometimes the same key may appears multiple times during compilation. In that case there is no total order among the compilation events. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org