zabetak commented on code in PR #4755:
URL: https://github.com/apache/hive/pull/4755#discussion_r1444696611


##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -129,9 +133,31 @@ public static void setPerfLogger(PerfLogger 
resetPerfLogger) {
    * @param method method or ID that identifies this perf log element.
    */
   public void perfLogBegin(String callerName, String method) {
+    perfLogBegin(callerName, method, null);
+  }
+
+  /**
+   * Call this function when you start to measure time spent by a piece of 
code.
+   * @param callerName the logging object to be used.
+   * @param method method or ID that identifies this perf log element.
+   * @param additionalInfo particular step within the method that is being 
measured
+   */
+  public void perfLogBegin(String callerName, String method, String 
additionalInfo) {
     long startTime = System.currentTimeMillis();
-    startTimes.put(method, Long.valueOf(startTime));
-    LOG.debug("<PERFLOG method={} from={}>", method, callerName);
+    String key = method;
+    if (additionalInfo != null) {
+      key = key + " - " + additionalInfo;
+    }
+    startTimes.put(key, Long.valueOf(startTime));
+    if (LOG.isDebugEnabled()) {
+      StringBuilder sb = new StringBuilder("<PERFLOG method=").append(method);
+      sb.append(" from=").append(callerName);
+      if (additionalInfo != null) {
+        sb.append(" ").append(additionalInfo);
+      }
+      sb.append(">");
+      LOG.debug(sb.toString());
+    }
     beginMetrics(method);

Review Comment:
   It seems that the `PerfLogger` already tracks some metrics for each call on 
begin/end. Did you check if its possible to exploit the metrics to provide 
these kind of cumulative optimization times? Why do we keep these metrics are 
they useful in any way?
   
   `PerfLogger` is called in many places and also gets heavier and more 
involved every time. I want to ensure that we are not adding new information 
and logic that we don't already have somewhere.



##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -92,6 +94,8 @@ public class PerfLogger {
 
   protected final Map<String, Long> startTimes = new ConcurrentHashMap<>();
   protected final Map<String, Long> endTimes = new ConcurrentHashMap<>();
+  protected final Map<String, Long> durations = new ConcurrentHashMap<>();

Review Comment:
   I see your point. However, the additional of the new information comes in 
conflict with the current APIs. For instance, we have `PerfLogger#getDuration` 
that uses the `startTimes` and `endTimes` data structures. Moreover, the 
start/end times are exposed to the WebUI via public getters. 
   
   Should we adapt some of the old APIs? Should we add new ones?



##########
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java:
##########
@@ -152,10 +178,18 @@ public long perfLogEnd(String callerName, String method) {
    * @return long duration  the difference between now and startTime, or -1 if 
startTime is null
    */
   public long perfLogEnd(String callerName, String method, String 
additionalInfo) {
-    long startTime = startTimes.getOrDefault(method, -1L);
+    String key = method;
+    if (additionalInfo != null) {
+      key = key + " - " + additionalInfo;
+    }
+    long startTime = startTimes.getOrDefault(key, -1L);
     long endTime = System.currentTimeMillis();
-    long duration = startTime < 0 ? -1 : endTime - startTime;
-    endTimes.put(method, Long.valueOf(endTime));
+    long duration = startTime < 0 ? -1 : (endTime - startTime + 
durations.getOrDefault(key, 0L));
+    endTimes.put(key, Long.valueOf(endTime));
+    durations.put(key, duration);
+    synchronized(querySteps) {
+      querySteps.add(key);

Review Comment:
   Isn't it possible to derive the order by looking at `startTimes`? 
   
   As you mentioned elsewhere sometimes the same key may appears multiple times 
during compilation. In that case there is no total order among the compilation 
events.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to