zabetak commented on code in PR #4755: URL: https://github.com/apache/hive/pull/4755#discussion_r1440459341
########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -92,6 +94,8 @@ public class PerfLogger { protected final Map<String, Long> startTimes = new ConcurrentHashMap<>(); protected final Map<String, Long> endTimes = new ConcurrentHashMap<>(); + protected final Map<String, Long> durations = new ConcurrentHashMap<>(); Review Comment: Why do we need a separate data structure for duration? If we have startTime and endTime for every distinct entry that should be enough. ########## ql/src/java/org/apache/hadoop/hive/ql/Compiler.java: ########## @@ -504,6 +506,15 @@ private void cleanUp(Throwable compileException, boolean parsed, boolean deferCl driverState.compilationFinishedWithLocking(compileException != null); LOG.info("Completed compiling command(queryId={}); Time taken: {} seconds", driverContext.getQueryId(), duration); + if (HiveConf.getBoolVar(driverContext.getConf(), HiveConf.ConfVars.HIVE_COMPILE_SUMMARY)) { + LogHelper console = SessionState.getConsole(); + console.printInfo("Query Compilation Summary"); + console.printInfo(SEPARATOR); + for (String qs : perfLogger.getQuerySteps()) { + console.printInfo(String.format("%-99s %9s", qs, perfLogger.getDurations().get(qs) + " ms")); + } + console.printInfo(SEPARATOR); + } Review Comment: The `Query Execution Summary` is handled in a subclass of `PrintSummary` shouldn't we encapsulate this logic in a similar fashion? ########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -152,10 +178,18 @@ public long perfLogEnd(String callerName, String method) { * @return long duration the difference between now and startTime, or -1 if startTime is null */ public long perfLogEnd(String callerName, String method, String additionalInfo) { - long startTime = startTimes.getOrDefault(method, -1L); + String key = method; + if (additionalInfo != null) { + key = key + " - " + additionalInfo; + } + long startTime = startTimes.getOrDefault(key, -1L); long endTime = System.currentTimeMillis(); - long duration = startTime < 0 ? -1 : endTime - startTime; - endTimes.put(method, Long.valueOf(endTime)); + long duration = startTime < 0 ? -1 : (endTime - startTime + durations.getOrDefault(key, 0L)); + endTimes.put(key, Long.valueOf(endTime)); + durations.put(key, duration); + synchronized(querySteps) { + querySteps.add(key); Review Comment: The `querySteps` data structure and this code seems redundant since the same `key` seems to be present in `startTimes`, `endTimes`, and `durations`. ########## ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java: ########## @@ -76,8 +76,8 @@ * Handles transaction related duties of the Driver. */ class DriverTxnHandler { - private static final String CLASS_NAME = Driver.class.getName(); - private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + private static final String SIMPLE_CLASS_NAME = Driver.class.getSimpleName(); Review Comment: Why did we switch from full name to simple name? I suppose it is for readability purposes but this may have undesirable side effects so maybe we should defer this part to another PR. ########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -129,9 +133,31 @@ public static void setPerfLogger(PerfLogger resetPerfLogger) { * @param method method or ID that identifies this perf log element. */ public void perfLogBegin(String callerName, String method) { + perfLogBegin(callerName, method, null); + } + + /** + * Call this function when you start to measure time spent by a piece of code. + * @param callerName the logging object to be used. + * @param method method or ID that identifies this perf log element. + * @param additionalInfo particular step within the method that is being measured + */ + public void perfLogBegin(String callerName, String method, String additionalInfo) { long startTime = System.currentTimeMillis(); - startTimes.put(method, Long.valueOf(startTime)); - LOG.debug("<PERFLOG method={} from={}>", method, callerName); + String key = method; + if (additionalInfo != null) { + key = key + " - " + additionalInfo; Review Comment: It doesn't seem that `additionalInfo` was ever meant to be used as a key. Instead of concatenating the key with the info in order to obtain a unique id maybe it would be better that the caller passes the appropriate measurement event in the first place. This class has many static constants for the various events that we want to track but as far as I understand we want to have more fine grained measurements. For instance, it seems that we want to track `PerfLogger.OPTIMIZER` in more detail. If that's the case then possibly it makes sense to introduce more entries for the various sub-steps. ########## ql/src/java/org/apache/hadoop/hive/ql/Driver.java: ########## @@ -71,8 +71,8 @@ */ public class Driver implements IDriver { - private static final String CLASS_NAME = Driver.class.getName(); - private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + private static final String SIMPLE_CLASS_NAME = Driver.class.getName(); Review Comment: The rename is confusing since we still call `Class#getName` method. ########## ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: ########## @@ -1995,7 +1998,8 @@ protected RelNode applyMaterializedViewRewriting(RelOptPlanner planner, RelNode return calcitePreMVRewritingPlan; } - perfLogger.perfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER); + perfLogger.perfLogBegin(this.getClass().getSimpleName(), PerfLogger.OPTIMIZER + , "Calcite: View-based rewriting"); Review Comment: Instead of passing additional info in begin maybe we should decompose the `PerfLogger.OPTIMIZER` entry into multiple ones. ########## ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java: ########## @@ -76,8 +76,8 @@ * Handles transaction related duties of the Driver. */ class DriverTxnHandler { - private static final String CLASS_NAME = Driver.class.getName(); - private static final Logger LOG = LoggerFactory.getLogger(CLASS_NAME); + private static final String SIMPLE_CLASS_NAME = Driver.class.getSimpleName(); + private static final Logger LOG = LoggerFactory.getLogger(SIMPLE_CLASS_NAME); Review Comment: If we change the logger name then we may have to change the log4j configuration files. Is this a breaking change? Do we need to do it? ########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -207,6 +241,14 @@ public Map<String, Long> getStartTimes() { return ImmutableMap.copyOf(startTimes); } + public Set<String> getQuerySteps() { + return querySteps; + } + + public Map<String, Long> getDurations() { + return durations; + } + Review Comment: Returning modifiable data structures especially when we are dealing with thread safe classes is not a good idea. ########## ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java: ########## @@ -1932,7 +1932,7 @@ public List<String> getTablesForDb(String database, String tablePattern) public List<String> getTablesByType(String dbName, String pattern, TableType type) throws HiveException { PerfLogger perfLogger = SessionState.getPerfLogger(); - perfLogger.perfLogBegin(CLASS_NAME, PerfLogger.HIVE_GET_TABLE); + perfLogger.perfLogBegin(SIMPLE_CLASS_NAME, PerfLogger.HIVE_GET_TABLE, "HS2-cache"); Review Comment: The `PerfLogger.HIVE_GET_TABLE` entry is used only here so I am not sure if we really need to pass the additional info (`HS2-cache`). The same comment applies to other places where we pass something in `perfLogBegin`. ########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -92,6 +94,8 @@ public class PerfLogger { protected final Map<String, Long> startTimes = new ConcurrentHashMap<>(); Review Comment: Should we add some tests in TestPerfLogger, TestQueryDisplay, or somewhere else to ensure that the data structures are populated correctly when the query compiles? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org