zabetak commented on code in PR #4755: URL: https://github.com/apache/hive/pull/4755#discussion_r1532149473
########## ql/src/java/org/apache/hadoop/hive/ql/QueryCompilationBreakdownSummary.java: ########## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql; + +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.hive.ql.exec.tez.monitoring.PrintSummary; +import org.apache.hadoop.hive.ql.log.PerfLogger; +import org.apache.hadoop.hive.ql.session.SessionState; + +import java.util.ArrayList; +import java.util.List; + +import static org.apache.hadoop.hive.ql.exec.tez.monitoring.Constants.SEPARATOR; + +public class QueryCompilationBreakdownSummary implements PrintSummary { + private static final String OPERATION_SUMMARY = "%-84s %9s"; + private static final String QUERY_COMPILATION_SUMMARY = "Query Compilation Summary"; + private static final String LINE_SEPARATOR = "line.separator"; + private static final String TOTAL_COMPILATION_TIME = "Total Compilation Time"; + private List<Pair<String, String>> compileSteps = new ArrayList<>(); + + private String format(String value, long number) { + return String.format(OPERATION_SUMMARY, value, number + "ms"); + } + + private void populateCompileStepNumbers() { Review Comment: The content is immutable so we don't need to populate the `compileSteps` on every call. The whole thing can be become static final and initialized only once. ########## ql/src/java/org/apache/hadoop/hive/ql/QueryCompilationBreakdownSummary.java: ########## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql; + +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.hive.ql.exec.tez.monitoring.PrintSummary; +import org.apache.hadoop.hive.ql.log.PerfLogger; +import org.apache.hadoop.hive.ql.session.SessionState; + +import java.util.ArrayList; +import java.util.List; + +import static org.apache.hadoop.hive.ql.exec.tez.monitoring.Constants.SEPARATOR; + +public class QueryCompilationBreakdownSummary implements PrintSummary { Review Comment: I just noticed that `PrintSummary` is a Tez interface so it doesn't seem appropriate for the use-case. It might have been part of my previous suggestion but I failed to notice then that it was tailored around Tez. Thinking a bit more on the topic it seems that the `QueryCompilationBreakdownSummary` could be implemented as a `Hook`. In particular, the `QueryLifeTimeHook` interface seems to be fit well for this use case. Two advantages of implementing this as a `QueryLifeTimeHook`: 1. we wouldn't need a new configuration property 2. we could easily test the summary feature via qtests ########## ql/src/java/org/apache/hadoop/hive/ql/QueryCompilationBreakdownSummary.java: ########## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql; + +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.hive.ql.exec.tez.monitoring.PrintSummary; +import org.apache.hadoop.hive.ql.log.PerfLogger; +import org.apache.hadoop.hive.ql.session.SessionState; + +import java.util.ArrayList; +import java.util.List; + +import static org.apache.hadoop.hive.ql.exec.tez.monitoring.Constants.SEPARATOR; + +public class QueryCompilationBreakdownSummary implements PrintSummary { + private static final String OPERATION_SUMMARY = "%-84s %9s"; + private static final String QUERY_COMPILATION_SUMMARY = "Query Compilation Summary"; + private static final String LINE_SEPARATOR = "line.separator"; + private static final String TOTAL_COMPILATION_TIME = "Total Compilation Time"; + private List<Pair<String, String>> compileSteps = new ArrayList<>(); + + private String format(String value, long number) { + return String.format(OPERATION_SUMMARY, value, number + "ms"); + } + + private void populateCompileStepNumbers() { + compileSteps.add(Pair.of(PerfLogger.PARSE, "1")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_RESOLVED_PARSETREE, "2")); + compileSteps.add(Pair.of(PerfLogger.LOGICALPLAN_AND_HIVE_OPERATOR_TREE, "3")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_LOGICAL_PLAN, "3.1")); + compileSteps.add(Pair.of(PerfLogger.PLAN_GENERATION, "3.1.1")); + compileSteps.add(Pair.of(PerfLogger.MV_REWRITE_FIELD_TRIMMER, "3.1.2")); + compileSteps.add(Pair.of(PerfLogger.REMOVING_SUBQUERY, "3.1.3")); + compileSteps.add(Pair.of(PerfLogger.DECORRELATION, "3.1.4")); + compileSteps.add(Pair.of(PerfLogger.VALIDATE_QUERY_MATERIALIZATION, "3.1.5")); + compileSteps.add(Pair.of(PerfLogger.PREJOIN_ORDERING, "3.1.6")); + compileSteps.add(Pair.of(PerfLogger.MV_REWRITING, "3.1.7")); + compileSteps.add(Pair.of(PerfLogger.JOIN_REORDERING, "3.1.8")); + compileSteps.add(Pair.of(PerfLogger.POSTJOIN_ORDERING, "3.1.9")); + compileSteps.add(Pair.of(PerfLogger.HIVE_SORT_PREDICATES, "3.1.10")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_OPERATOR_TREE, "3.2")); + compileSteps.add(Pair.of(PerfLogger.DEDUCE_RESULTSET_SCHEMA, "4")); + compileSteps.add(Pair.of(PerfLogger.PARSE_CONTEXT_GENERATION, "5")); + compileSteps.add(Pair.of(PerfLogger.SAVE_AND_VALIDATE_VIEW, "6")); + compileSteps.add(Pair.of(PerfLogger.LOGICAL_OPTIMIZATION, "7")); + compileSteps.add(Pair.of(PerfLogger.PHYSICAL_OPTIMIZATION, "8")); + compileSteps.add(Pair.of(PerfLogger.POST_PROCESSING, "9")); + } + + @Override + public void print(SessionState.LogHelper console) { + console.printInfo(getCompileSummary(console)); + } + + public String getCompileSummary(SessionState.LogHelper console) { Review Comment: The parameter to the method is not used. ########## common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java: ########## @@ -90,9 +118,10 @@ public class PerfLogger { public static final String HIVE_GET_NOT_NULL_CONSTRAINT = "getNotNullConstraints"; public static final String HIVE_GET_TABLE_CONSTRAINTS = "getTableConstraints"; - protected final Map<String, Long> startTimes = new ConcurrentHashMap<>(); + protected final Map<String, Long> startTimes = Collections.synchronizedMap(new LinkedHashMap<>()); Review Comment: I am afraid that using `Collections.synchronizedMap` here breaks thread safety (reintroduce HIVE-27733). When using synchronized wrapper it is imperative (as described also in the Javadoc) to synchronize explicitly during iteration. Either we review all usages of `startTimes` and add external synchronize when necessary or we revert back to `ConcurrentHashMap`. ########## ql/src/java/org/apache/hadoop/hive/ql/QueryCompilationBreakdownSummary.java: ########## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql; + +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.hive.ql.exec.tez.monitoring.PrintSummary; +import org.apache.hadoop.hive.ql.log.PerfLogger; +import org.apache.hadoop.hive.ql.session.SessionState; + +import java.util.ArrayList; +import java.util.List; + +import static org.apache.hadoop.hive.ql.exec.tez.monitoring.Constants.SEPARATOR; + +public class QueryCompilationBreakdownSummary implements PrintSummary { + private static final String OPERATION_SUMMARY = "%-84s %9s"; + private static final String QUERY_COMPILATION_SUMMARY = "Query Compilation Summary"; + private static final String LINE_SEPARATOR = "line.separator"; + private static final String TOTAL_COMPILATION_TIME = "Total Compilation Time"; + private List<Pair<String, String>> compileSteps = new ArrayList<>(); + + private String format(String value, long number) { + return String.format(OPERATION_SUMMARY, value, number + "ms"); + } + + private void populateCompileStepNumbers() { + compileSteps.add(Pair.of(PerfLogger.PARSE, "1")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_RESOLVED_PARSETREE, "2")); + compileSteps.add(Pair.of(PerfLogger.LOGICALPLAN_AND_HIVE_OPERATOR_TREE, "3")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_LOGICAL_PLAN, "3.1")); + compileSteps.add(Pair.of(PerfLogger.PLAN_GENERATION, "3.1.1")); + compileSteps.add(Pair.of(PerfLogger.MV_REWRITE_FIELD_TRIMMER, "3.1.2")); + compileSteps.add(Pair.of(PerfLogger.REMOVING_SUBQUERY, "3.1.3")); + compileSteps.add(Pair.of(PerfLogger.DECORRELATION, "3.1.4")); + compileSteps.add(Pair.of(PerfLogger.VALIDATE_QUERY_MATERIALIZATION, "3.1.5")); + compileSteps.add(Pair.of(PerfLogger.PREJOIN_ORDERING, "3.1.6")); + compileSteps.add(Pair.of(PerfLogger.MV_REWRITING, "3.1.7")); + compileSteps.add(Pair.of(PerfLogger.JOIN_REORDERING, "3.1.8")); + compileSteps.add(Pair.of(PerfLogger.POSTJOIN_ORDERING, "3.1.9")); + compileSteps.add(Pair.of(PerfLogger.HIVE_SORT_PREDICATES, "3.1.10")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_OPERATOR_TREE, "3.2")); + compileSteps.add(Pair.of(PerfLogger.DEDUCE_RESULTSET_SCHEMA, "4")); + compileSteps.add(Pair.of(PerfLogger.PARSE_CONTEXT_GENERATION, "5")); + compileSteps.add(Pair.of(PerfLogger.SAVE_AND_VALIDATE_VIEW, "6")); + compileSteps.add(Pair.of(PerfLogger.LOGICAL_OPTIMIZATION, "7")); + compileSteps.add(Pair.of(PerfLogger.PHYSICAL_OPTIMIZATION, "8")); + compileSteps.add(Pair.of(PerfLogger.POST_PROCESSING, "9")); + } + + @Override + public void print(SessionState.LogHelper console) { + console.printInfo(getCompileSummary(console)); + } + + public String getCompileSummary(SessionState.LogHelper console) { + StringBuilder compileSummary = new StringBuilder(); + populateCompileStepNumbers(); + compileSummary.append(QUERY_COMPILATION_SUMMARY); + compileSummary.append(System.getProperty(LINE_SEPARATOR)); + compileSummary.append(SEPARATOR); + compileSummary.append(System.getProperty(LINE_SEPARATOR)); + PerfLogger perfLogger = SessionState.getPerfLogger(); + for (Pair<String, String> compileStep : compileSteps) { + compileSummary.append(format(PerfLogger.COMPILE_STEP + " - " + + compileStep.getRight() + " " + compileStep.getLeft(), + perfLogger.getDuration(compileStep.getLeft()))); + compileSummary.append(System.getProperty(LINE_SEPARATOR)); + } + compileSummary.append(format(TOTAL_COMPILATION_TIME, perfLogger.getDuration(PerfLogger.COMPILE))); + compileSummary.append(System.getProperty(LINE_SEPARATOR)); + compileSummary.append(SEPARATOR); + return compileSummary.toString(); + } +} Review Comment: nit: no newline at the end of file ########## ql/src/java/org/apache/hadoop/hive/ql/QueryCompilationBreakdownSummary.java: ########## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql; + +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.hive.ql.exec.tez.monitoring.PrintSummary; +import org.apache.hadoop.hive.ql.log.PerfLogger; +import org.apache.hadoop.hive.ql.session.SessionState; + +import java.util.ArrayList; +import java.util.List; + +import static org.apache.hadoop.hive.ql.exec.tez.monitoring.Constants.SEPARATOR; Review Comment: nit: Using the SEPARATOR from `org.apache.hadoop.hive.common.log.InPlaceUpdate#SEPARATOR` seems more appropriate in terms of package dependencies Anyways this is minor even if the text was not there at all that would be fine with me. ########## ql/src/java/org/apache/hadoop/hive/ql/QueryCompilationBreakdownSummary.java: ########## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.ql; + +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.hive.ql.exec.tez.monitoring.PrintSummary; +import org.apache.hadoop.hive.ql.log.PerfLogger; +import org.apache.hadoop.hive.ql.session.SessionState; + +import java.util.ArrayList; +import java.util.List; + +import static org.apache.hadoop.hive.ql.exec.tez.monitoring.Constants.SEPARATOR; + +public class QueryCompilationBreakdownSummary implements PrintSummary { + private static final String OPERATION_SUMMARY = "%-84s %9s"; + private static final String QUERY_COMPILATION_SUMMARY = "Query Compilation Summary"; + private static final String LINE_SEPARATOR = "line.separator"; + private static final String TOTAL_COMPILATION_TIME = "Total Compilation Time"; + private List<Pair<String, String>> compileSteps = new ArrayList<>(); + + private String format(String value, long number) { + return String.format(OPERATION_SUMMARY, value, number + "ms"); + } + + private void populateCompileStepNumbers() { + compileSteps.add(Pair.of(PerfLogger.PARSE, "1")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_RESOLVED_PARSETREE, "2")); + compileSteps.add(Pair.of(PerfLogger.LOGICALPLAN_AND_HIVE_OPERATOR_TREE, "3")); + compileSteps.add(Pair.of(PerfLogger.GENERATE_LOGICAL_PLAN, "3.1")); + compileSteps.add(Pair.of(PerfLogger.PLAN_GENERATION, "3.1.1")); + compileSteps.add(Pair.of(PerfLogger.MV_REWRITE_FIELD_TRIMMER, "3.1.2")); Review Comment: The hardcoded string numbering may be hard to maintain if we decide to add or remove steps in the future since it will require manually updating the counters. Using a dedicated structure for keeping the steps could alleviate this by making the implicit nesting explicit. For instance: ```java private static final CompileStep { private final String name; private final List<CompileStep> substeps; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org