-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71766/
-----------------------------------------------------------
Review request for hive, Peter Vary and Slim Bouguerra.
Repository: hive-git
Description
-------
Recently I wanted to add some additional capability, and add more, performance
logging to support my troubleshooting efforts. I started looking at PerfLogger
and started to examine its usage. I discovered a few things:
Since 'loggers' must be open and closed manually, I found a couple of places
where loggers were opened, but not closed, rendering them useless
Since 'loggers' must be closed manually, I found a few places where an
early-return or Exception thrown would cause a logger to not be closed, thereby
rendering it useless
Session information is not logged, so it can be difficult to precisely pinpoint
which session is taking lots of time
PerfLogger overloaded. Most of the time, it's being used as a simple timer
mechanism with automatic logging in SLF4J debug. However, it is also a facade
over the Hive Metrics subsystem and timing results are automatically published
to Metrics and then there becomes this dependency on a 'logger' to be able to
access metric data as well.
The last bullet is the most challenging part and why I propose to deprecate the
Hive PerfLogger and not simply remove it. I am proposing a new system... a
PerfTimer that is allows for Java 8's try-with-resources feature to protect
against the developer having to care about manually close measurements and not
having to carefully consider all early-exits. The base implementation logs to
SLF4J. An extended version automatically publishes to the Hive Metric subsystem
as well.
The current Hive PerfLogger has a bit of a clunky system for allowing plugable
implementations. However, the current default implementation has a side-effect
of also publishing timing information to the Hive Metrics subsystem. There are
code sections that look up various timers in the Metrics Subsytem and publish
the results back to the client. Since, in theory, the implementation is
plugable, any other implementation that does not also have this side-effect of
also publishing to the Metrics Subsystem will break these non-optional code
paths. Also, these code paths create and interact with PerfLoggers in a static
way, and then the publishing code pulls the data from the {{PerfLogger}} (as a
facade to the Metrics subsystem) in a static way. Therefore, when I tried to
replace the entire PerfLogger code, I came across an issue because there is not
(and should not) be a way to just statically pull this information down from
any point in the code. Information that is required for publish
ing should be passed around within some sort of context object, separate from
the Metrics subsystem. There was no obvious way to string a new PerfTimer to
all the required locations. I propose marking the PerfLogger as deprecated and
leaving these complex section alone. Instead, replace only the simple "I want a
timer" use cases.
Diffs
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 282f4cdb0b
common/src/java/org/apache/hadoop/hive/ql/log/CachedPerfTimerLogger.java
PRE-CREATION
common/src/java/org/apache/hadoop/hive/ql/log/LoggingPerfTimerLogger.java
PRE-CREATION
common/src/java/org/apache/hadoop/hive/ql/log/MetricsPerfTimerLogger.java
PRE-CREATION
common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 2707987f0b
common/src/java/org/apache/hadoop/hive/ql/log/PerfTimedAction.java
PRE-CREATION
common/src/java/org/apache/hadoop/hive/ql/log/PerfTimer.java PRE-CREATION
common/src/java/org/apache/hadoop/hive/ql/log/PerfTimerFactory.java
PRE-CREATION
common/src/java/org/apache/hadoop/hive/ql/log/PerfTimerLogger.java
PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/Driver.java 91910d1c0c
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 0643a54753
ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 695d08bbe2
ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
e205c08d84
ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java
10144a1352
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java a7770b4e53
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java
b9285accbd
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java
530131f207
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 8244dcb1a9
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
806deb5f31
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
f29a9f807c
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
07cb5cb936
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 92775107bc
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapRecordProcessor.java
8c9d53f521
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileRecordProcessor.java
13f5f12989
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java
6697f62d13
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java
03edbf7bdb
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java
72446afeda
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java fa6160fe3c
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java aecd1084e6
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
1f72477666
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java c97c961481
ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java dd25f622c7
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 6143e85664
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java
4592f5ec34
ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 25e9cd0482
ql/src/java/org/apache/hadoop/hive/ql/optimizer/Transform.java 6c57797177
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
673d8580d5
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 91ec00b9c6
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 7d5807720b
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java
24429b4a1f
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java e224f2c348
Diff: https://reviews.apache.org/r/71766/diff/1/
Testing
-------
Thanks,
David Mollitor