[
https://issues.apache.org/jira/browse/HIVE-22402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Mollitor updated HIVE-22402:
----------------------------------
Description:
Recently I wanted to add some additional capability, and add more, performance
logging to support my troubleshooting efforts. I started looking at PerfLogger
and started to examine its usage. I discovered a few things:
# Since 'loggers' must be open and closed manually, I found a couple of places
where loggers were opened, but not closed, rendering them useless
# Since 'loggers' must be closed manually, I found a few places where an
early-return or Exception thrown would cause a logger to not be closed, thereby
rendering it useless
# Session information is not logged, so it can be difficult to precisely
pinpoint which session is taking lots of time
# PerfLogger overloaded. Most of the time, it's being used as a simple timer
mechanism with automatic logging in SLF4J debug. However, it is also a facade
over the Hive Metrics subsystem and timing results are automatically published
to Metrics and then there becomes this dependency on a 'logger' to be able to
access metric data as well.
The last bullet is the most challenging part and why I propose to deprecate the
Hive {{PerfLogger}} and not simply remove it. I am proposing a new system... a
{{PerfTimer}} that is allows for Java 8's try-with-resources feature to protect
against the developer having to care about manually close measurements and not
having to carefully consider all early-exits. The base implementation logs to
SLF4J. An extended version automatically publishes to the Hive Metric subsystem
as well.
The Hive {{PerfLogger}} has a bit of a clunky system for allowing plugable
implementations. However, there are sections of the code that rely on the
{{PerfLogger}}'s ability to publish to the Metrics subsystem. These code
section look up various timers in the Metrics Subsytem and publish the results
back to the client. Since, in theory, the implementation is plugable, any other
implementation that does not publish to the Metrics Subsystem will break these
code paths. Also, these code paths create and interact with {{PerfLogger}}s in
a static way, and then the publishing code pulls the data from the
{{PerfLogger}} (as a facade to the Metrics subsystem) in a static way.
Therefore, when I tried to replace the entire {{PerfLogger}} code, I came
across an issue because there is not (and should not) be a way to just
statically pull this information down from any point in the code. Information
that is required for publishing should be passed around within some sort of
context object, separate from the Metrics subsystem. There was no obvious way
to string a new {{PerfTimer}} to all the required locations. I propose marking
the {{PerfLogger}} as deprecated and leaving these complex section alone.
Instead, replace only the simple "I want a timer" use cases.
was:
Recently I wanted to add some additional capability, and add more, performance
logging to support my troubleshooting efforts. I started looking at PerfLogger
and started to examine its usage. I discovered a few things:
# Since 'loggers' must be open and closed manually, I found a couple of places
where loggers were opened, but not closed, rendering them useless
# Since 'loggers' must be closed manually, I found a few places where an
early-return or Exception thrown would cause a logger to not be closed, thereby
rendering it useless
# Session information is not logged, so it can be difficult to precisely
pinpoint which session is taking lots of time
# PerfLogger overloaded. Most of the time, it's being used as a simple timer
mechanism with automatic logging in SLF4J debug. However, it is also a facade
over the Hive Metrics subsystem and timing results are automatically published
to Metrics and then there becomes this dependency on a 'logger' to be able to
access metric data as well.
The last bullet is the most challenging part and why I propose to deprecate the
Hive {{PerfLogger}} and not simply remove it. I am proposing a new system... a
{{PerfTimer}} that is allows for Java 8's try-with-resources feature to protect
against the developer having to care about manually close measurements and not
having to carefully consider all early-exits. The base implementation logs to
SLF4J. An extended version automatically publishes to the Hive Metric subsystem
as well.
The Hive {{PerfLogger}} has a bit of a clunky system for allowing plugable
implementations. However, there are parts of the code that rely on the
{{PerfLogger}}'s ability to publish to the Metrics subsystem. They look up
various timers in the Metrics Subsytem and publish the results back to the
client. Since, in theory, the implementation is plugable, any other
implementation that does not publish to the Metrics Subsystem will break these
code paths. Also, these code paths create and interact with {{PerfLogger}}s in
a static way, and then the publishing code pulls the data from the
{{PerfLogger}} (as a facade to the Metrics subsystem) in a static way.
Therefore, when I tried to replace the entire {{PerfLogger}} code, I came
across an issue because there is not (and should not) be a way to just
statically pull this information down from any point in the code. Information
that is required for publishing should be passed around within some sort of
context object, separate from the Metrics subsystem. There was no obvious way
to string a new {{PerfTimer}} to all the required locations. I propose marking
the {{PerfLogger}} as deprecated and leaving these complex section alone.
Instead, replace only the simple "I want a timer" use cases.
> Deprecate Hive PerfLogger
> -------------------------
>
> Key: HIVE-22402
> URL: https://issues.apache.org/jira/browse/HIVE-22402
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Attachments: HIVE-22402.1.patch
>
>
> Recently I wanted to add some additional capability, and add more,
> performance logging to support my troubleshooting efforts. I started looking
> at PerfLogger and started to examine its usage. I discovered a few things:
> # Since 'loggers' must be open and closed manually, I found a couple of
> places where loggers were opened, but not closed, rendering them useless
> # Since 'loggers' must be closed manually, I found a few places where an
> early-return or Exception thrown would cause a logger to not be closed,
> thereby rendering it useless
> # Session information is not logged, so it can be difficult to precisely
> pinpoint which session is taking lots of time
> # PerfLogger overloaded. Most of the time, it's being used as a simple timer
> mechanism with automatic logging in SLF4J debug. However, it is also a facade
> over the Hive Metrics subsystem and timing results are automatically
> published to Metrics and then there becomes this dependency on a 'logger' to
> be able to access metric data as well.
> The last bullet is the most challenging part and why I propose to deprecate
> the Hive {{PerfLogger}} and not simply remove it. I am proposing a new
> system... a {{PerfTimer}} that is allows for Java 8's try-with-resources
> feature to protect against the developer having to care about manually close
> measurements and not having to carefully consider all early-exits. The base
> implementation logs to SLF4J. An extended version automatically publishes to
> the Hive Metric subsystem as well.
> The Hive {{PerfLogger}} has a bit of a clunky system for allowing plugable
> implementations. However, there are sections of the code that rely on the
> {{PerfLogger}}'s ability to publish to the Metrics subsystem. These code
> section look up various timers in the Metrics Subsytem and publish the
> results back to the client. Since, in theory, the implementation is plugable,
> any other implementation that does not publish to the Metrics Subsystem will
> break these code paths. Also, these code paths create and interact with
> {{PerfLogger}}s in a static way, and then the publishing code pulls the data
> from the {{PerfLogger}} (as a facade to the Metrics subsystem) in a static
> way. Therefore, when I tried to replace the entire {{PerfLogger}} code, I
> came across an issue because there is not (and should not) be a way to just
> statically pull this information down from any point in the code. Information
> that is required for publishing should be passed around within some sort of
> context object, separate from the Metrics subsystem. There was no obvious way
> to string a new {{PerfTimer}} to all the required locations. I propose
> marking the {{PerfLogger}} as deprecated and leaving these complex section
> alone. Instead, replace only the simple "I want a timer" use cases.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)