[
https://issues.apache.org/jira/browse/HBASE-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell updated HBASE-21926:
-----------------------------------
Description:
HIVE-20202 describes how Hive added a web endpoint for online, in-production
profiling based on async-profiler. The endpoint was added as a servlet to
httpserver and supports retrieval of flamegraphs compiled from the profiler
trace. In addition to CPU, async-profiler
([https://github.com/jvm-profiling-tools/async-profiler]) can also profile heap
allocations, lock contention, and hardware performance counters.
The profiling overhead is low, and the profiler is safe to run in production.
The async-profiler project measured the CPU and memory overheads and describes
them in these issues:
[https://github.com/jvm-profiling-tools/async-profiler/issues/14] and
[https://github.com/jvm-profiling-tools/async-profiler/issues/131]
We have an httpserver-based servlet stack, so we can use HIVE-20202 as an
implementation template for a similar feature for HBase daemons. Ideally we
would meet these requirements:
* Retrieve the flamegraph SVG generated from the latest profile trace.
* Online enable and disable of profiling activity. (async-profiler does not do
instrumentation-based profiling, so this should not cause the code generation
related perf problems of that approach, and profiling can be safely toggled on
and off while under production load.)
* CPU profiling.
* ALLOCATION profiling.
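As a rough illustration of the requirements above, a servlet could translate request parameters into an invocation of async-profiler's profiler.sh launcher. The sketch below is only one possible shape, not the HIVE-20202 or HBase implementation: the class name ProfilerCommandSketch, the build method, and its parameters are hypothetical. The -e, -d, -o svg and -f options are the ones documented for the async-profiler 1.x launcher script and should be checked against the installed version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of HBase or Hive. */
final class ProfilerCommandSketch {
  // Only the two events named in the requirements; async-profiler calls them "cpu" and "alloc".
  private static final List<String> ALLOWED_EVENTS = Arrays.asList("cpu", "alloc");

  private ProfilerCommandSketch() {}

  /**
   * Builds the argument list for a fixed-duration profiling run that writes an SVG flamegraph.
   * profilerHome is assumed to be the directory containing profiler.sh.
   */
  static List<String> build(String profilerHome, long pid, String event,
      int durationSeconds, String outputFile) {
    // Reject anything outside the whitelist so request parameters cannot inject arbitrary options.
    if (!ALLOWED_EVENTS.contains(event)) {
      throw new IllegalArgumentException("Unsupported event: " + event);
    }
    if (durationSeconds <= 0) {
      throw new IllegalArgumentException("Duration must be positive");
    }
    List<String> cmd = new ArrayList<>();
    cmd.add(profilerHome + "/profiler.sh");
    cmd.add("-e");
    cmd.add(event);                              // event to sample
    cmd.add("-d");
    cmd.add(Integer.toString(durationSeconds));  // run length in seconds
    cmd.add("-o");
    cmd.add("svg");                              // flamegraph output (1.x option, an assumption here)
    cmd.add("-f");
    cmd.add(outputFile);                         // where the profiler writes the result
    cmd.add(Long.toString(pid));                 // target JVM process id comes last
    return cmd;
  }
}
```

The resulting list could be handed to java.lang.ProcessBuilder, with the servlet serving the output file once the process exits. Passing arguments as a list rather than a single shell string avoids shell quoting problems with user-supplied values.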
was:
HIVE-20202 describes how Hive added a web endpoint for online in production
profiling based on async-profiler. The endpoint was added as a servlet to
httpserver and supports retrieval of flamegraphs compiled from the profiler
trace. Async profiler ([https://github.com/jvm-profiling-tools/async-profiler]
) can also profile heap allocations, lock contention, and HW performance
counters in addition to CPU.
The profiling overhead is pretty low and is safe to run in production. The
async-profiler project measured and describes CPU and memory overheads on these
issues: [https://github.com/jvm-profiling-tools/async-profiler/issues/14] and
[https://github.com/jvm-profiling-tools/async-profiler/issues/131]
We have an httpserver based servlet stack so we can use HIVE-20202 as an
implementation template for a similar feature for HBase daemons. Ideally we
achieve these requirements:
* Retrieve flamegraph SVG generated from latest profile trace.
* Online enable and disable of profiling activity. (async-profiler does not do
instrumentation based profiling so this should not cause the codgen related
perf problems of that other approach and can be safely toggled on and off while
under production load.)
* CPU profiling.
* ALLOCATION profiling.
> Profiler servlet
> ----------------
>
> Key: HBASE-21926
> URL: https://issues.apache.org/jira/browse/HBASE-21926
> Project: HBase
> Issue Type: New Feature
> Components: master, Operability, regionserver
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Major
> Fix For: 3.0.0, 1.6.0, 2.2.0
>