[
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019185#comment-13019185
]
[email protected] commented on HBASE-3759:
------------------------------------------------------
bq. On 2011-04-13 03:23:28, Michael Stack wrote:
bq. > +1 Looks good Gary. Agree do it now before 0.92. By chance did you see
if it made a difference profiling?
Yeah if you grab
https://issues.apache.org/jira/secure/attachment/12475852/cp_bypass.tar.gz you
can see the call trees I grabbed from profiling with the context object
(Call_Tree_context_xxx) vs. ThreadLocals (Call_Tree_tl_xxx). If you look at
Call_Tree_tl_run.html vs. Call_Tree_context_run.html, you'll see ~20% of the
runnable thread time spent in ThreadLocal.get() (under shouldComplete() and
shouldBypass()). This is completely eliminated in the context version, though
with small overhead for the object instatiation -- 0.4% in
CallContext.createAndPrepare(). (This was before a rename of CallContext ->
ObserverContext).
Granted this is only runnable thread time, so it's skewed in terms of overall
impact. But at the macro level, the MR put-based import that generated this
ran in ~2h15m with the ThreadLocal version, but only ~1h40m with the context
version. So seems a pretty substantial improvement.
- Gary
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/588/#review434
-----------------------------------------------------------
On 2011-04-13 01:08:50, Gary Helmling wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/588/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-04-13 01:08:50)
bq.
bq.
bq. Review request for hbase.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Profiling the HRegionServer process with a RegionObserver coprocessor
loaded shows a fair amount of runnable thread CPU time spent getting the bypass
and complete flag ThreadLocal values by RegionCoprocessorHost. See the
HBASE-3759 JIRA for some attached graphs.
bq.
bq. With the caveat that this is runnable CPU time and not threads in all
states, this still seems like a significant processing bottleneck on a hot call
path. The workload profiled was a put-based bulk load, so for each multi-put
request, RegionCoprocessorHost.prePut() could be called many times.
bq.
bq. Instead of using ThreadLocal variable for bypass/complete, which will
incur contention on the underlying map of values, I think we can eliminate the
bottleneck by using locally scoped variables for each preXXX/putXXX method
called in the RegionCoprocessorHost, MasterCoprocessorHost and
WALCoprocessorHost classes.
bq.
bq. The attached patch refactors the current RegionObserver, MasterObserver
and WALObserver APIs to provide a locally scoped ObserverContext object for
storing and checking the bypass and complete values.
bq.
bq. Summary of changes:
bq.
bq. * adds a new ObserverContext<T extends CoprocessorEnvironment> class,
containing references for bypass, complete and the environment instance
bq. * in each pre/post method in RegionObserver, the
RegionCoprocessorEnvironment parameter is replaced by
ObserverContext<RegionCoprocessorEnvironment>
bq. * in each pre/post method in MasterObserver, the
MasterCoprocessorEnvironment parameter is replaced by
ObserverContext<MasterCoprocessorEnvironment>
bq. * in each pre/post method in WALObserver, the WALCoprocessorEnvironment
parameter is replace by ObserverContext<WALCoprocesorEnvironment>
bq.
bq.
bq. This is obviously a large bulk change to the existing API. I could avoid
the API change with hacky modification underneath the *CoprocessorEnvironment
interfaces. But since we do not yet have a public release with coprocessors, I
would prefer to take the time to make the initial API the best we can before we
push it out.
bq.
bq. Please let me know your thoughts on this approach.
bq.
bq.
bq. This addresses bug HBASE-3759.
bq. https://issues.apache.org/jira/browse/HBASE-3759
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java
9576c48
bq.
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java
5a0f095
bq.
src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java
d45b950
bq. src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
a82f62b
bq. src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java
db0870b
bq. src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java
PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java
3501958
bq. src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java
7a34d18
bq. src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
019bbde
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java
60efa12
bq.
src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java
a3f3b31
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/SampleRegionWALObserver.java
834283f
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java
0ce2147
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java
0db5001
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
a15d53a
bq.
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java
2c1e4a0
bq.
bq. Diff: https://reviews.apache.org/r/588/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Gary
bq.
bq.
> Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and
> complete()
> --------------------------------------------------------------------------------
>
> Key: HBASE-3759
> URL: https://issues.apache.org/jira/browse/HBASE-3759
> Project: HBase
> Issue Type: Improvement
> Components: coprocessors
> Reporter: Gary Helmling
> Assignee: Gary Helmling
> Attachments: cp_bypass.tar.gz
>
>
> In the current coprocessor framework, ThreadLocal objects are used for the
> bypass and complete booleans in CoprocessorEnvironment. This allows the
> *CoprocessorHost implementations to identify when to short-circuit processing
> the the preXXX and postXXX hook methods.
> Profiling the region server, however, shows that these ThreadLocals can
> become a contention point when on a hot code path (such as prePut()). We
> should refactor the CoprocessorHost pre/post implementations to remove usage
> of the ThreadLocal variables and replace them with locally scoped variables
> to eliminate contention between handler threads.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira