[ https://issues.apache.org/jira/browse/HBASE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704080#comment-13704080 ]
Himanshu Vashishtha commented on HBASE-8911:
--------------------------------------------
Using a patched version, I kill the regionserver hosting meta (it also hosts one non-meta region).
The master produces this trace dump:
{code}
1) {"Description":"SplitLogManager","Start":1373415141061,"Annotations":{},"TraceID":7794030922126752097,"ParentID":-4469867469428889343,"Stop":1373415145255,"SpanID":-948196168508937537}
2) {"Description":"MetaServerShutdownHandler","Start":1373415141027,"Annotations":{},"TraceID":7794030922126752097,"ParentID":477902,"Stop":1373415145379,"SpanID":-4469867469428889343}
3) {"Description":"SplitLogManager","Start":1373415146016,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415150138,"SpanID":-8986584144293319916}
4) {"Description":"ServerShutdownHandler: AssignmentManager","Start":1373415150138,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415150181,"SpanID":5233034729044488427}
5) {"Description":"ServerShutdownHandler","Start":1373415145380,"Annotations":{},"TraceID":3471834208649164937,"ParentID":477902,"Stop":1373415150181,"SpanID":7097670840195911759}
{code}
h3. Explanation:
The meta region is handled first. Line 1) is the splitting of the meta logs. Line 2) is the processing time of MetaSSH (MetaServerShutdownHandler): its start/stop interval covers the log-splitting span on line 1), and line 1)'s ParentID is line 2)'s SpanID.
Lines 3), 4) and 5) cover splitting the non-meta logs and reassigning the regions that were on the dead regionserver. Line 3) is the log splitting, line 4) is the region assignment, and line 5) (ServerShutdownHandler) is the parent of 3) and 4).
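To make the parent/child reading above mechanical, here is a small Python sketch (not part of the patch) that loads the master-side spans, rebuilds the tree from SpanID/ParentID, and checks that each child's [Start, Stop] interval sits inside its parent's. It assumes one JSON object per span exactly as dumped, and treats any ParentID that matches no SpanID in the dump (477902 here) as a root; `render_tree` is a hypothetical helper name.

```python
import json

# Master-side spans copied from the HMaster dump (the "1)".."5)" prefixes dropped).
MASTER_DUMP = [
    '{"Description":"SplitLogManager","Start":1373415141061,"Annotations":{},"TraceID":7794030922126752097,"ParentID":-4469867469428889343,"Stop":1373415145255,"SpanID":-948196168508937537}',
    '{"Description":"MetaServerShutdownHandler","Start":1373415141027,"Annotations":{},"TraceID":7794030922126752097,"ParentID":477902,"Stop":1373415145379,"SpanID":-4469867469428889343}',
    '{"Description":"SplitLogManager","Start":1373415146016,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415150138,"SpanID":-8986584144293319916}',
    '{"Description":"ServerShutdownHandler: AssignmentManager","Start":1373415150138,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415150181,"SpanID":5233034729044488427}',
    '{"Description":"ServerShutdownHandler","Start":1373415145380,"Annotations":{},"TraceID":3471834208649164937,"ParentID":477902,"Stop":1373415150181,"SpanID":7097670840195911759}',
]


def render_tree(lines):
    """Rebuild the span tree and return one indented 'Description: N ms' line per span."""
    spans = [json.loads(line) for line in lines]
    by_id = {s["SpanID"]: s for s in spans}
    children, roots = {}, []
    for s in spans:
        # A ParentID that matches no SpanID in this dump is treated as a root.
        if s["ParentID"] in by_id:
            children.setdefault(s["ParentID"], []).append(s)
        else:
            roots.append(s)

    out = []

    def walk(span, depth):
        out.append(f'{"  " * depth}{span["Description"]}: {span["Stop"] - span["Start"]} ms')
        for child in sorted(children.get(span["SpanID"], []), key=lambda c: c["Start"]):
            # Sanity check: a child's interval should lie inside its parent's.
            if not (span["Start"] <= child["Start"] and child["Stop"] <= span["Stop"]):
                raise ValueError(f"{child['Description']} escapes its parent span")
            walk(child, depth + 1)

    for root in sorted(roots, key=lambda r: r["Start"]):
        walk(root, 0)
    return out


for line in render_tree(MASTER_DUMP):
    print(line)
```

For this dump it shows MetaServerShutdownHandler (4352 ms) over the meta SplitLogManager span (4194 ms), and ServerShutdownHandler (4801 ms) over SplitLogManager (4122 ms) and the AssignmentManager step (43 ms).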
On the regionserver where meta is reassigned, I get the following trace:
{code}
1) {"Description":"handling callId: 28 service: AdminService methodName: openRegion size: 67.0 connection: 10.20.188.114:49125","Start":1373415145297,"Annotations":{},"TraceID":7794030922126752097,"ParentID":-4469867469428889343,"Stop":1373415145380,"SpanID":-2912335450699571517}
2) {"Description":"handling callId: 29 service: ClientService methodName: scan size: 71.0 connection: 10.20.188.114:49126","Start":1373415145865,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415145866,"SpanID":-1834828785626104167}
3) {"Description":"RS_OPEN_META-a1215:40020-0","Start":1373415145376,"Annotations":{},"TraceID":7794030922126752097,"ParentID":-2912335450699571517,"Stop":1373415145872,"SpanID":522405091715978575}
4) {"Description":"handling callId: 30 service: ClientService methodName: scan size: 71.0 connection: 10.20.188.114:49126","Start":1373415145977,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415145999,"SpanID":-3636631271530731407}
5) {"Description":"handling callId: 31 service: ClientService methodName: scan size: 50.0 connection: 10.20.188.114:49126","Start":1373415146000,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415146005,"SpanID":2496204017007798885}
6) {"Description":"handling callId: 32 service: ClientService methodName: scan size: 48.0 connection: 10.20.188.114:49126","Start":1373415146009,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415146010,"SpanID":5149809990225735159}
7) {"Description":"handling callId: 33 service: AdminService methodName: openRegion size: 66.0 connection: 10.20.188.114:49125","Start":1373415150162,"Annotations":{},"TraceID":3471834208649164937,"ParentID":5233034729044488427,"Stop":1373415150183,"SpanID":-642245433468525490}
{code}
h3. Explanation:
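Lines 1) and 3) are about opening the meta region, while the other lines handle the remaining regions (scanning meta and assigning the non-meta region). The ParentIDs tie these spans back to the master trace: line 3)'s parent is line 1), line 1)'s parent is MetaSSH, lines 2) and 4) to 6) are children of ServerShutdownHandler, and line 7) is a child of its AssignmentManager span.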
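The cross-process link can be followed programmatically. The sketch below is a hypothetical helper that uses shortened labels for the regionserver descriptions and only the SpanID/ParentID values from the two dumps; it walks each regionserver span's ParentID chain until it reaches a master span. It assumes span IDs are unique across both processes.

```python
from collections import Counter

# SpanID -> Description for the HMaster spans (values from the master dump).
MASTER = {
    -948196168508937537: "SplitLogManager",
    -4469867469428889343: "MetaServerShutdownHandler",
    -8986584144293319916: "SplitLogManager",
    5233034729044488427: "ServerShutdownHandler: AssignmentManager",
    7097670840195911759: "ServerShutdownHandler",
}

# (shortened label, SpanID, ParentID) for the regionserver spans in the trace above.
REGIONSERVER = [
    ("openRegion callId 28", -2912335450699571517, -4469867469428889343),
    ("scan callId 29", -1834828785626104167, 7097670840195911759),
    ("RS_OPEN_META-a1215:40020-0", 522405091715978575, -2912335450699571517),
    ("scan callId 30", -3636631271530731407, 7097670840195911759),
    ("scan callId 31", 2496204017007798885, 7097670840195911759),
    ("scan callId 32", 5149809990225735159, 7097670840195911759),
    ("openRegion callId 33", -642245433468525490, 5233034729044488427),
]


def attribute(spans, master):
    """Map each regionserver span label to the master span it descends from."""
    rs_parent = {span_id: parent_id for _, span_id, parent_id in spans}
    owners = {}
    for label, span_id, _ in spans:
        parent = rs_parent[span_id]
        # Hop through regionserver-local parents (e.g. RS_OPEN_META -> callId 28).
        while parent not in master:
            parent = rs_parent[parent]  # KeyError if the chain leaves both dumps
        owners[label] = master[parent]
    return owners


owners = attribute(REGIONSERVER, MASTER)
for label, owner in owners.items():
    print(f"{label:28} <- {owner}")
print(Counter(owners.values()))
```

With this data both meta-open spans resolve to MetaServerShutdownHandler, the four scans to ServerShutdownHandler, and callId 33 to the AssignmentManager span.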
Most importantly, the total time taken by the regionserver failover can be read from the HMaster trace file alone: it is the interval from the earliest Start to the latest Stop among the master spans.
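A minimal sketch of that calculation, done as plain arithmetic on the dump rather than anything the patch computes. It assumes the master dump holds only the spans from this one failover; all timestamps come from the master's clock, so clock skew between hosts does not enter. Summing the two SplitLogManager spans is valid only because they do not overlap in this run.

```python
# (Description, Start, Stop) in milliseconds for the HMaster spans, from the master dump.
MASTER_SPANS = [
    ("SplitLogManager", 1373415141061, 1373415145255),
    ("MetaServerShutdownHandler", 1373415141027, 1373415145379),
    ("SplitLogManager", 1373415146016, 1373415150138),
    ("ServerShutdownHandler: AssignmentManager", 1373415150138, 1373415150181),
    ("ServerShutdownHandler", 1373415145380, 1373415150181),
]


def failover_window_ms(spans):
    """Wall-clock time from the earliest Start to the latest Stop."""
    return max(stop for _, _, stop in spans) - min(start for _, start, _ in spans)


def time_in_ms(spans, description):
    """Summed duration of every span with this description (assumes no overlap)."""
    return sum(stop - start for desc, start, stop in spans if desc == description)


total = failover_window_ms(MASTER_SPANS)
splitting = time_in_ms(MASTER_SPANS, "SplitLogManager")
print(f"failover window: {total} ms, of which log splitting: {splitting} ms")
```

Here that gives 1373415150181 - 1373415141027 = 9154 ms, of which 8316 ms is spent in the two SplitLogManager spans, so in this particular run log splitting accounts for most of the recovery time.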
> Inject MTTR specific traces to get a break up of various steps
> --------------------------------------------------------------
>
> Key: HBASE-8911
> URL: https://issues.apache.org/jira/browse/HBASE-8911
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Affects Versions: 0.95.1
> Reporter: Himanshu Vashishtha
> Attachments: 8911-v0.patch
>
>
> There are various steps involved in a regionserver recovery process. This
> jira adds instrumentation at various places in order to see which steps are
> involved in a regionserver recovery and how much time is spent in each of
> them.
--
This message is automatically generated by JIRA.