[
https://issues.apache.org/jira/browse/HBASE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Purtell updated HBASE-12424:
-----------------------------------
Attachment: 0003-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch
0002-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch
0001-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch
Here's a patch against master and three separate but related proposals broken
out.
Reviewboard: https://reviews.apache.org/r/27695/
Nothing is obviously broken:
{noformat}
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.44 sec - in
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
Results :
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0
{noformat}
Let's see what Jenkins thinks.
Change 1 adds a new regionserver histogram metric "splitTime" that captures the
total running time of split transactions.
Change 2 dumps the in memory journal of the split transaction, with timestamps
at each step, at INFO level at the end of the transaction whether or not it
succeeds or fails. The result looks like:
{noformat}
2014-11-06 10:57:00,598 INFO [RS:2;apurtell-ltm1:59524-splits-1415300220395]
regionserver.SplitRequest(140): Split transaction journal:
STARTED at 1415300220395
PREPARED at 1415300220400
BEFORE_PRE_SPLIT_HOOK at 1415300220401
AFTER_PRE_SPLIT_HOOK at 1415300220401
SET_SPLITTING at 1415300220404
CREATE_SPLIT_DIR at 1415300220406
CLOSED_PARENT_REGION at 1415300220412
OFFLINED_PARENT at 1415300220412
STARTED_REGION_A_CREATION at 1415300220470
STARTED_REGION_B_CREATION at 1415300220492
PONR at 1415300220513
OPENED_REGION_A at 1415300220585
OPENED_REGION_B at 1415300220585
BEFORE_POST_SPLIT_HOOK at 1415300220595
AFTER_POST_SPLIT_HOOK at 1415300220595
{noformat}
Change 3 adds a new regionserver histogram metric "flushTime" which is the
total time required to service a flush request.
> Finer grained logging and metrics for split transactions
> --------------------------------------------------------
>
> Key: HBASE-12424
> URL: https://issues.apache.org/jira/browse/HBASE-12424
> Project: HBase
> Issue Type: Improvement
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments:
> 0001-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch,
> 0002-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch,
> 0003-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch,
> HowHBaseRegionSplitsareImplemented.pdf
>
>
> A split transaction is a complex orchestration of activity between the
> RegionServer, Master, ZooKeeper, and HDFS NameNode. We have some visibility
> into the time taken by various phases of the split transaction in the logs.
> We will see "Starting split of region $PARENT" before the transaction begins,
> before the parent is offlined. Later we will see "Opening $DAUGHTER" as one
> of the last steps in the transaction, this is after the parent has been
> flushed, offlined, and closed. Finally ""Region split, hbase:meta updated,
> and report to master ... Split took $TIME" after all steps are complete and
> including the total running time of the transaction.
> For debugging the cause(s) of long running split transactions it would be
> useful to know the distribution of time spent in all of the phases of the
> split transaction.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)