[ 
https://issues.apache.org/jira/browse/HBASE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-12424:
-----------------------------------
    Attachment: 0003-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch
                0002-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch
                0001-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch

Here's a patch against master and three separate but related proposals broken 
out. 

Reviewboard: https://reviews.apache.org/r/27695/

Nothing is obviously broken:
{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.44 sec - in 
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

Results :

Tests run: 13, Failures: 0, Errors: 0, Skipped: 0
{noformat}

Let's see what Jenkins thinks.

Change 1 adds a new regionserver histogram metric "splitTime" that captures the 
total running time of split transactions.

Change 2 dumps the in memory journal of the split transaction, with timestamps 
at each step, at INFO level at the end of the transaction whether or not it 
succeeds or fails. The result looks like:
{noformat}
2014-11-06 10:57:00,598 INFO  [RS:2;apurtell-ltm1:59524-splits-1415300220395] 
regionserver.SplitRequest(140): Split transaction journal:
        STARTED at 1415300220395
        PREPARED at 1415300220400
        BEFORE_PRE_SPLIT_HOOK at 1415300220401
        AFTER_PRE_SPLIT_HOOK at 1415300220401
        SET_SPLITTING at 1415300220404
        CREATE_SPLIT_DIR at 1415300220406
        CLOSED_PARENT_REGION at 1415300220412
        OFFLINED_PARENT at 1415300220412
        STARTED_REGION_A_CREATION at 1415300220470
        STARTED_REGION_B_CREATION at 1415300220492
        PONR at 1415300220513
        OPENED_REGION_A at 1415300220585
        OPENED_REGION_B at 1415300220585
        BEFORE_POST_SPLIT_HOOK at 1415300220595
        AFTER_POST_SPLIT_HOOK at 1415300220595
{noformat}

Change 3 adds a new regionserver histogram metric "flushTime" which is the 
total time required to service a flush request. 

> Finer grained logging and metrics for split transactions
> --------------------------------------------------------
>
>                 Key: HBASE-12424
>                 URL: https://issues.apache.org/jira/browse/HBASE-12424
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 0.98.8, 0.99.2
>
>         Attachments: 
> 0001-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 
> 0002-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 
> 0003-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 
> HowHBaseRegionSplitsareImplemented.pdf
>
>
> A split transaction is a complex orchestration of activity between the 
> RegionServer, Master, ZooKeeper, and HDFS NameNode. We have some visibility 
> into the time taken by various phases of the split transaction in the logs. 
> We will see "Starting split of region $PARENT" before the transaction begins, 
> before the parent is offlined. Later we will see "Opening $DAUGHTER" as one 
> of the last steps in the transaction, this is after the parent has been 
> flushed, offlined, and closed. Finally ""Region split, hbase:meta updated, 
> and report to master ... Split took $TIME" after all steps are complete and 
> including the total running time of the transaction. 
> For debugging the cause(s) of long running split transactions it would be 
> useful to know the distribution of time spent in all of the phases of the 
> split transaction. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to