stack created HBASE-20992:
-----------------------------

             Summary: MTTR, Chaos, and ITBLL
                 Key: HBASE-20992
                 URL: https://issues.apache.org/jira/browse/HBASE-20992
             Project: HBase
          Issue Type: Sub-task
          Components: integration tests, MTTR
            Reporter: stack


I've been having trouble getting a sustained, large ITBLL run to complete over
the last few days. I keep seeing the following sequence:

 * A region splits or is moved
 * Chaos kills the Master in the middle of the Split or Move Procedure after a 
Region has been offlined
 * Master takes a while to come back: it is not restarted until a couple of
minutes have passed, and then there is some recovery to be done.

So a region can be offline for minutes. By default the client retries up to 16
times, which adds up to about 2.5 minutes before it gives up.

So, I can up the retries when running larger tests, but the region should also
come back online faster.
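
To make the retry arithmetic concrete, here is a rough sketch of how the cumulative client wait could be estimated. It assumes the backoff multipliers I believe the client uses (HConstants.RETRY_BACKOFF) and a 100 ms base pause (hbase.client.pause); both may differ by version. It ignores jitter and the time spent in the RPCs themselves. The class and method names are made up for illustration.

```java
public class RetryBudget {
    // Assumed multipliers, meant to mirror HConstants.RETRY_BACKOFF in the
    // HBase client; check your version before relying on them.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Total sleep in ms across the given number of retries (no jitter, no RPC time). */
    public static long totalWaitMillis(int retries, long pauseMillis) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            // Past the end of the table, the last multiplier is reused.
            int idx = Math.min(i, BACKOFF.length - 1);
            total += pauseMillis * BACKOFF[idx];
        }
        return total;
    }

    /** Smallest retry count whose cumulative sleep covers the outage; pauseMillis must be > 0. */
    public static int retriesToCover(long outageMillis, long pauseMillis) {
        int retries = 0;
        long total = 0;
        while (total < outageMillis) {
            total += pauseMillis * BACKOFF[Math.min(retries, BACKOFF.length - 1)];
            retries++;
        }
        return retries;
    }

    public static void main(String[] args) {
        // 16 retries at the assumed 100 ms pause.
        System.out.println(totalWaitMillis(16, 100) / 1000.0 + " s");
        // Retries needed to ride out a hypothetical five-minute outage.
        System.out.println(retriesToCover(5 * 60 * 1000L, 100) + " retries");
    }
}
```

Under those assumptions, 16 retries come to roughly 148 seconds, which lines up with the 2.5 minutes above. Riding out a five-minute Master outage would need a retry count in the mid-twenties, which is the sort of value to pass via hbase.client.retries.number on bigger runs.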

Let me hang ITBLL fixes/notes off here.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
