[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

Hadoop QA (JIRA) Thu, 01 Nov 2012 11:57:15 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488930#comment-13488930
 ]


Hadoop QA commented on HBASE-6060:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12551735/trunk-6060_v3.3.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
85 warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       
org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3210//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3210//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3210//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3210//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3210//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3210//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3210//console

This message is automatically generated.
                
> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-6060
>                 URL: https://issues.apache.org/jira/browse/HBASE-6060
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>            Reporter: Enis Soztutar
>            Assignee: Jimmy Xiang
>             Fix For: 0.96.0
>
>         Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
> 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
> 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
> 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
> 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
> HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
> HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, 
> trunk-6060.patch, trunk-6060_v2.patch, trunk-6060_v3.3.patch
>
>
> we have seen a pattern in tests, that the regions are stuck in OPENING state 
> for a very long time when the region server who is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that wont happen. 
>  - There is a AssignmentManager.TimeoutMonitor, which does exactly guard 
> against these kind of conditions. It periodically checks (every 10 sec by 
> default) the regions in transition to see whether they timedout 
> (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

Reply via email to