[jira] [Commented] (HBASE-13605) RegionStates should not keep its list of dead servers

Hadoop QA (JIRA) Thu, 30 Apr 2015 23:30:25 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522827#comment-14522827
 ]


Hadoop QA commented on HBASE-13605:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12729711/hbase-13605_v1.patch
  against master branch at commit 235dc9734f032945331c5a4a03ff2d69d170669d.
  ATTACHMENT ID: 12729711

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

    {color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

     {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):       
at org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137)
        at 
org.apache.oozie.test.XTestCase$MiniClusterShutdownMonitor.run(XTestCase.java:1071)
        at org.apache.oozie.test.XTestCase.waitFor(XTestCase.java:692)
        at 
org.apache.oozie.action.hadoop.TestPigActionExecutor._testSubmit(TestPigActionExecutor.java:180)
        at 
org.apache.oozie.action.hadoop.TestPigActionExecutor.testPigError(TestPigActionExecutor.java:369)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13907//testReport/
Release Findbugs (version 2.0.3)        warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13907//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13907//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13907//console

This message is automatically generated.

> RegionStates should not keep its list of dead servers
> -----------------------------------------------------
>
>                 Key: HBASE-13605
>                 URL: https://issues.apache.org/jira/browse/HBASE-13605
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.0.2, 1.1.1
>
>         Attachments: hbase-13605_v1.patch
>
>
> As mentioned in 
> https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
>  and HBASE-12844 we should have only 1 source of cluster membership. 
> The list of dead server and RegionStates doing it's own liveliness check 
> (ServerManager.isServerReachable()) has caused an assignment problem again in 
> a test cluster where the region states "thinks" that the server is dead and 
> SSH will handle the region assignment. However the RS is not dead at all, 
> living happily, and never gets zk expiry or YouAreDeadException or anything. 
> This leaves the list of regions unassigned in OFFLINE state. 
> master assigning the region:
> {code}
> 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: 
> Onlined 77dddcd50c22e56bfff133c0e1f9165b on 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED => 
> 77dddcd50c
> {code}
> Master then disabled the table, and unassigned the region:
> {code}
> 2015-04-20 09:02:27,158 WARN  [ProcedureExecutorThread-1] 
> zookeeper.ZKTableStateManager: Moving table loadtest_d1 state from DISABLING 
> to DISABLING
>  Starting unassign of 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. (offlining), 
> current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, 
> ts=1429520545780,   
> server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268}
> bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 for region 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.
> 2015-04-20 09:02:27,414 INFO  [AM.ZK.Worker-pool3-t316] master.RegionStates: 
> Offlined 77dddcd50c22e56bfff133c0e1f9165b from 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}
> On table re-enable, AM does not assign the region: 
> {code}
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> balancer.BaseLoadBalancer: Reassigned 25 regions. 25 retained the pre-restart 
> assignment.·
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> procedure.EnableTableProcedure: Bulk assigning 25 region(s) across 5 
> server(s), retainAssignment=true
> l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't 
> reach online server 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: 
> Updating the state to OFFLINE to allow to be reassigned by SSH
> nmentManager: Skip assigning 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b., it is on a dead 
> but not processed yet server: 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-13605) RegionStates should not keep its list of dead servers

Reply via email to