[jira] [Commented] (HBASE-10272) Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline

Hadoop QA (JIRA) Fri, 03 Jan 2014 16:53:12 -0800

    [ https://issues.apache.org/jira/browse/HBASE-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862095#comment-13862095 ]


Hadoop QA commented on HBASE-10272:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12621405/HBASE-10272.patch
  against trunk revision .
  ATTACHMENT ID: 12621405

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 1.1 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings).

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100 characters.

    {color:red}-1 site{color}.  The patch appears to cause the mvn site goal to fail.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8336//console

This message is automatically generated.

> Cluster becomes in-operational if the node hosting the active Master AND ROOT/META table goes offline
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10272
>                 URL: https://issues.apache.org/jira/browse/HBASE-10272
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.96.1, 0.94.15
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Critical
>         Attachments: HBASE-10272.patch, HBASE-10272_0.94.patch
>
>
> Since HBASE-6364, the HBase client caches a connection failure to a server, and any subsequent attempt to connect to that server throws a {{FailedServerException}}.
> Now, if a node that hosted both the active Master AND the ROOT/META table goes offline, the newly anointed Master's initial attempt to connect to the dead region server fails with {{NoRouteToHostException}}, which it handles. Its second attempt, however, fails with {{FailedServerException}}, which is not handled, so the Master crashes.
> Here is the log from one such occurrence:
> {noformat}
> 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> 2013-11-20 10:58:00,161 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: xxx02/192.168.1.102:60020
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
>         at $Proxy9.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1294)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1281)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:383)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:445)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnection(CatalogTracker.java:464)
>         at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:624)
>         at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:684)
>         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:560)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:376)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-11-20 10:58:00,162 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2013-11-20 10:58:00,162 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
> {noformat}
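A minimal Java sketch of the failed-servers behavior described above, assuming a simple per-server expiry window: the first connection to an unreachable host fails with {{NoRouteToHostException}} and is recorded, and a retry inside the window fails fast with {{FailedServerException}} without touching the network. The classes {{FailedServers}} and {{FailedServerException}} below are hypothetical stand-ins, not HBase's own classes, and the 2000 ms expiry is an assumed value.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;
import java.util.HashMap;
import java.util.Map;

public class FailedServersSketch {

    /** Hypothetical stand-in for HBaseClient$FailedServerException. */
    static class FailedServerException extends IOException {
        FailedServerException(String msg) { super(msg); }
    }

    /** Remembers servers whose last connection attempt failed, until an expiry time. */
    static class FailedServers {
        private final Map<String, Long> expiryByServer = new HashMap<>();
        private final long expiryMillis;

        FailedServers(long expiryMillis) { this.expiryMillis = expiryMillis; }

        synchronized void add(String server, long nowMillis) {
            expiryByServer.put(server, nowMillis + expiryMillis);
        }

        synchronized boolean isFailed(String server, long nowMillis) {
            Long expiry = expiryByServer.get(server);
            if (expiry == null) {
                return false;
            }
            if (nowMillis >= expiry) {   // entry expired: forget it and allow a real attempt
                expiryByServer.remove(server);
                return false;
            }
            return true;
        }
    }

    /**
     * Hypothetical connection setup. A cached failure makes the call fail fast;
     * otherwise an unreachable server is recorded and reported as NoRouteToHostException.
     */
    static void connect(FailedServers failed, String server, boolean reachable, long nowMillis)
            throws IOException {
        if (failed.isFailed(server, nowMillis)) {
            throw new FailedServerException("This server is in the failed servers list: " + server);
        }
        if (!reachable) {
            failed.add(server, nowMillis);
            throw new NoRouteToHostException("No route to host: " + server);
        }
    }

    /** Returns the simple name of the exception thrown by connect, or "ok". */
    static String attempt(FailedServers failed, String server, boolean reachable, long nowMillis) {
        try {
            connect(failed, server, reachable, nowMillis);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        FailedServers failed = new FailedServers(2000);   // assumed 2 s expiry window
        String dead = "xxx02/192.168.1.102:60020";
        System.out.println(attempt(failed, dead, false, 0));      // NoRouteToHostException
        System.out.println(attempt(failed, dead, false, 500));    // FailedServerException
        System.out.println(attempt(failed, dead, false, 2500));   // NoRouteToHostException
    }
}
```

The second call is the one that corresponds to the crash in the log: a caller that only handles the first exception type sees a different exception on its retry.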
> Each of the backup masters crashes with the same error, and restarting them has the same effect. Once this happens, the cluster remains inoperable until the node with the region server is brought back online (or the ZooKeeper node containing the ROOT region server location and/or the META entry in the ROOT table is deleted).
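One possible way to keep a new Master from aborting, sketched under assumptions (this is not necessarily what the attached HBASE-10272.patch does): classify {{FailedServerException}} together with the other "server is unreachable" errors, so that a dead META host leads to reassignment rather than an unhandled exception. {{ServerProbe}}, {{verifyMetaLocation}} and {{isServerUnreachable}} are hypothetical names, not HBase APIs, and {{FailedServerException}} is again a local stand-in.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

public class MetaLocationCheckSketch {

    /** Hypothetical stand-in for HBaseClient$FailedServerException. */
    static class FailedServerException extends IOException {
        FailedServerException(String msg) { super(msg); }
    }

    /** Hypothetical connection probe: throws if the server cannot be reached. */
    interface ServerProbe {
        void connect(String server) throws IOException;
    }

    /** True for exceptions that only mean "this server is unreachable right now". */
    static boolean isServerUnreachable(IOException e) {
        return e instanceof NoRouteToHostException   // first attempt against a dead host
            || e instanceof ConnectException
            || e instanceof SocketTimeoutException
            || e instanceof FailedServerException;   // later attempts, served from the failed-servers cache
    }

    /**
     * Returns true if the recorded META location is usable, false if the region
     * should be reassigned. Unreachable-server errors no longer propagate to the caller.
     */
    static boolean verifyMetaLocation(ServerProbe probe, String server) throws IOException {
        try {
            probe.connect(server);
            return true;
        } catch (IOException e) {
            if (isServerUnreachable(e)) {
                return false;   // treat like a dead server: reassign instead of aborting
            }
            throw e;            // any other I/O error is still reported
        }
    }

    public static void main(String[] args) throws IOException {
        String dead = "xxx02/192.168.1.102:60020";
        ServerProbe firstAttempt = s -> { throw new NoRouteToHostException("No route to host"); };
        ServerProbe secondAttempt = s -> { throw new FailedServerException("in the failed servers list: " + s); };
        System.out.println(verifyMetaLocation(firstAttempt, dead));   // false
        System.out.println(verifyMetaLocation(secondAttempt, dead));  // false
    }
}
```

The design choice here is that both the first and the cached connection failure become the same "server is gone" signal, while unexpected {{IOException}}s still surface to the caller.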



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
