[
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578045#comment-14578045
]
Hadoop QA commented on HBASE-13845:
-----------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12738416/HBASE-13845-branch-1.1.patch
against branch-1.1 branch at commit c19bc6d6e0dee4b60f45350ab7561fd15cfead7d.
ATTACHMENT ID: 12738416
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new
or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all
supported hadoop versions (2.4.1 2.5.2 2.6.0)
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the
total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the
total number of checkstyle errors.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100 characters.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/14337//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14337//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/14337//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/14337//console
This message is automatically generated.
> Expire of one region server carrying meta can bring down the master
> -------------------------------------------------------------------
>
> Key: HBASE-13845
> URL: https://issues.apache.org/jira/browse/HBASE-13845
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 2.0.0, 1.1.0, 1.2.0
> Reporter: Jerry He
> Assignee: Jerry He
> Fix For: 2.0.0, 1.2.0, 1.1.1
>
> Attachments: HBASE-13845-branch-1.1.patch
>
>
> There seems to be a code bug that can cause the expiration of one region
> server carrying meta to bring down the master in certain cases.
> Here is the sequence of events.
> a) The master detects the expiration of a region server on ZK, and starts to
> expire the region server.
> b) Since the failed region server carries meta, the shutdown handler will
> call verifyAndAssignMetaWithRetries() while processing the expired rs.
> c) In verifyAndAssignMeta(), there is logic that calls
> verifyMetaRegionLocation():
> {code}
> if (!server.getMetaTableLocator().verifyMetaRegionLocation(server.getConnection(),
>     this.server.getZooKeeper(), timeout)) {
>   // Meta does not answer at its recorded location: reassign it.
>   this.services.getAssignmentManager().assignMeta(HRegionInfo.FIRST_META_REGIONINFO);
> } else if (serverName.equals(server.getMetaTableLocator().getMetaRegionLocation(
>     this.server.getZooKeeper()))) {
>   // Meta still answers from the server that is being expired.
>   throw new IOException("hbase:meta is onlined on the dead server " + serverName);
> }
> {code}
> If we see that the meta region is still alive on the expired rs, we throw an
> exception.
> We retry verifyAndAssignMeta (by default 10 times, 1000 ms apart).
> If we still get the exception after the retries, we abort the master.
> {code}
> 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:60000-0] master.HMaster: Master server abort: loaded coprocessors are: []
> 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:60000-0] master.HMaster: verifyAndAssignMeta failed after10 times retries, aborting
> java.io.IOException: hbase:meta is onlined on the dead server bdvs1164.svl.ibm.com,16020,1432681743203
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMeta(MetaServerShutdownHandler.java:162)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMetaWithRetries(MetaServerShutdownHandler.java:184)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:93)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-05-27 06:58:30,156 INFO [MASTER_META_SERVER_OPERATIONS-bdvs1163:60000-0] regionserver.HRegionServer: STOPPED: verifyAndAssignMeta failed after10 times retries, aborting
> {code}
> The problem happens when the expired region server is slow in processing its
> own expiration, or has a slow death, and is still able to respond to the
> master's meta verification in the meantime.
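
The failure sequence described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the HBase implementation and not the attached patch: the class MetaVerifyRetrySketch, the MetaView interface, the Outcome enum and the slowDyingServer() helper are all hypothetical names invented for illustration. Only the shape of the verify/throw/retry/abort logic is taken from the description.

```java
import java.io.IOException;

public class MetaVerifyRetrySketch {

    /** Hypothetical summary of how the retry loop ended. */
    public enum Outcome { META_REASSIGNED, META_ELSEWHERE, MASTER_ABORTED }

    /** Hypothetical stand-in for what the master can observe about hbase:meta. */
    public interface MetaView {
        boolean verifyMetaRegionLocation(); // true if the recorded location still answers
        String getMetaRegionLocation();     // server name recorded as hosting meta
    }

    /** Mirrors the branch structure quoted in the description. */
    public static Outcome verifyAndAssignMeta(MetaView view, String deadServer)
            throws IOException {
        if (!view.verifyMetaRegionLocation()) {
            return Outcome.META_REASSIGNED; // the real code would call assignMeta here
        } else if (deadServer.equals(view.getMetaRegionLocation())) {
            throw new IOException("hbase:meta is onlined on the dead server " + deadServer);
        }
        return Outcome.META_ELSEWHERE;
    }

    /** Retries up to `retries` times, then reports an abort instead of recovering. */
    public static Outcome verifyAndAssignMetaWithRetries(
            MetaView view, String deadServer, int retries, long waitMs) {
        for (int attempt = 1; attempt <= retries; attempt++) {
            try {
                return verifyAndAssignMeta(view, deadServer);
            } catch (IOException e) {
                if (attempt == retries) {
                    break; // retries exhausted: this is where the master aborts
                }
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        return Outcome.MASTER_ABORTED;
    }

    /** Simulates an expired server that still answers the first N verifications. */
    public static MetaView slowDyingServer(String name, int responsiveChecks) {
        int[] remaining = {responsiveChecks};
        return new MetaView() {
            @Override public boolean verifyMetaRegionLocation() { return remaining[0]-- > 0; }
            @Override public String getMetaRegionLocation() { return name; }
        };
    }

    public static void main(String[] args) {
        String dead = "rs1.example.com,16020,1"; // made-up server name
        // Server stops answering after 3 checks: the 4th attempt reassigns meta.
        System.out.println(verifyAndAssignMetaWithRetries(slowDyingServer(dead, 3), dead, 10, 1));
        // Server keeps answering longer than the retry window: the master aborts.
        System.out.println(verifyAndAssignMetaWithRetries(slowDyingServer(dead, 20), dead, 10, 1));
    }
}
```

Running main prints META_REASSIGNED and then MASTER_ABORTED. The second case corresponds to the reported scenario: the expired server outlives the retry window (10 x 1000 ms by default in the description, shortened to 1 ms waits here), so the loop ends in an abort even though the server is already known to be dead.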