[
https://issues.apache.org/jira/browse/HBASE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026377#comment-15026377
]
Hudson commented on HBASE-14207:
--------------------------------
FAILURE: Integrated in HBase-1.1-JDK7 #1606 (See
[https://builds.apache.org/job/HBase-1.1-JDK7/1606/])
HBASE-14875 Forward port HBASE-14207 'Region was hijacked and remained (tedyu:
rev a75d2aac4f48b7c0b2d679416f081a05242d4378)
*
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
> Region was hijacked and remained in transition when RS failed to open a
> region and later regionplan changed to new RS on retry
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14207
> URL: https://issues.apache.org/jira/browse/HBASE-14207
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.98.6
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Priority: Critical
> Fix For: 0.98.15
>
> Attachments: HBASE-14207-0.98-V2.patch, HBASE-14207-0.98-V2.patch,
> HBASE-14207-0.98.patch
>
>
> In a production environment, the following events happened:
> 1. The master tried to assign a region to an RS, but the RS failed to open
> the region because of KeeperException$SessionExpiredException.
> The RS log contained multiple WARN entries related to
> KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for
> /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
> > Unable to get data of znode
> /hbase/region-in-transition/08f1935d652e5dbdac09b423b8f9401b
> 2. The master retried assigning the region to the same RS, but the RS
> failed again.
> 3. On the second retry a new plan was formed with a different destination
> RS, so the master sent the open request to the new RS. The new RS also
> failed to open the region, because the server name recorded in the znode
> differed from the server attempting the transition.
> Log snippets:
> {noformat}
> HM
> 2015-07-14 03:50:29,759 | INFO | master:T101PC03VM13:21300 | Processing
> 08f1935d652e5dbdac09b423b8f9401b in state: M_ZK_REGION_OFFLINE |
> org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:644)
> 2015-07-14 03:50:29,759 | INFO | master:T101PC03VM13:21300 | Transitioned
> {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817029679,
> server=null} to {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN,
> ts=1436817029759, server=T101PC03VM13,21302,1436816690692} |
> org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:29,760 | INFO | master:T101PC03VM13:21300 | Processed
> region 08f1935d652e5dbdac09b423b8f9401b in state M_ZK_REGION_OFFLINE, on
> server: T101PC03VM13,21302,1436816690692 |
> org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:768)
> 2015-07-14 03:50:29,800 | INFO |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning
> INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to
> T101PC03VM13,21302,1436816690692 |
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:29,801 | WARN |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of
> INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to
> T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=1
> of 10 |
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:29,802 | INFO |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Trying to re-assign
> INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to
> the same failed server. |
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2123)
> 2015-07-14 03:50:31,804 | INFO |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning
> INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to
> T101PC03VM13,21302,1436816690692 |
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:31,806 | WARN |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Failed assignment of
> INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to
> T101PC03VM13,21302,1436816690692, trying to assign elsewhere instead; try=2
> of 10 |
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2077)
> 2015-07-14 03:50:31,807 | INFO |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned
> {08f1935d652e5dbdac09b423b8f9401b state=PENDING_OPEN, ts=1436817031804,
> server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b
> state=OFFLINE, ts=1436817031807, server=T101PC03VM13,21302,1436816690692} |
> org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:50:31,807 | INFO |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Assigning
> INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. to
> T101PC03VM14,21302,1436816997967 |
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1983)
> 2015-07-14 03:50:31,807 | INFO |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-3 | Transitioned
> {08f1935d652e5dbdac09b423b8f9401b state=OFFLINE, ts=1436817031807,
> server=T101PC03VM13,21302,1436816690692} to {08f1935d652e5dbdac09b423b8f9401b
> state=PENDING_OPEN, ts=1436817031807,
> server=T101PC03VM14,21302,1436816997967} |
> org.apache.hadoop.hbase.master.RegionStates.updateRegionState(RegionStates.java:327)
> 2015-07-14 03:51:09,501 | INFO |
> MASTER_SERVER_OPERATIONS-T101PC03VM13:21300-4 | Skip assigning region in
> transition on other server{08f1935d652e5dbdac09b423b8f9401b
> state=PENDING_OPEN, ts=1436817031807,
> server=T101PC03VM14,21302,1436816997967} |
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:250)
> {noformat}
> {noformat}
> RS - T101PC03VM14
> 2015-07-14 03:50:31,809 | INFO |
> PriorityRpcServer.handler=2,queue=0,port=21302 | Open
> INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b. |
> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:3671)
> 2015-07-14 03:50:31,830 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 |
> regionserver:21302-0xe4e88f6f1b70002,
> quorum=t101pc03vm12:24002,t101pc03vm13:24002,t101pc03vm14:24002,
> baseZNode=/hbase Attempt to transition the unassigned node for
> 08f1935d652e5dbdac09b423b8f9401b from M_ZK_REGION_OFFLINE to
> RS_ZK_REGION_OPENING failed, the server that tried to transition was
> T101PC03VM14,21302,1436816997967 not the expected
> T101PC03VM13,21302,1436816690692 |
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:875)
> 2015-07-14 03:50:31,830 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 |
> Failed transition from OFFLINE to OPENING for
> region=08f1935d652e5dbdac09b423b8f9401b |
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:539)
> 2015-07-14 03:50:31,831 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 |
> Region was hijacked? Opening cancelled for
> encodedName=08f1935d652e5dbdac09b423b8f9401b |
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:132)
> 2015-07-14 03:50:31,831 | INFO | RS_OPEN_REGION-T101PC03VM14:21302-2 |
> Opening of region {ENCODED => 08f1935d652e5dbdac09b423b8f9401b, NAME =>
> 'INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b.',
> STARTKEY => '', ENDKEY => '200'} failed, transitioning from OFFLINE to
> FAILED_OPEN in ZK, expecting version -1 |
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tryTransitionFromOfflineToFailedOpen(OpenRegionHandler.java:436)
> 2015-07-14 03:50:31,834 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 |
> regionserver:21302-0xe4e88f6f1b70002,
> quorum=t101pc03vm12:24002,t101pc03vm13:24002,t101pc03vm14:24002,
> baseZNode=/hbase Attempt to transition the unassigned node for
> 08f1935d652e5dbdac09b423b8f9401b from M_ZK_REGION_OFFLINE to
> RS_ZK_REGION_FAILED_OPEN failed, the server that tried to transition was
> T101PC03VM14,21302,1436816997967 not the expected
> T101PC03VM13,21302,1436816690692 |
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:875)
> 2015-07-14 03:50:31,834 | WARN | RS_OPEN_REGION-T101PC03VM14:21302-2 |
> Unable to mark region {ENCODED => 08f1935d652e5dbdac09b423b8f9401b, NAME =>
> 'INTER_CONCURRENCY_SETTING,,1436596137981.08f1935d652e5dbdac09b423b8f9401b.',
> STARTKEY => '', ENDKEY => '200'} as FAILED_OPEN. It's likely that the master
> already timed out this open attempt, and thus another RS already has the
> region. |
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tryTransitionFromOfflineToFailedOpen(OpenRegionHandler.java:444)
> {noformat}
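
The sequence in the description can be replayed with a small in-memory model. This is only a sketch under stated assumptions: `RegionHijackSketch`, `UnassignedNode`, `transition`, and `replay` are hypothetical names, not HBase's ZKAssign or AssignmentManager API, and the `refreshNodeOnPlanChange` switch illustrates one possible remedy rather than what the attached patch necessarily does. It shows why the new RS is rejected when the node still names the old RS.

```java
import java.util.Objects;

/** Hypothetical in-memory model of the unassigned znode; not the real HBase API. */
public class RegionHijackSketch {

    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING, RS_ZK_REGION_FAILED_OPEN }

    /** Simplified znode payload: the state plus the server name the master recorded. */
    static final class UnassignedNode {
        State state;
        String serverName;

        UnassignedNode(State state, String serverName) {
            this.state = state;
            this.serverName = serverName;
        }
    }

    /**
     * Assumed ownership check, modelled on the WARN lines in the RS log:
     * the transition is refused unless the caller is the server named in the node.
     */
    static boolean transition(UnassignedNode node, String caller, State from, State to) {
        if (node.state != from || !Objects.equals(node.serverName, caller)) {
            return false;
        }
        node.state = to;
        return true;
    }

    /**
     * Replays step 3 of the report. When refreshNodeOnPlanChange is false the node
     * keeps the old destination, as in the logs; when true, the (hypothetical)
     * remedy rewrites the node with the new destination before the open request.
     */
    static boolean replay(boolean refreshNodeOnPlanChange) {
        String vm13 = "T101PC03VM13,21302,1436816690692";
        String vm14 = "T101PC03VM14,21302,1436816997967";

        // The master created the OFFLINE node for the first plan (destination VM13).
        UnassignedNode node = new UnassignedNode(State.M_ZK_REGION_OFFLINE, vm13);

        // On retry the plan changes to VM14.
        if (refreshNodeOnPlanChange) {
            node.serverName = vm14;
        }

        // VM14 tries OFFLINE -> OPENING, then OFFLINE -> FAILED_OPEN on failure.
        boolean opened = transition(node, vm14,
                State.M_ZK_REGION_OFFLINE, State.RS_ZK_REGION_OPENING);
        if (!opened) {
            // With a stale node this second attempt is refused as well, so the
            // master is never told about the failure.
            transition(node, vm14,
                    State.M_ZK_REGION_OFFLINE, State.RS_ZK_REGION_FAILED_OPEN);
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println("stale node:     " + replay(false));
        System.out.println("refreshed node: " + replay(true));
    }
}
```

With the stale node, both transitions attempted by VM14 are refused, which matches the two "not the expected" warnings in the RS log and leaves the region in PENDING_OPEN on the master side.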