[jira] [Created] (HBASE-12743) [ITBLL] Master fails rejoining cluster stuck splitting logs; Distributed log replay=true

stack (JIRA) Mon, 22 Dec 2014 09:48:34 -0800

stack created HBASE-12743:
-----------------------------

             Summary: [ITBLL] Master fails rejoining cluster stuck splitting logs; Distributed log replay=true
                 Key: HBASE-12743
                 URL: https://issues.apache.org/jira/browse/HBASE-12743
             Project: HBase
          Issue Type: Bug
            Reporter: stack



The Master has been stuck for two days trying to rejoin the cluster after the 
chaos monkey killed and restarted it.

After 350 failed attempts to read the namespace table, the Master goes down:

{code}
2014-12-20 18:43:54,285 INFO  [c2020:16020.activeMasterManager] client.RpcRetryingCaller: Call exception, tries=349, retries=350, started=6885331 ms ago, cancelled=false, msg=row 'default' on table 'hbase:namespace' at region=hbase:namespace,,1417551886199.ecdcd0172cd3e32d291bc282771895da., hostname=c2023.halxg.cloudera.com,16020,1418988286696, seqNum=6000000190
2014-12-20 18:43:54,303 WARN  [c2020:16020.activeMasterManager] master.TableNamespaceManager: Caught exception in initializing namespace table manager
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=350, exceptions:
Sat Dec 20 16:49:08 PST 2014, RpcRetryingCaller{globalStartTime=1419122948954, pause=100, retries=350}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:namespace,,1417551886199.ecdcd0172cd3e32d291bc282771895da. is not online on c2023.halxg.cloudera.com,16020,1418988286696
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2722)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:851)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1695)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30434)
{code}

Seems like distributed log replay is on:
2014-12-20 16:49:03,665 INFO  [RS_LOG_REPLAY_OPS-c2021:16020-0] wal.WALSplitter: DistributedLogReplay = true
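
As a rough cross-check of the timestamps quoted above, the sketch below works backwards from the tries=349 line and its "started=6885331 ms ago" field to the time of the first namespace lookup, then compares that with the WALSplitter line. It assumes the Master and region server logs use the same time zone and reasonably synchronized clocks; the variable names are illustrative only and not part of HBase.

```python
from datetime import datetime, timedelta

# Values copied from the log lines quoted in this report.
# Assumption: both hosts log in the same time zone with synchronized clocks.
last_attempt = datetime(2014, 12, 20, 18, 43, 54, 285000)  # timestamp of the tries=349 line
elapsed = timedelta(milliseconds=6885331)                  # "started=6885331 ms ago"
replay_line = datetime(2014, 12, 20, 16, 49, 3, 665000)    # WALSplitter "DistributedLogReplay = true" line

# When the retry loop for the namespace lookup began.
first_attempt = last_attempt - elapsed
# How long after the log replay line the first lookup was issued.
gap = first_attempt - replay_line

print("first attempt:", first_attempt.isoformat(sep=" ", timespec="milliseconds"))
print("time spent retrying:", elapsed)
print("seconds after the log replay line:", gap.total_seconds())
```

Under those assumptions the first attempt lands at 16:49:08.954, which agrees with the "Sat Dec 20 16:49:08 PST 2014" entry and the globalStartTime in the exception, and is about five seconds after the log replay line.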

Seems easy enough to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
