Ankit Singhal created HBASE-21344:
-------------------------------------

             Summary: hbase:meta location in ZooKeeper set to OPENING by the 
procedure which eventually failed but precludes Master from assigning it forever
                 Key: HBASE-21344
                 URL: https://issues.apache.org/jira/browse/HBASE-21344
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
    Affects Versions: 2.0.0
            Reporter: Ankit Singhal
            Assignee: Ankit Singhal


[~elserj] has already summarized it well.

1. hbase:meta was hosted on RS8.
2. RS8 crashed and a ServerCrashProcedure (SCP) was queued for it, handling meta first.
3. meta was marked OFFLINE.
4. meta was marked as OPENING on RS3.
5. The openRegion RPC could not actually be sent to RS3 due to the Kerberos (krb) ticket issue.
6. The openRegion/assignment was attempted 10 times, failing each time.
7. We start rolling back the procedure:
{code:java}
2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: Usually this should not happen, we will release the lock before if the procedure is finished, even if the holdLock is true, arrive here means we have some holes where we do not release the lock. And the releaseLock below may fail since the procedure may have already been deleted from the procedure store.
2018-10-08 06:51:24,543 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: pid=48, ppid=47, state=FAILED:REGION_TRANSITION_QUEUE, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740
{code}
{code:java}
2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=47, state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; ServerCrashProcedure server=<ip-address>,16020,1538974612843, splitWal=true, meta=true
java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
        at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
{code}
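The trace above shows rollbackState throwing for a state it does not handle, which stops the rollback partway. A minimal sketch of that pattern, assuming a toy state machine (RollbackSketch, Step, and both methods are hypothetical names for illustration, not HBase classes or the actual ServerCrashProcedure code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only; none of these types are HBase classes. */
class RollbackSketch {
    enum Step { START, GET_REGIONS, ASSIGN_META }

    /** Undo handler for one step; in this toy model only START has one. */
    static void rollbackState(Step step) {
        switch (step) {
            case START:
                return; // nothing to undo
            default:
                throw new UnsupportedOperationException("unhandled state=" + step);
        }
    }

    /** Unwinds executed steps newest-first and returns the steps that were NOT undone. */
    static Deque<Step> rollback(Deque<Step> executed) {
        try {
            while (!executed.isEmpty()) {
                rollbackState(executed.peek());
                executed.pop(); // only reached if the undo handler succeeded
            }
        } catch (UnsupportedOperationException e) {
            // The rollback stops here; side effects of the remaining steps stay in place.
            System.out.println("CODE-BUG: uncaught runtime exception: " + e.getMessage());
        }
        return executed;
    }

    public static void main(String[] args) {
        Deque<Step> executed = new ArrayDeque<>();
        executed.push(Step.START);
        executed.push(Step.GET_REGIONS);
        executed.push(Step.ASSIGN_META);
        System.out.println("not rolled back: " + rollback(executed));
    }
}
```

In this sketch every step stays on the stack because the newest one has no undo handler, so whatever those steps changed externally is never reverted.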
{code:java}
DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state OPENING, details=row 'backup:system' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=<hostname>, seqNum=-1, exception=java.io.IOException: Meta region is in state OPENING
        at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:165)
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:323)
        at java.lang.Thread.run(Thread.java:748)
{code}
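The overall sequence (steps 3 to 7, ending with clients seeing "Meta region is in state OPENING") can be sketched as a small simulation. This is a toy model under stated assumptions: MetaStuckSketch, MetaLocation, sendOpenRegion, and the resetOnFailure flag are all hypothetical names introduced for illustration, and the reset branch is only one conceivable cleanup, not necessarily the fix HBase should adopt.

```java
/** Toy model of the reported sequence; none of these types are HBase classes. */
class MetaStuckSketch {
    enum State { OPEN, OFFLINE, OPENING }

    /** Stand-in for the published meta location that clients read from ZooKeeper. */
    static final class MetaLocation {
        State state = State.OPEN;
    }

    /** Stand-in for the openRegion RPC; always fails, like the Kerberos failure in step 5. */
    static boolean sendOpenRegion() {
        return false;
    }

    /** Runs steps 3-7 and returns the state left behind in the published location. */
    static State runAssign(MetaLocation loc, int maxAttempts, boolean resetOnFailure) {
        loc.state = State.OFFLINE;  // step 3
        loc.state = State.OPENING;  // step 4: published before the RPC has succeeded
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {  // step 6
            if (sendOpenRegion()) {
                loc.state = State.OPEN;
                return loc.state;
            }
        }
        // Step 7: retries exhausted, the procedure fails and rolls back.
        if (resetOnFailure) {
            loc.state = State.OFFLINE;  // hypothetical cleanup so a later assign could proceed
        }
        return loc.state;
    }

    public static void main(String[] args) {
        System.out.println(runAssign(new MetaLocation(), 10, false)); // prints OPENING
        System.out.println(runAssign(new MetaLocation(), 10, true));  // prints OFFLINE
    }
}
```

Without the cleanup branch, the published state stays OPENING after the procedure has failed, which matches what the client-side exception above reports.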



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
