[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ankit Singhal updated HBASE-21344:
----------------------------------
Affects Version/s: (was: 2.0.0)
> hbase:meta location in ZooKeeper set to OPENING by the procedure which
> eventually failed but precludes Master from assigning it forever
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, a ServerCrashProcedure (SCP) was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta was marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the Kerberos (krb) ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor:
> Usually this should not happen, we will release the lock before if the
> procedure is finished, even if the holdLock is true, arrive here means we
> have some holes where we do not release the lock. And the releaseLock below
> may fail since the procedure may have already been deleted from the procedure
> store.
> 2018-10-08 06:51:24,543 INFO [PEWorker-9]
> procedure.MasterProcedureScheduler: pid=48, ppid=47,
> state=FAILED:REGION_TRANSITION_QUEUE,
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor:
> CODE-BUG: Uncaught runtime exception for pid=47,
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true,
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max
> attempts exceeded; ServerCrashProcedure
> server=<ip-address>,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
> at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
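The sequence in steps 1-7 and the rollback failure in the trace above can be illustrated with a minimal, self-contained sketch. This is not HBase code: `RollbackSketch`, `ToyCrashProcedure`, `MetaState`, `CrashState`, and the static `metaState` field are hypothetical stand-ins, and the sketch assumes only what the trace shows, namely that the rollback step throws UnsupportedOperationException instead of undoing anything. Under that assumption the OPENING marker written before the failed assignment is never reset.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RollbackSketch {
    // Hypothetical stand-in for the meta location state kept in ZooKeeper.
    enum MetaState { OFFLINE, OPENING, OPEN }

    // Hypothetical, simplified set of crash-procedure states.
    enum CrashState { START, GET_REGIONS, ASSIGN_META }

    // Shared "external" state that outlives the procedure.
    static MetaState metaState = MetaState.OPEN;

    static class ToyCrashProcedure {
        // States executed so far, most recent on top.
        private final Deque<CrashState> executed = new ArrayDeque<>();

        void execute() {
            executed.push(CrashState.START);
            metaState = MetaState.OFFLINE;      // step 3: meta marked OFFLINE
            executed.push(CrashState.GET_REGIONS);
            executed.push(CrashState.ASSIGN_META);
            metaState = MetaState.OPENING;      // step 4: meta marked OPENING
            // Steps 5-6: the open request never succeeds, so the caller rolls back.
        }

        // Assumption of this sketch: no state knows how to undo itself.
        void rollbackState(CrashState state) {
            throw new UnsupportedOperationException("unhandled state=" + state);
        }

        void rollback() {
            while (!executed.isEmpty()) {
                rollbackState(executed.pop());
            }
        }
    }

    // Runs the scenario and returns the external state left behind.
    static MetaState runScenario() {
        metaState = MetaState.OPEN;
        ToyCrashProcedure proc = new ToyCrashProcedure();
        proc.execute();
        try {
            proc.rollback();
        } catch (UnsupportedOperationException e) {
            // Rollback aborted part-way; nothing resets metaState.
        }
        return metaState;
    }

    public static void main(String[] args) {
        System.out.println("meta state after failed rollback: " + runScenario());
    }
}
```

In this toy model the only way out is for something outside the procedure to overwrite the marker, which matches the symptom in the issue title.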
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7,
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state
> OPENING, details=row 'backup:system' on table 'hbase:meta' at
> region=hbase:meta,,1.1588230740, hostname=<hostname>, seqNum=-1,
> exception=java.io.IOException: Meta region is in state OPENING
> at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:165)
> at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:323)
> at java.lang.Thread.run(Thread.java:748)
> {code}
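The client-side symptom in the last trace can be sketched the same way. Again this is not the HBase client: `MetaLookupSketch`, `locateMeta`, and `locateWithRetries` are hypothetical names, and the sketch assumes only what the log message shows, that a lookup is rejected with "Meta region is in state OPENING" while the recorded state is not OPEN. Since nothing changes the recorded state between attempts, every retry fails the same way.

```java
import java.io.IOException;

public class MetaLookupSketch {
    // Hypothetical stand-in for the meta location state kept in ZooKeeper.
    enum MetaState { OFFLINE, OPENING, OPEN }

    // Hypothetical lookup: hands out the host only when the state is OPEN.
    static String locateMeta(MetaState state, String host) throws IOException {
        if (state != MetaState.OPEN) {
            throw new IOException("Meta region is in state " + state);
        }
        return host;
    }

    // Bounded retry around the lookup; the state is fixed in this sketch,
    // so a non-OPEN state exhausts all attempts.
    static String locateWithRetries(MetaState state, String host, int maxTries)
            throws IOException {
        IOException last = null;
        for (int tries = 1; tries <= maxTries; tries++) {
            try {
                return locateMeta(state, host);
            } catch (IOException e) {
                last = e;
            }
        }
        throw new IOException("gave up after " + maxTries + " tries", last);
    }

    public static void main(String[] args) {
        try {
            locateWithRetries(MetaState.OPENING, "rs3.example", 7);
        } catch (IOException e) {
            System.out.println(e.getMessage() + ": " + e.getCause().getMessage());
        }
    }
}
```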