Ankit Singhal created HBASE-21344:
-------------------------------------
Summary: hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
Key: HBASE-21344
URL: https://issues.apache.org/jira/browse/HBASE-21344
Project: HBase
Issue Type: Bug
Components: proc-v2
Affects Versions: 2.0.0
Reporter: Ankit Singhal
Assignee: Ankit Singhal
[~elserj] has already summarized it well.
1. hbase:meta was on RS8
2. RS8 crashed, and a ServerCrashProcedure (SCP) was queued for it, handling meta first
3. meta was marked OFFLINE
4. meta marked as OPENING on RS3
5. We can't actually send the openRegion RPC to RS3 because of a Kerberos ticket issue
6. We attempt the openRegion/assignment 10 times, failing each time
7. We start rolling back the procedure:
{code:java}
2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor:
Usually this should not happen, we will release the lock before if the
procedure is finished, even if the holdLock is true, arrive here means we have
some holes where we do not release the lock. And the releaseLock below may fail
since the procedure may have already been deleted from the procedure store.
2018-10-08 06:51:24,543 INFO [PEWorker-9] procedure.MasterProcedureScheduler:
pid=48, ppid=47, state=FAILED:REGION_TRANSITION_QUEUE,
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max
attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 checking
lock on 1588230740
{code}
{code:java}
2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor:
CODE-BUG: Uncaught runtime exception for pid=47,
state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true,
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via
AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max
attempts exceeded; ServerCrashProcedure
server=<ip-address>,16020,1538974612843, splitWal=true, meta=true
java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
{code}
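Steps 4 to 7 and the two logs above can be pictured with a minimal, hypothetical Java model. It assumes that the OPENING mark is written before any open RPC succeeds, that the RPC fails on every attempt (as with the Kerberos ticket problem), and that, as the trace suggests, no crash state has a rollback branch. `StuckMetaSketch`, `MetaLocation`, `assignMeta`, and the "cleanup" line in `rollback` are illustrative assumptions, not HBase's `AssignProcedure` or `ServerCrashProcedure` code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

// Hypothetical model of steps 4-7; names are illustrative, not HBase APIs.
final class StuckMetaSketch {
    enum MetaState { OFFLINE, OPENING, OPEN }
    enum CrashState { SERVER_CRASH_START, SERVER_CRASH_GET_REGIONS }

    // Simplified stand-in for the meta location record in ZooKeeper.
    static final class MetaLocation {
        MetaState state = MetaState.OFFLINE;
    }

    /** Steps 4-6: mark OPENING first, then try the open RPC at most maxAttempts times. */
    static boolean assignMeta(MetaLocation loc, BooleanSupplier openRegionRpc, int maxAttempts) {
        loc.state = MetaState.OPENING; // recorded before any RPC has succeeded
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (openRegionRpc.getAsBoolean()) {
                loc.state = MetaState.OPEN;
                return true;
            }
        }
        return false; // retries exhausted; loc.state is still OPENING
    }

    /** Step 7: in this model no crash state knows how to undo itself. */
    static void rollbackState(CrashState s) {
        throw new UnsupportedOperationException("unhandled state=" + s);
    }

    /** Unwinds executed states newest-first; the OPENING mark is cleared only if all succeed. */
    static void rollback(Deque<CrashState> executed, MetaLocation loc) {
        try {
            while (!executed.isEmpty()) {
                rollbackState(executed.pop());
            }
            loc.state = MetaState.OFFLINE; // hypothetical cleanup, skipped when a state throws
        } catch (UnsupportedOperationException e) {
            System.out.println("CODE-BUG: " + e.getMessage());
        }
    }

    /** Runs the scenario with an RPC that always fails; returns the final meta state. */
    static MetaState simulate(int maxAttempts) {
        MetaLocation loc = new MetaLocation();
        Deque<CrashState> executed = new ArrayDeque<>();
        executed.push(CrashState.SERVER_CRASH_START);
        executed.push(CrashState.SERVER_CRASH_GET_REGIONS); // popped first, as in the log
        if (!assignMeta(loc, () -> false, maxAttempts)) {
            rollback(executed, loc);
        }
        return loc.state;
    }

    public static void main(String[] args) {
        System.out.println("meta state after rollback = " + simulate(10));
    }
}
```

In this model the failed rollback never reaches the line that would clear the OPENING mark, so the record is left exactly as step 4 wrote it, which is the state the Master later refuses to reassign.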
{code:java}
{ DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7,
retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state
OPENING, details=row 'backup:system' on table 'hbase:meta' at
region=hbase:meta,,1.1588230740, hostname=<hostname>, seqNum=-1,
exception=java.io.IOException: Meta region is in state OPENING
	at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
	at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
	at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:165)
	at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:323)
	at java.lang.Thread.run(Thread.java:748)
{code}
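The last log is the client side of the same problem: `ZKAsyncRegistry` reads the meta location from ZooKeeper, rejects it with "Meta region is in state OPENING", and the caller retries until it gives up. Since nothing moves the record out of OPENING, every retry sees the same state. The following is a hypothetical Java sketch of that interaction; `MetaLookupSketch`, `checkLocation`, `locateMeta`, the retry loop, and the host name `rs3.example.com` are assumptions for illustration, not the HBase client API.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model of a client reading a stuck meta location; not HBase code.
final class MetaLookupSketch {
    enum MetaState { OFFLINE, OPENING, OPEN }

    /** Accepts the location only when meta is OPEN, mirroring the error in the log. */
    static String checkLocation(MetaState state, String host) throws IOException {
        if (state != MetaState.OPEN) {
            throw new IOException("Meta region is in state " + state);
        }
        return host;
    }

    /** Re-reads the state on every try (tries 0..retries); empty once retries run out. */
    static Optional<String> locateMeta(Supplier<MetaState> zkRead, String host, int retries) {
        for (int tries = 0; tries <= retries; tries++) {
            try {
                return Optional.of(checkLocation(zkRead.get(), host));
            } catch (IOException e) {
                System.out.println("Call exception, tries=" + tries + ", msg=" + e.getMessage());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The hypothetical ZooKeeper read always returns OPENING, so every try fails.
        Optional<String> host = locateMeta(() -> MetaState.OPENING, "rs3.example.com", 7);
        System.out.println("meta located: " + host.isPresent());
    }
}
```

Retrying on the client cannot help here: the fix has to come from the Master side, by resetting or reassigning the meta location once the assigning procedure has failed.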