[
https://issues.apache.org/jira/browse/HBASE-27509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang updated HBASE-27509:
------------------------------
Fix Version/s: 3.0.0-beta-2
(was: 3.0.0-beta-1)
> Possible region gets stuck in CLOSING state
> -------------------------------------------
>
> Key: HBASE-27509
> URL: https://issues.apache.org/jira/browse/HBASE-27509
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.3.4
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Critical
> Fix For: 3.0.0-beta-2
>
>
> There is a possible chance of region gets stuck in closing state could be
> because of race between the flush and close or some where the readlock
> acquired on the region is not getting released.
> {noformat}
> "MemStoreFlusher.1" #236 prio=5 os_prio=0 tid=0x00005639266a4000 nid=0x296e
> waiting on condition [0x00007fdc48a63000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00007fdf42dde850> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
> "MemStoreFlusher.0" #234 prio=5 os_prio=0 tid=0x00005639266a2800 nid=0x296d
> waiting on condition [0x00007fdc48b64000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00007fdf42dde850> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
> {noformat}
> {noformat}
> "RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0" #6337 daemon prio=5
> os_prio=0 tid=0x00007fdc05448800 nid=0x15d1 waiting on condition
> [0x00007fdc1befd000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00007fdf42dde850> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1662)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1591)
> - locked <0x00007fdf42ddf358> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:114)
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> {noformat}
> From one of the region server logs flushed has started and replay edits of
> flush added then close requested then it got stuck in the transition and no
> further processing of the requests on the region.
> {noformat}
> 2022-11-23 05:51:00,503 INFO [MemStoreFlusher.1]
> regionserver.DefaultStoreFlusher: Flushed memstore data size=232.14 MB at
> sequenceid=29163855 (bloomFilter=true),
> to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/.tmp/cf1/c20f27a7407643b18558331d95f7a67f
> 2022-11-23 05:51:00,530 INFO [MemStoreFlusher.1] regionserver.HStore:
> Added
> hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/cf1/c20f27a7407643b18558331d95f7a67f,
> entries=231413, sequenceid=29163855, filesize=233.7 M
> 2022-11-23 05:51:00,536 INFO [MemStoreFlusher.1] regionserver.HRegion:
> Finished flush of dataSize ~232.14 MB/243420836, heapSize ~256.00
> MB/268439136, currentSize=719.06 KB/736320 for
> 8f342bca97c115f8bce460a998e4afbc in 2146ms, sequenceid=29163855, compaction
> requested=true
> 2022-11-23 05:51:00,541 INFO [MemStoreFlusher.1] regionserver.HRegion:
> Flushing d92c7546be62225859dd641aa88992ea 1/1 column families,
> dataSize=232.62 MB heapSize=256.53 MB
> 2022-11-23 05:51:00,645 INFO [MemStoreFlusher.0]
> regionserver.DefaultStoreFlusher: Flushed memstore data size=232.15 MB at
> sequenceid=29037517 (bloomFilter=true),
> to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/.tmp/cf1/eb542141078e4f879128498d01517e98
> 2022-11-23 05:51:00,705 INFO [MemStoreFlusher.0] regionserver.HStore:
> Added
> hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/cf1/eb542141078e4f879128498d01517e98,
> entries=231402, sequenceid=29037517, filesize=233.7 M
> 2022-11-23 05:51:00,706 INFO [MemStoreFlusher.0] regionserver.HRegion:
> Finished flush of dataSize ~232.15 MB/243424206, heapSize ~256.01
> MB/268442768, currentSize=935.82 KB/958277 for
> 7beadd786ffc23edc074238c873f800b in 2324ms, sequenceid=29037517, compaction
> requested=true
> 2022-11-23 05:51:00,708 INFO [MemStoreFlusher.0] regionserver.HRegion:
> Flushing 6c6694dce02a13f8109ecc3dd70009d5 1/1 column families,
> dataSize=232.68 MB heapSize=256.60 MB
> 2022-11-23 05:51:00,771 INFO
> [RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0]
> handler.UnassignRegionHandler: Close 8f342bca97c115f8bce460a998e4afbc
> {noformat}
> {noformat}
> 2022-11-23 07:37:06,887 WARN
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=8,port=16020]
> regionserver.HRegion: Region is too busy to allow lock acquisition.
> org.apache.hadoop.hbase.RegionTooBusyException: Failed to obtain lock;
> regionName=hbase_perf_wl2_1000m,user8104,1668039117539.8f342bca97c115f8bce460a998e4afbc.,
> server=sl73tskrnsqln00107.visa.com,16020,1669131710569
> at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8726)
> at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8705)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8610)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3161)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2985)
> at
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45517)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)