[ 
https://issues.apache.org/jira/browse/HBASE-27509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-27509:
------------------------------
    Fix Version/s: 3.0.0-beta-2
                       (was: 3.0.0-beta-1)

> Possible region gets stuck in CLOSING state
> -------------------------------------------
>
>                 Key: HBASE-27509
>                 URL: https://issues.apache.org/jira/browse/HBASE-27509
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.3.4
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 3.0.0-beta-2
>
>
> A region can get stuck in the CLOSING state, possibly because of a race 
> between flush and close, or because a read lock acquired on the region 
> is never released somewhere.
> {noformat}
> "MemStoreFlusher.1" #236 prio=5 os_prio=0 tid=0x00005639266a4000 nid=0x296e 
> waiting on condition [0x00007fdc48a63000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00007fdf42dde850> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
> "MemStoreFlusher.0" #234 prio=5 os_prio=0 tid=0x00005639266a2800 nid=0x296d 
> waiting on condition [0x00007fdc48b64000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00007fdf42dde850> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
>     at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
> {noformat} 
> {noformat}
> "RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0" #6337 daemon prio=5 
> os_prio=0 tid=0x00007fdc05448800 nid=0x15d1 waiting on condition 
> [0x00007fdc1befd000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00007fdf42dde850> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1662)
>     at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1591)
>     - locked <0x00007fdf42ddf358> (a java.lang.Object)
>     at 
> org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:114)
>     at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> {noformat}
> One of the region server logs shows that a flush started and the replay 
> edits of the flush were added, then a close was requested; the region then 
> got stuck in transition and no further requests on it were processed.
> {noformat}
>   2022-11-23 05:51:00,503 INFO  [MemStoreFlusher.1] 
> regionserver.DefaultStoreFlusher: Flushed memstore data size=232.14 MB at 
> sequenceid=29163855 (bloomFilter=true), 
> to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/.tmp/cf1/c20f27a7407643b18558331d95f7a67f
>   2022-11-23 05:51:00,530 INFO  [MemStoreFlusher.1] regionserver.HStore: 
> Added 
> hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/cf1/c20f27a7407643b18558331d95f7a67f,
>  entries=231413, sequenceid=29163855, filesize=233.7 M
>   2022-11-23 05:51:00,536 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
> Finished flush of dataSize ~232.14 MB/243420836, heapSize ~256.00 
> MB/268439136, currentSize=719.06 KB/736320 for 
> 8f342bca97c115f8bce460a998e4afbc in 2146ms, sequenceid=29163855, compaction 
> requested=true
>   2022-11-23 05:51:00,541 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
> Flushing d92c7546be62225859dd641aa88992ea 1/1 column families, 
> dataSize=232.62 MB heapSize=256.53 MB
>   2022-11-23 05:51:00,645 INFO  [MemStoreFlusher.0] 
> regionserver.DefaultStoreFlusher: Flushed memstore data size=232.15 MB at 
> sequenceid=29037517 (bloomFilter=true), 
> to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/.tmp/cf1/eb542141078e4f879128498d01517e98
>   2022-11-23 05:51:00,705 INFO  [MemStoreFlusher.0] regionserver.HStore: 
> Added 
> hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/cf1/eb542141078e4f879128498d01517e98,
>  entries=231402, sequenceid=29037517, filesize=233.7 M
>   2022-11-23 05:51:00,706 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Finished flush of dataSize ~232.15 MB/243424206, heapSize ~256.01 
> MB/268442768, currentSize=935.82 KB/958277 for 
> 7beadd786ffc23edc074238c873f800b in 2324ms, sequenceid=29037517, compaction 
> requested=true
>   2022-11-23 05:51:00,708 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 6c6694dce02a13f8109ecc3dd70009d5 1/1 column families, 
> dataSize=232.68 MB heapSize=256.60 MB
>   2022-11-23 05:51:00,771 INFO  
> [RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0] 
> handler.UnassignRegionHandler: Close 8f342bca97c115f8bce460a998e4afbc
> {noformat}
> {noformat}
> 2022-11-23 07:37:06,887 WARN  
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=8,port=16020] 
> regionserver.HRegion: Region is too busy to allow lock acquisition.
> org.apache.hadoop.hbase.RegionTooBusyException: Failed to obtain lock; 
> regionName=hbase_perf_wl2_1000m,user8104,1668039117539.8f342bca97c115f8bce460a998e4afbc.,
>  server=sl73tskrnsqln00107.visa.com,16020,1669131710569
>       at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8726)
>       at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8705)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8610)
>       at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3161)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2985)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45517)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {noformat}
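
The thread dumps above show the close handler parked on the write lock and the flushers parked on the read lock of the same fair ReentrantReadWriteLock. The following is a minimal sketch of how one unreleased read lock could produce that picture. It is not HBase code: the class name, the thread names, and the "leaker" thread are hypothetical, and the leak itself is an assumption taken from the description, not a confirmed root cause.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone sketch; none of these names exist in HBase.
class StuckCloseSketch {

    /**
     * Returns true if a late reader fails to get the read lock while a
     * writer is queued behind a read lock that was never released.
     */
    static boolean lateReaderTimesOut() {
        // Fair mode, matching the FairSync seen in the thread dumps.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        try {
            // Assumed leak: a thread takes the read lock and exits without unlocking.
            Thread leaker = new Thread(() -> lock.readLock().lock(), "leaker");
            leaker.start();
            leaker.join();

            // Stand-in for the close path: it blocks waiting for the write lock.
            Thread closer = new Thread(() -> {
                try {
                    lock.writeLock().lockInterruptibly();
                    lock.writeLock().unlock();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "closer");
            closer.setDaemon(true);
            closer.start();

            // Wait until the closer is actually queued on the lock.
            while (!lock.hasQueuedThread(closer)) {
                Thread.sleep(10);
            }

            // Stand-in for a flush or put: a timed read-lock attempt.
            // It queues behind the waiting writer and times out.
            boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            closer.interrupt(); // let the sketch terminate cleanly
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("late reader timed out: " + lateReaderTimesOut());
    }
}
```

In this sketch the writer can never proceed because the leaked read hold is never released, and every later reader waits behind the queued writer. That would be consistent with both the parked MemStoreFlusher threads and the "Failed to obtain lock" RegionTooBusyException on put, but it only illustrates the suspected mechanism.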



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
