[ https://issues.apache.org/jira/browse/HBASE-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack reassigned HBASE-659:
---------------------------

    Assignee: stack

> HLog#cacheFlushLock not cleared; hangs a region
> -----------------------------------------------
>
>                 Key: HBASE-659
>                 URL: https://issues.apache.org/jira/browse/HBASE-659
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.1.2
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.1.3, 0.2.0
>
>         Attachments: 659-0.1.patch
>
>
> I have a region that is stuck in a close that was triggered by a split. Here is what I have from the log pertaining to the stuck region:
> {code}
> 2008-05-29 22:29:03,433 INFO org.apache.hadoop.hbase.HRegion: checking compaction completed on region enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061 in 12sec
> 2008-05-29 22:29:03,439 INFO org.apache.hadoop.hbase.HRegion: Splitting enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061 because largest aggregate size is 288.3m and desired size is 256.0m
> 2008-05-29 22:29:03,443 DEBUG org.apache.hadoop.hbase.HRegion: compactions and cache flushes disabled for region enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061
> 2008-05-29 22:29:03,443 DEBUG org.apache.hadoop.hbase.HRegion: new updates and scanners for region enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061 disabled
> 2008-05-29 22:29:03,443 DEBUG org.apache.hadoop.hbase.HRegion: no more active scanners for region enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061
> 2008-05-29 22:29:03,443 DEBUG org.apache.hadoop.hbase.HRegion: no more row locks outstanding on region enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061
> 2008-05-29 22:29:03,443 DEBUG org.apache.hadoop.hbase.HRegionServer: enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061 closing (Adding to retiringRegions)
> 2008-05-29 22:29:03,443 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061. Current region memcache size 2.1m
> 2008-05-29 22:29:03,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020, call batchUpdate(enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061, 1171081390000, [EMAIL PROTECTED]) from 208.76.44.139:49358: error: org.apache.hadoop.hbase.NotServingRegionException: enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061
> org.apache.hadoop.hbase.NotServingRegionException: enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061
> 2008-05-29 22:29:03,982 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020, call batchUpdate(enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061, 1202595259000, [EMAIL PROTECTED]) from 208.76.44.139:49358: error: org.apache.hadoop.hbase.NotServingRegionException: enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061
> org.apache.hadoop.hbase.NotServingRegionException: enwiki,IK9sWdHJe6ffGZgFPsqIvk==,1212092907061
> {code}
> Then, in a thread dump, I see two threads blocked on HLog#cacheFlushLock, but looking at the code, there is no obvious code path that would leave the lock held and never released.
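The report leaves the cause open. One common way a hang like this arises is a lock that is acquired in one method (HLog.startCacheFlush appears in the compactor thread's stack in the dump) and released in another: if an exception escapes between the two calls, nothing unlocks it. A ReentrantLock also stays held after its owning thread exits, so every later lock() call parks forever. The Java sketch below models that situation under stated assumptions. FlushLog, completeCacheFlush, abortCacheFlush, tryRoll, writeMemcache and FlushLockDemo are hypothetical names chosen for illustration, not HBase's actual API, and the sketch makes no claim about what 659-0.1.patch changes.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FlushLockDemo {

    /** Hypothetical stand-in for a log whose flush lock spans two method calls. */
    static class FlushLog {
        private final ReentrantLock cacheFlushLock = new ReentrantLock();

        /** Takes the lock; the caller must later call completeCacheFlush or abortCacheFlush. */
        void startCacheFlush() {
            cacheFlushLock.lock();
        }

        /** Releases the lock after a successful flush (hypothetical name). */
        void completeCacheFlush() {
            cacheFlushLock.unlock();
        }

        /** Releases the lock after a failed flush (hypothetical name). */
        void abortCacheFlush() {
            cacheFlushLock.unlock();
        }

        /** Models the log roller, but bounded: returns false instead of parking forever. */
        boolean tryRoll(long timeoutMs) {
            try {
                if (!cacheFlushLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            try {
                return true; // a real roller would swap log writers here
            } finally {
                cacheFlushLock.unlock();
            }
        }
    }

    /** Simulates the memcache write failing partway through a flush. */
    static void writeMemcache() {
        throw new RuntimeException("simulated flush failure");
    }

    /** Buggy: if writeMemcache throws, the lock is never released. */
    static void leakyFlush(FlushLog log) {
        log.startCacheFlush();
        writeMemcache();
        log.completeCacheFlush(); // skipped when writeMemcache throws
    }

    /** Fixed: any failure after startCacheFlush releases the lock before propagating. */
    static void safeFlush(FlushLog log) {
        log.startCacheFlush();
        try {
            writeMemcache();
        } catch (RuntimeException e) {
            log.abortCacheFlush();
            throw e;
        }
        log.completeCacheFlush();
    }

    /** Flushes on a separate thread (like the compactor), then lets this thread try to roll. */
    static boolean rollerGetsLock(FlushLog log, boolean useSafeFlush) {
        Thread flusher = new Thread(() -> {
            try {
                if (useSafeFlush) {
                    safeFlush(log);
                } else {
                    leakyFlush(log);
                }
            } catch (RuntimeException expected) {
                // The simulated failure is expected; only the lock state matters here.
            }
        }, "flusher");
        flusher.start();
        try {
            flusher.join(); // the flusher has exited, but a leaked lock stays held
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return log.tryRoll(200);
    }

    public static void main(String[] args) {
        System.out.println("leaky flush, roller got lock: " + rollerGetsLock(new FlushLog(), false));
        System.out.println("safe flush, roller got lock: " + rollerGetsLock(new FlushLog(), true));
    }
}
```

Running main prints false for the leaky flush and true for the safe one. The flush runs on its own thread because a ReentrantLock is reentrant: a second acquire from the same thread would succeed and hide the leak. The catch-and-rethrow form is used instead of try/finally because, in this model, the success path releases the lock through completeCacheFlush; a finally block that also unlocked would release it twice and throw IllegalMonitorStateException.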
> {code}
> "regionserver/0:0:0:0:0:0:0:0:60020.compactor" daemon prio=1 tid=0x00002aab381e5fd0 nid=0x6195 waiting on condition [0x0000000041c6c000..0x0000000041c6ce00]
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(Unknown Source)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown Source)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(Unknown Source)
>         at java.util.concurrent.locks.ReentrantLock.lock(Unknown Source)
>         at org.apache.hadoop.hbase.HLog.startCacheFlush(HLog.java:459)
>         at org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:1089)
>         at org.apache.hadoop.hbase.HRegion.close(HRegion.java:594)
>         - locked <0x00002aaab70bf3a0> (a java.lang.Integer)
>         at org.apache.hadoop.hbase.HRegion.splitRegion(HRegion.java:759)
>         - locked <0x00002aaab70bf3a0> (a java.lang.Integer)
>         at org.apache.hadoop.hbase.HRegionServer$CompactSplitThread.split(HRegionServer.java:248)
>         at org.apache.hadoop.hbase.HRegionServer$CompactSplitThread.run(HRegionServer.java:204)
> ...
> "regionserver/0:0:0:0:0:0:0:0:60020.logRoller" daemon prio=1 tid=0x00002aab38181d70 nid=0x6193 waiting on condition [0x0000000041a6a000..0x0000000041a6ab00]
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(Unknown Source)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown Source)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(Unknown Source)
>         at java.util.concurrent.locks.ReentrantLock.lock(Unknown Source)
>         at org.apache.hadoop.hbase.HLog.rollWriter(HLog.java:219)
>         at org.apache.hadoop.hbase.HRegionServer$LogRoller.run(HRegionServer.java:615)
>         - locked <0x00002aaab69ccf00> (a java.lang.Integer)
> ...
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.