[ https://issues.apache.org/jira/browse/HBASE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719308#action_12719308 ]

Jonathan Gray commented on HBASE-1207:
--------------------------------------

Having a hard time following the flow of memcache flushing.

The check for whether a memcache flush is necessary is done with a call to 
MemcacheFlusher.reclaimMemcacheMemory().

If we are above the acceptable (high) watermark, we call 
MemcacheFlusher.flushSomeRegions().  This is supposed to flush memcaches, from the 
largest region to the smallest, until we are below the low watermark.
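
Roughly, the watermark-driven loop reads to me like the sketch below.  This is a 
minimal sketch only; the Region interface, the watermark constants and the method 
names are illustrative stand-ins, not the actual MemcacheFlusher internals.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class FlushLoopSketch {
  // Stand-in for HRegion; only what the sketch needs.
  interface Region {
    long memcacheSize();
    boolean flushcache();
  }

  static final long HIGH_WATERMARK = 512L * 1024 * 1024;  // illustrative values only
  static final long LOW_WATERMARK  = 256L * 1024 * 1024;

  final List<Region> onlineRegions;

  FlushLoopSketch(List<Region> onlineRegions) {
    this.onlineRegions = onlineRegions;
  }

  long globalMemcacheSize() {
    return onlineRegions.stream().mapToLong(Region::memcacheSize).sum();
  }

  // Entry point: only act if we are over the acceptable (high) watermark.
  void reclaimMemcacheMemory() {
    if (globalMemcacheSize() > HIGH_WATERMARK) {
      flushSomeRegions();
    }
  }

  // Flush the largest memcaches first until we drop below the low watermark.
  void flushSomeRegions() {
    List<Region> bySizeDescending = onlineRegions.stream()
        .sorted(Comparator.comparingLong(Region::memcacheSize).reversed())
        .collect(Collectors.toList());
    for (Region region : bySizeDescending) {
      if (globalMemcacheSize() <= LOW_WATERMARK) {
        break;  // reclaimed enough memory
      }
      region.flushcache();
    }
  }
}
{code}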

Each region to be flushed has MemcacheFlusher.flushRegion(HRegion) called on it.  
After the flushes, any region whose memcache was actually flushed also has a minor 
compaction triggered (by adding it to the queue in the CompactSplitThread).
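
And the flush-then-queue-compaction step, as I read it, is roughly the following.  
Again just a sketch: the queue field and the boolean returned by flushcache() are 
assumptions on my part, not the real MemcacheFlusher/CompactSplitThread API.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FlushThenCompactSketch {
  // Stand-in for HRegion: flushcache() returns true if the flush left enough
  // store files behind that a compaction is worthwhile.
  interface Region {
    boolean flushcache();
  }

  // Stand-in for the queue the CompactSplitThread drains.
  final BlockingQueue<Region> compactionQueue = new LinkedBlockingQueue<>();

  boolean flushRegion(Region region) {
    boolean needsCompaction = region.flushcache();
    if (needsCompaction) {
      // The minor compaction is not done here; it is picked up asynchronously.
      compactionQueue.offer(region);
    }
    return true;
  }
}
{code}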

Once in flushRegion() I'm a bit lost.  It seems to be called from several 
different places and is a bit hard to follow.

When the actual flush happens via HRegion.flushcache(), we first ensure another 
flush is not already running and that writes are enabled (using WriteState for 
synchronization and volatile booleans).  We then grab a splitsAndCloses read lock 
in that method and call internalFlushcache().
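
For my own notes, the guard in flushcache() looks to me like the pattern below.  
This is a simplified sketch: WriteState here is a cut-down stand-in for HRegion's 
inner class, and the lock and field names are approximate.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FlushGuardSketch {
  // Cut-down stand-in for HRegion.WriteState.
  static class WriteState {
    volatile boolean flushing;
    volatile boolean writesEnabled = true;
  }

  final WriteState writestate = new WriteState();
  final ReentrantReadWriteLock splitsAndClosesLock = new ReentrantReadWriteLock();

  boolean flushcache() {
    synchronized (writestate) {
      // Skip if another flush is already running or writes are disabled (closing).
      if (writestate.flushing || !writestate.writesEnabled) {
        return false;
      }
      writestate.flushing = true;
    }
    // The read lock blocks splits/closes for the duration of the flush,
    // but not other holders of the read side.
    splitsAndClosesLock.readLock().lock();
    try {
      return internalFlushcache();
    } finally {
      splitsAndClosesLock.readLock().unlock();
      synchronized (writestate) {
        writestate.flushing = false;
        writestate.notifyAll();
      }
    }
  }

  // Snapshot-and-flush; see the next sketch.
  boolean internalFlushcache() {
    return true;
  }
}
{code}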

Inside this method we grab a write lock on the region's updatesLock.  This throws 
the gate down and does not allow any further mutations on the region.  But the 
write lock is only held for the snapshotting; the actual flushing is done outside 
of this lock.
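
So the locking split is essentially the pattern below: mutations take the read 
side of updatesLock, the snapshot takes the write side, and the slow write to disk 
happens only after the write lock is released.  Sketch only; the memcache type and 
flushToDisk() are placeholders, not the HStore code.

{code:java}
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SnapshotThenFlushSketch {
  final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

  // Placeholder for the memcache; keyed by row for the sketch.
  volatile ConcurrentSkipListMap<String, String> memcache = new ConcurrentSkipListMap<>();

  // Mutations take the read side: many writers may proceed concurrently,
  // unless a snapshot is in progress.
  void put(String row, String value) {
    updatesLock.readLock().lock();
    try {
      memcache.put(row, value);
    } finally {
      updatesLock.readLock().unlock();
    }
  }

  void internalFlushcache() {
    ConcurrentSkipListMap<String, String> snapshot;
    // Gate down: the write lock blocks all mutations, but only long enough
    // to swap in an empty memcache.
    updatesLock.writeLock().lock();
    try {
      snapshot = memcache;
      memcache = new ConcurrentSkipListMap<>();
    } finally {
      updatesLock.writeLock().unlock();  // mutations resume here
    }
    // The slow part (writing a new store file) runs outside the lock.
    flushToDisk(snapshot);
  }

  void flushToDisk(ConcurrentSkipListMap<String, String> snapshot) {
    // Write snapshot contents to a new store file (omitted in this sketch).
  }
}
{code}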

Where is the reopening of StoreFileScanners that this issue refers to?  Is it what 
was addressed in HBASE-1503?  If so, then this is no longer an issue... If not, I 
need some pointers from here.

> Fix locking in memcache flush
> -----------------------------
>
>                 Key: HBASE-1207
>                 URL: https://issues.apache.org/jira/browse/HBASE-1207
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Ben Maurer
>            Assignee: stack
>             Fix For: 0.20.0
>
>
> memcache flushing holds a write lock while it reopens StoreFileScanners.  I 
> had a case where this process timed out and caused an exception to be thrown, 
> which made the region server believe it had been unable to flush its cache 
> and shut itself down.
> Stack trace is:
> "regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher" daemon prio=10 tid=0x00000000562df400 nid=0x15d1 runnable [0x000000004108b000..0x000000004108bd90]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.zip.CRC32.updateBytes(Native Method)
>         at java.util.zip.CRC32.update(CRC32.java:45)
>         at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1061)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1616)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1666)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:371)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1943)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.io.HalfMapFileReader.next(HalfMapFileReader.java:192)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:110)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.updateReaders(StoreFileScanner.java:378)
>         at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737)
>         at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725)
>         at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694)
>         - locked <0x00002aaab7b41d30> (a java.lang.Integer)
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
