[
https://issues.apache.org/jira/browse/HBASE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719403#action_12719403
]
Jonathan Gray commented on HBASE-1207:
--------------------------------------
We also grab a lock when we swap the memcache and snapshot, so there are no
concurrency issues with open StoreScanners there, and we can drop the
synchronization added by my patch in HBASE-1503. *However*, I think there is a
problem here. We are not notifying readers when we swap the memcache and the
snapshot. So there is a period of time, after the snapshot but before the
flush, where we drop the write lock (allowing readers in). At that point the
memcache is empty (snapshotted) but its contents have not made it to the
storefile yet. For gets this is not an issue, because we look at both the
memcache and the snapshot. For scanners it is: a scanner may still be "peeked"
at a value that was in the memcache but has now moved to the snapshot, or it
may iterate down to the top of the memcache and find the value no longer
exists there.
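
The swap-plus-notification idea can be sketched as below. This is a minimal, hypothetical illustration of the race and its fix, not HBase's actual code: the class and method names (MiniStore, snapshotMemcache, ChangedMemcacheObserver, etc.) are made up, and a real store would flush the snapshot to a storefile rather than just discard it. The point is that the swap happens under the write lock, and open scanners are notified afterward so they can re-seat themselves instead of peeking at a map whose contents just moved.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the snapshot-swap race discussed above.
// Names do not correspond to HBase's real classes.
class MiniStore {
    interface ChangedMemcacheObserver { void memcacheSnapshotted(); }

    private volatile NavigableMap<String, String> memcache = new ConcurrentSkipListMap<>();
    private volatile NavigableMap<String, String> snapshot = new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<ChangedMemcacheObserver> observers = new CopyOnWriteArrayList<>();

    void put(String k, String v) {
        lock.readLock().lock();
        try { memcache.put(k, v); } finally { lock.readLock().unlock(); }
    }

    // Gets consult both the memcache and the snapshot, so they are
    // safe across the swap, as the comment notes.
    String get(String k) {
        lock.readLock().lock();
        try {
            String v = memcache.get(k);
            return v != null ? v : snapshot.get(k);
        } finally { lock.readLock().unlock(); }
    }

    void addObserver(ChangedMemcacheObserver o) { observers.add(o); }

    // Swap the memcache into the snapshot under the write lock, then
    // notify open scanners so they re-seat themselves. Without the
    // notification, a scanner could still be "peeked" at a value that
    // just moved out of the memcache. (A real store would flush the
    // old snapshot to a storefile before discarding it.)
    void snapshotMemcache() {
        lock.writeLock().lock();
        try {
            snapshot = memcache;
            memcache = new ConcurrentSkipListMap<>();
        } finally {
            lock.writeLock().unlock();
        }
        for (ChangedMemcacheObserver o : observers) o.memcacheSnapshotted();
    }

    public static void main(String[] args) {
        MiniStore store = new MiniStore();
        store.put("row1", "v1");
        final boolean[] notified = {false};
        store.addObserver(() -> notified[0] = true);
        store.snapshotMemcache();
        // The value still resolves via the snapshot after the swap,
        // and the scanner observer has been told to reset.
        System.out.println(store.get("row1") + " " + notified[0]);
    }
}
```

A usage note: notifying observers outside the write lock (as above) avoids holding the lock while scanners reopen readers, which is exactly the long-held-lock situation the stack trace below shows.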
> Fix locking in memcache flush
> -----------------------------
>
> Key: HBASE-1207
> URL: https://issues.apache.org/jira/browse/HBASE-1207
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.19.0
> Reporter: Ben Maurer
> Assignee: Jonathan Gray
> Fix For: 0.20.0
>
>
> memcache flushing holds a write lock while it reopens StoreFileScanners. I
> had a case where this process timed out and caused an exception to be thrown,
> which made the region server believe it had been unable to flush its cache
> and shut itself down.
> Stack trace is:
> "regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher" daemon prio=10 tid=0x00000000562df400 nid=0x15d1 runnable [0x000000004108b000..0x000000004108bd90]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.zip.CRC32.updateBytes(Native Method)
>         at java.util.zip.CRC32.update(CRC32.java:45)
>         at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1061)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1616)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1666)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:371)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1943)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.io.HalfMapFileReader.next(HalfMapFileReader.java:192)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:110)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.updateReaders(StoreFileScanner.java:378)
>         at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737)
>         at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725)
>         at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694)
>         - locked <0x00002aaab7b41d30> (a java.lang.Integer)
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.