[
https://issues.apache.org/jira/browse/HBASE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719403#action_12719403
]
Jonathan Gray commented on HBASE-1207:
--------------------------------------
We also grab a lock when we swap the memcache and snapshot, so there are no
concurrency issues with open StoreScanners there, and we can drop the
synchronization added by my patch in HBASE-1503. *However*, I think there is a
problem here. We are not notifying readers when we swap the memcache and the
snapshot. So there is a period of time, after the snapshot but before the
flush, where we drop the write lock (allowing readers in). At that point the
memcache is empty (snapshotted) but its contents have not made it to the
storefile yet. For gets this is not an issue, because we look at both the
memcache and the snapshot. For scanners it is: a scanner may still be "peeked"
at a value that was in the memcache but has now moved to the snapshot, or it
may iterate down to the top of the memcache and find the value no longer
exists there.
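
The swap-plus-notification idea can be sketched as below. This is a minimal, hypothetical illustration of the race and its fix, not HBase's actual code: the class and method names (MiniStore, snapshotMemcache, ChangedMemcacheObserver, etc.) are made up, and a real store would flush the snapshot to a storefile rather than just discard it. The point is that the swap happens under the write lock, and open scanners are notified afterward so they can re-seat themselves instead of peeking at a map whose contents just moved.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the snapshot-swap race discussed above.
// Names do not correspond to HBase's real classes.
class MiniStore {
    interface ChangedMemcacheObserver { void memcacheSnapshotted(); }

    private volatile NavigableMap<String, String> memcache = new ConcurrentSkipListMap<>();
    private volatile NavigableMap<String, String> snapshot = new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<ChangedMemcacheObserver> observers = new CopyOnWriteArrayList<>();

    void put(String k, String v) {
        lock.readLock().lock();
        try { memcache.put(k, v); } finally { lock.readLock().unlock(); }
    }

    // Gets consult both the memcache and the snapshot, so they are
    // safe across the swap, as the comment notes.
    String get(String k) {
        lock.readLock().lock();
        try {
            String v = memcache.get(k);
            return v != null ? v : snapshot.get(k);
        } finally { lock.readLock().unlock(); }
    }

    void addObserver(ChangedMemcacheObserver o) { observers.add(o); }

    // Swap the memcache into the snapshot under the write lock, then
    // notify open scanners so they re-seat themselves. Without the
    // notification, a scanner could still be "peeked" at a value that
    // just moved out of the memcache. (A real store would flush the
    // old snapshot to a storefile before discarding it.)
    void snapshotMemcache() {
        lock.writeLock().lock();
        try {
            snapshot = memcache;
            memcache = new ConcurrentSkipListMap<>();
        } finally {
            lock.writeLock().unlock();
        }
        for (ChangedMemcacheObserver o : observers) o.memcacheSnapshotted();
    }

    public static void main(String[] args) {
        MiniStore store = new MiniStore();
        store.put("row1", "v1");
        final boolean[] notified = {false};
        store.addObserver(() -> notified[0] = true);
        store.snapshotMemcache();
        // The value still resolves via the snapshot after the swap,
        // and the scanner observer has been told to reset.
        System.out.println(store.get("row1") + " " + notified[0]);
    }
}
```

A usage note: notifying observers outside the write lock (as above) avoids holding the lock while scanners reopen readers, which is exactly the long-held-lock situation the stack trace below shows.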
> Fix locking in memcache flush
> -----------------------------
>
> Key: HBASE-1207
> URL: https://issues.apache.org/jira/browse/HBASE-1207
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.19.0
> Reporter: Ben Maurer
> Assignee: Jonathan Gray
> Fix For: 0.20.0
>
>
> memcache flushing holds a write lock while it reopens StoreFileScanners. I
> had a case where this process timed out and caused an exception to be thrown,
> which made the region server believe it had been unable to flush its cache
> and shut itself down.
> Stack trace is:
> "regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher" daemon prio=10 tid=0x00000000562df400 nid=0x15d1 runnable [0x000000004108b000..0x000000004108bd90]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.zip.CRC32.updateBytes(Native Method)
>         at java.util.zip.CRC32.update(CRC32.java:45)
>         at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1061)
>         - locked <0x00002aaaec1bd2d8> (a org.apache.hadoop.hdfs.DFSClient$BlockReader)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1616)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1666)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
>         - locked <0x00002aaad1239000> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:371)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1943)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
>         - locked <0x00002aaad1238c38> (a org.apache.hadoop.hbase.io.SequenceFile$Reader)
>         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.io.HalfMapFileReader.next(HalfMapFileReader.java:192)
>         - locked <0x00002aaad1238b80> (a org.apache.hadoop.hbase.io.HalfMapFileReader)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:110)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.updateReaders(StoreFileScanner.java:378)
>         at org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737)
>         at org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725)
>         at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694)
>         - locked <0x00002aaab7b41d30> (a java.lang.Integer)
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
>         at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.