[ 
https://issues.apache.org/jira/browse/HBASE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719429#action_12719429
 ] 

ryan rawson commented on HBASE-1207:
------------------------------------

following your logic, it suggests that a scanner could get an empty memcache, 
and files 1-N, but we might scan some, then once 
notifyChangedReadersObservers() is called, we get file N+1.  If the data that 
was flushed is 'before' the key that is the 'top' key, we have missed some data 
that was written, but not flushed.

The hole is smalish, but not tiny.  Flushes take about 700ms.

I think maybe we should:
- add the snapshot to the scan
- notify changed observers when snapshot is created (and catching the snapshot 
if it wasnt there)
- notify changed observers when snapshot is finished flushing to disk (already 
doing)

This should close the hole.

But we should think of a way to verify this bug, and then be able to validate 
it later on.


> Fix locking in memcache flush
> -----------------------------
>
>                 Key: HBASE-1207
>                 URL: https://issues.apache.org/jira/browse/HBASE-1207
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Ben Maurer
>            Assignee: Jonathan Gray
>             Fix For: 0.20.0
>
>
> memcache flushing holds a write lock while it reopens StoreFileScanners. I 
> had a case where this process timed out and caused an exception to be thrown, 
> which made the region server believe it had been unable to flush it's cache 
> and shut itself down.
> Stack trace is:
> #
> "regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher" daemon prio=10 
> tid=0x00000000562df400 nid=0x15d1 runnable 
> [0x000000004108b000..0x000000004108bd90]
> #
>    java.lang.Thread.State: RUNNABLE
> #
>         at java.util.zip.CRC32.updateBytes(Native Method)
> #
>         at java.util.zip.CRC32.update(CRC32.java:45)
> #
>         at org.apache.hadoop.util.DataChecksum.update(DataChecksum.java:223)
> #
>         at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
> #
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
> #
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
> #
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
> #
>         - locked <0x00002aaaec1bd2d8> (a 
> org.apache.hadoop.hdfs.DFSClient$BlockReader)
> #
>         at 
> org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1061)
> #
>         - locked <0x00002aaaec1bd2d8> (a 
> org.apache.hadoop.hdfs.DFSClient$BlockReader)
> #
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1616)
> #
>         - locked <0x00002aaad1239000> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> #
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1666)
> #
>         - locked <0x00002aaad1239000> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> #
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1593)
> #
>         - locked <0x00002aaad1239000> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> #
>         at java.io.DataInputStream.readInt(DataInputStream.java:371)
> #
>         at 
> org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1943)
> #
>         - locked <0x00002aaad1238c38> (a 
> org.apache.hadoop.hbase.io.SequenceFile$Reader)
> #
>         at 
> org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1844)
> #
>         - locked <0x00002aaad1238c38> (a 
> org.apache.hadoop.hbase.io.SequenceFile$Reader)
> #
>         at 
> org.apache.hadoop.hbase.io.SequenceFile$Reader.next(SequenceFile.java:1890)
> #
>         - locked <0x00002aaad1238c38> (a 
> org.apache.hadoop.hbase.io.SequenceFile$Reader)
> #
>         at org.apache.hadoop.hbase.io.MapFile$Reader.next(MapFile.java:525)
> #
>         - locked <0x00002aaad1238b80> (a 
> org.apache.hadoop.hbase.io.HalfMapFileReader)
> #
>         at 
> org.apache.hadoop.hbase.io.HalfMapFileReader.next(HalfMapFileReader.java:192)
> #
>         - locked <0x00002aaad1238b80> (a 
> org.apache.hadoop.hbase.io.HalfMapFileReader)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:110)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.updateReaders(StoreFileScanner.java:378)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:737)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.updateReaders(HStore.java:725)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:694)
> #
>         - locked <0x00002aaab7b41d30> (a java.lang.Integer)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:630)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:881)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
> #
>         at 
> org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:137)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to