Gary Helmling created HBASE-16097:
-------------------------------------

             Summary: Flushes and compactions fail on getting split point
                 Key: HBASE-16097
                 URL: https://issues.apache.org/jira/browse/HBASE-16097
             Project: HBase
          Issue Type: Bug
          Components: Compaction
    Affects Versions: 1.2.1
            Reporter: Gary Helmling
            Assignee: Gary Helmling


We've seen a number of cases where flushes and compactions run to completion, 
then throw an IndexOutOfBoundsException while getting the split point to check 
whether a split is needed.

For flushes, the stack trace looks something like:
{noformat}
ERROR regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region XXXXXXXX]
java.lang.IndexOutOfBoundsException: 131148
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
    at org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:491)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:351)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:520)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1510)
    at org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:726)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:127)
    at org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:2036)
    at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7885)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

For compactions, the exception occurs in the same spot:
{noformat}
ERROR regionserver.CompactSplitThread: Compaction failed Request = regionName=XXXXX, storeName=X, fileCount=XX, fileSize=XXX M, priority=1, time=
java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkIndex(Buffer.java:540)
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
    at org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:491)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:351)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:520)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1510)
    at org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:726)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:127)
    at org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:2036)
    at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7885)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestSplit(CompactSplitThread.java:241)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:540)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:566)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

This continues until a compaction runs through and rewrites whatever file is 
causing the problem, at which point a split can proceed successfully.

The flushes and compactions themselves complete successfully (the exception 
occurs after the new store files have been moved into place). However, the 
exception thrown on flush causes us to exit before checking whether a 
compaction is needed. So normal compactions wind up not being triggered, and 
the affected regions accumulate a large number of store files.
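
To make the control-flow problem concrete, here is a minimal sketch of a post-flush step in which the split check and the compaction check share one path. The `Region` interface, both `afterFlush*` methods, and `brokenRegion()` are hypothetical names for illustration only, not the HBase classes in the traces. The guarded variant shows one possible mitigation (treat a failed split check as "no split" and still check for compaction); it is an assumption, not a proposed or committed fix.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushSplitCheckSketch {

    /** Hypothetical stand-in for a region; these are not HBase APIs. */
    interface Region {
        boolean checkSplit();       // true if a split is needed; may throw
        boolean needsCompaction();  // true if a compaction should be requested
    }

    /** Split check and compaction check share one path, so a throw skips the latter. */
    static List<String> afterFlushUnguarded(Region region) {
        List<String> actions = new ArrayList<>();
        try {
            if (region.checkSplit()) {
                actions.add("split");
            } else if (region.needsCompaction()) {
                actions.add("compact");
            }
        } catch (RuntimeException e) {
            // The compaction check is never reached.
            actions.add("failed");
        }
        return actions;
    }

    /** Assumed mitigation: a failed split check counts as "no split". */
    static List<String> afterFlushGuarded(Region region) {
        List<String> actions = new ArrayList<>();
        boolean shouldSplit = false;
        try {
            shouldSplit = region.checkSplit();
        } catch (RuntimeException e) {
            actions.add("split-check-failed");
        }
        if (shouldSplit) {
            actions.add("split");
        } else if (region.needsCompaction()) {
            actions.add("compact");
        }
        return actions;
    }

    /** A region whose split check always fails, simulating the reported exception. */
    static Region brokenRegion() {
        return new Region() {
            @Override public boolean checkSplit() {
                throw new IndexOutOfBoundsException("simulated bad midkey read");
            }
            @Override public boolean needsCompaction() {
                return true;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(afterFlushUnguarded(brokenRegion())); // [failed]
        System.out.println(afterFlushGuarded(brokenRegion()));   // [split-check-failed, compact]
    }
}
```

In the unguarded variant the region never gets its compaction request, which matches the store file build-up described above.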

No root cause yet, so I'm parking this info here for investigation.  Seems like 
we're either mis-writing part of the index or making some bad assumptions on 
the index blocks that we've read.
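
For anyone picking this up, the sketch below shows how an out-of-range offset or length turns into this exception when copying bytes out of a heap ByteBuffer with absolute reads. It uses only `java.nio.ByteBuffer`; `copyRange` and `copyRangeChecked` are hypothetical helpers, not the HBase `ByteBufferUtils.toBytes` implementation, and the values are made up. It does not say where the bad offset comes from in our case.

```java
import java.nio.ByteBuffer;

public class MidkeyReadSketch {

    /** Copies length bytes starting at offset using absolute reads. */
    static byte[] copyRange(ByteBuffer buf, int offset, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            // Absolute get throws IndexOutOfBoundsException at or past the limit.
            out[i] = buf.get(offset + i);
        }
        return out;
    }

    /** Same copy, but validates the range first so the error names the bad values. */
    static byte[] copyRangeChecked(ByteBuffer buf, int offset, int length) {
        if (offset < 0 || length < 0 || length > buf.limit() - offset) {
            throw new IllegalStateException("index entry out of range: offset=" + offset
                + ", length=" + length + ", limit=" + buf.limit());
        }
        return copyRange(buf, offset, length);
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});

        // A consistent entry reads fine.
        System.out.println(copyRange(block, 2, 3).length);

        // An entry whose offset + length runs past the block fails mid-copy.
        try {
            copyRange(block, 4, 8);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked read failed: " + e);
        }

        // The checked variant reports the offending values instead.
        try {
            copyRangeChecked(block, 4, 8);
        } catch (IllegalStateException e) {
            System.out.println("checked read failed: " + e.getMessage());
        }
    }
}
```

A range check like this would at least log which offset and length were read from the index, which might help narrow down whether the write side or the read side is at fault.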



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
