[ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerry He updated HBASE-12949:
-----------------------------
Attachment: HBASE-12949-master.patch
Attached a patch to see if you folks are ok with the approach.
Here is what shows up with the bad HFile after applying the patch.
In the region server log, with an aborted compaction:
{noformat}
2015-02-04 13:57:39,077 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed Request = regionName=CUMMINS_INSITE_V1,20110207-105743558-21939316-1406327439200524000,1422756983729.bc8e4b3996d2424f21dc0cfdcd422a6b., storeName=attributes, fileCount=1, fileSize=6.4 G (6.4 G), priority=9, time=1423087053717124000
java.io.IOException: Could not iterate StoreFileScanner[org.apache.hadoop.hbase.io.HalfStoreFileReader$1@714e714e, cur=20110208-080219433-21950204-1397112048924811000/attributes:1015319_1010319/1397120694918/Put/vlen=15/mvcc=0]
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:142)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507)
	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:77)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:110)
	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1099)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1483)
	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:506)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:906)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:929)
	at java.lang.Thread.run(Thread.java:738)
Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Invalid type 0
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.getKeyValue(HFileReaderV2.java:695)
	at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.getKeyValue(HalfStoreFileReader.java:149)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:137)
	... 11 more
{noformat}
Doing a get from the shell:
{noformat}
hbase(main):002:0> get 'CUMMINS_INSITE_V1', '20110208-080219433-21950204-1397112048924811000'
COLUMN CELL
ERROR: java.io.IOException: Could not iterate StoreFileScanner[org.apache.hadoop.hbase.io.HalfStoreFileReader$1@70837083, cur=20110208-080219433-21950204-1397112048924811000/attributes:1015319_1010319/1397120694918/Put/vlen=15/mvcc=0]
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:142)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3992)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4072)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3950)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3919)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3906)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4882)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4856)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2951)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29937)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:110)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:90)
	at java.lang.Thread.run(Thread.java:738)
Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Invalid type 0
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.getKeyValue(HFileReaderV2.java:695)
	at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.getKeyValue(HalfStoreFileReader.java:149)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:137)
	... 17 more
{noformat}
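The "Invalid type 0" cause in both traces appears to come from checking the cell's type byte as it is read. Type code 0 is KeyValue.Type.Minimum which, like Maximum (255), is meant only for synthetic seek keys, and it matches the "Minimum" keys in the assertion output in the description. Below is a minimal sketch of that kind of check, not the code in the attached patch. It assumes the persisted type codes Put=4, Delete=8, DeleteFamilyVersion=10, DeleteColumn=12 and DeleteFamily=14; KeyTypeCheck and CorruptFileException are hypothetical names.

```java
import java.io.IOException;

// Hypothetical sketch, not the code in HBASE-12949-master.patch.
// Type codes are assumed to mirror HBase's KeyValue.Type.
final class KeyTypeCheck {

  // Hypothetical stand-in for CorruptHFileException.
  static final class CorruptFileException extends IOException {
    CorruptFileException(String msg) {
      super(msg);
    }
  }

  // Returns the type byte if it is a type that may be stored in a file;
  // otherwise throws instead of letting the scanner act on a garbage cell.
  static byte checkType(byte type) throws CorruptFileException {
    switch (type) {
      case 4:   // Put
      case 8:   // Delete
      case 10:  // DeleteFamilyVersion (assumed present in this version)
      case 12:  // DeleteColumn
      case 14:  // DeleteFamily
        return type;
      default:  // Minimum (0), Maximum (255), or an unknown code
        throw new CorruptFileException("Invalid type " + (type & 0xff));
    }
  }
}
```

Failing fast here surfaces a clear error at the first bad cell instead of letting a nonsensical key reach the seek logic.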
> Scanner can be stuck in infinite loop if the HFile is corrupted
> ---------------------------------------------------------------
>
> Key: HBASE-12949
> URL: https://issues.apache.org/jira/browse/HBASE-12949
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.3, 0.98.10
> Reporter: Jerry He
> Attachments: HBASE-12949-master.patch
>
>
> We've encountered a problem where compaction hangs and never completes.
> After looking into it further, we found that the compaction scanner was stuck
> in an infinite loop. See the stack below.
> {noformat}
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
> {noformat}
> We identified the HFile that seems to be corrupted. Running the HFile tool shows
> the following:
> {noformat}
> [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
> 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> Scanning -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> WARNING, previous row is greater then current row
> filename -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> previous -> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00
> current ->
> Exception in thread "main" java.nio.BufferUnderflowException
> 	at java.nio.Buffer.nextGetIndex(Buffer.java:489)
> 	at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347)
> 	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856)
> 	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768)
> 	at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362)
> 	at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262)
> 	at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539)
> 	at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802)
> {noformat}
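The BufferUnderflowException above is thrown when readKeyValueLen reads a length prefix past the end of the block buffer. Below is a minimal sketch of a more descriptive failure. It assumes the HFile v2 cell layout of a 4-byte key length and a 4-byte value length, followed by the key and value bytes, and it validates both lengths against the bytes remaining before trusting them. CellLengthReader and CorruptBlockException are hypothetical names, not HBase classes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical sketch of defensive cell-length parsing.
// Assumed layout: int keyLen, int valueLen, key bytes, value bytes.
final class CellLengthReader {

  // Hypothetical exception type for a block whose contents make no sense.
  static final class CorruptBlockException extends IOException {
    CorruptBlockException(String msg) {
      super(msg);
    }
  }

  // Returns {keyLen, valueLen} for the cell at the buffer's current position
  // without moving the position. Throws a descriptive error instead of
  // BufferUnderflowException when the lengths cannot be valid.
  static int[] readLengths(ByteBuffer block) throws CorruptBlockException {
    int pos = block.position();
    if (block.remaining() < 2 * Integer.BYTES) {
      throw new CorruptBlockException("Truncated cell header at offset " + pos);
    }
    // Absolute reads: they do not change the buffer's position.
    int keyLen = block.getInt(pos);
    int valueLen = block.getInt(pos + Integer.BYTES);
    // Widen to long so huge garbage lengths cannot overflow the sum.
    long needed = 2L * Integer.BYTES + (long) keyLen + valueLen;
    if (keyLen <= 0 || valueLen < 0 || needed > block.remaining()) {
      throw new CorruptBlockException("Invalid cell lengths at offset " + pos
          + ": keyLen=" + keyLen + ", valueLen=" + valueLen
          + ", remaining=" + block.remaining());
    }
    return new int[] {keyLen, valueLen};
  }
}
```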
> Turning on Java assertions (-ea) shows the following:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
> {noformat}
> This suggests that the HFile is corrupted: the keys are out of order.
> But the scanner does not report a meaningful error; instead it gets stuck in an
> infinite loop here:
> {code}
> // KeyValueHeap.generalizedSeek()
> while ((scanner = heap.poll()) != null) {
>   // ... loop body elided ...
> }
> {code}
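The hang can be illustrated with a heavily simplified model of the generalizedSeek() loop. In the sketch below, keys are reduced to strings, and SeekLoopSketch and KeyScanner are hypothetical names rather than HBase's KeyValueHeap and KeyValueScanner. A scanner whose seek reports success but leaves it positioned before the seek key is put back on the heap, polled again, and seeked again without making progress. The added progress check turns that spin into an exception. This shows one plausible failure mode consistent with the out-of-order keys above; it does not diagnose the actual HBase code path.

```java
import java.io.IOException;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical, heavily simplified model of KeyValueHeap.generalizedSeek().
// Keys are plain strings; real HBase compares full KeyValue keys.
final class SeekLoopSketch {

  // Hypothetical minimal scanner interface (not HBase's KeyValueScanner).
  interface KeyScanner {
    String peek();             // current key, or null when exhausted
    boolean seek(String key);  // move to first key >= key; false if none left
  }

  // Heap order: the scanner with the smallest current key comes first.
  static final Comparator<KeyScanner> BY_KEY =
      (a, b) -> a.peek().compareTo(b.peek());

  // Returns true if the heap ends up with a scanner at or after seekKey.
  static boolean seek(PriorityQueue<KeyScanner> heap, String seekKey)
      throws IOException {
    KeyScanner scanner;
    while ((scanner = heap.poll()) != null) {
      if (seekKey.compareTo(scanner.peek()) <= 0) {
        heap.add(scanner);     // top scanner is already at or after seekKey
        return true;
      }
      if (!scanner.seek(seekKey)) {
        continue;              // scanner exhausted: drop it from the heap
      }
      // Progress check. Without it, a scanner left before seekKey is
      // re-added, polled again, and the loop never terminates.
      if (scanner.peek().compareTo(seekKey) < 0) {
        throw new IOException("Scanner did not advance to " + seekKey
            + " (current key '" + scanner.peek() + "'); file may be corrupt");
      }
      heap.add(scanner);
    }
    return false;              // every scanner is exhausted
  }
}
```

A healthy scanner never trips the check, because a successful seek always lands at or after the seek key.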
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)