[
https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311057#comment-14311057
]
Jerry He commented on HBASE-12949:
----------------------------------
Right, The file/block is indeed bad from HBase perspective with the wrong kv
type and lengths. But the FileSystem gives it back to us without complain.
The wrong kv type confused the next and seek logic in scanner, and we were
stuck in an infinite loop.
The original problem was reported with other symptoms on the surface. HBase
clients often failed after a period of time.
The cause was that the infinite compaction held the region lock. There were
periodic bulk loads going to the same table. The bulk loads would block on the
region lock and there was no timeout on the lock.
Then the accumulated bulk loads occupied all the RPC handlers on the region
server. If the region server happens to hold the meta table, the HBase cluster
becomes entirely inaccessible.
> Scanner can be stuck in infinite loop if the HFile is corrupted
> ---------------------------------------------------------------
>
> Key: HBASE-12949
> URL: https://issues.apache.org/jira/browse/HBASE-12949
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.3, 0.98.10
> Reporter: Jerry He
> Attachments: HBASE-12949-master.patch
>
>
> We've encountered problem where compaction hangs and never completes.
> After looking into it further, we found that the compaction scanner was stuck
> in a infinite loop. See stack below.
> {noformat}
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
> {noformat}
> We identified the hfile that seems to be corrupted. Using HFile tool shows
> the following:
> {noformat}
> [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k
> -m -f
> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is
> deprecated. Instead, use io.native.lib.available
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using
> org.apache.hadoop.util.PureJavaCrc32
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use
> org.apache.hadoop.util.PureJavaCrc32C
> 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is
> deprecated. Instead, use fs.defaultFS
> Scanning ->
> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> WARNING, previous row is greater then current row
> filename ->
> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> previous ->
> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00
> current ->
> Exception in thread "main" java.nio.BufferUnderflowException
> at java.nio.Buffer.nextGetIndex(Buffer.java:489)
> at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768)
> at
> org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362)
> at
> org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262)
> at
> org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at
> org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539)
> at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802)
> {noformat}
> Turning on Java Assert shows the following:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Key
> 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0
> followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
> {noformat}
> It shows that the hfile seems to be corrupted -- the keys don't seem to be
> right.
> But Scanner is not able to give a meaningful error, but stuck in an infinite
> loop in here:
> {code}
> KeyValueHeap.generalizedSeek()
> while ((scanner = heap.poll()) != null) {
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)