Jerry,

What is a quick and easy way to monitor for corrupted HFiles? We are using HBase 0.98.
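The closest thing we have found so far is the HFile tool's row-order check (the -k flag used in the JIRA output quoted below). Below is a rough sketch of sweeping one table's store files with that check from Java; the class name, the table-path argument, the hex-name filter, and the return-code handling are our own assumptions, not a tested tool:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter;
import org.apache.hadoop.util.ToolRunner;

public class HFileCorruptionSweep {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        // Table directory to sweep, passed on the command line,
        // e.g. the table's directory under the HBase root dir.
        Path tableDir = new Path(args[0]);

        RemoteIterator<LocatedFileStatus> files = fs.listFiles(tableDir, true);
        while (files.hasNext()) {
            Path file = files.next().getPath();
            // Crude filter: HBase store files have hex-only names.
            if (!file.getName().matches("[0-9a-f]+")) {
                continue;
            }
            try {
                // Same row-order check as "hbase ...hfile.HFile -k -f <file>".
                int rc = ToolRunner.run(conf, new HFilePrettyPrinter(),
                        new String[] { "-k", "-f", file.toString() });
                if (rc != 0) {
                    System.err.println("CHECK FAILED: " + file);
                }
            } catch (Exception e) {
                // A corrupted file can also throw outright,
                // e.g. the BufferUnderflowException in the JIRA below.
                System.err.println("CORRUPT? " + file + ": " + e);
            }
        }
    }
}

The idea would be to run something like this per table on a schedule and alert on any "CHECK FAILED" or "CORRUPT?" lines. Is there anything more built-in than this?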
Thanks,
Asim

> On Feb 5, 2015, at 10:47 AM, Jerry He (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307748#comment-14307748 ]
>
> Jerry He commented on HBASE-12949:
> ----------------------------------
>
> Hi, [~stack], [~ram_krish]
> I agree.
> It is a balancing act between checking and not checking, and between
> checking more and checking less.
> We could check less; for example, check only the type.
>
> Another option I can think of is to add a property (say 'SanityCheckCell').
> We would only check the cells while reading if the property is set to true,
> for people who want a strong cell sanity check, or for people lacking
> strong FileSystem protection (checksums, etc.).
>
> What do you think?
>
>> Scanner can be stuck in infinite loop if the HFile is corrupted
>> ---------------------------------------------------------------
>>
>> Key: HBASE-12949
>> URL: https://issues.apache.org/jira/browse/HBASE-12949
>> Project: HBase
>> Issue Type: Bug
>> Affects Versions: 0.94.3, 0.98.10
>> Reporter: Jerry He
>> Attachments: HBASE-12949-master.patch
>>
>> We've encountered a problem where a compaction hangs and never completes.
>> Looking into it further, we found that the compaction scanner was stuck
>> in an infinite loop. See the stack below.
>> {noformat}
>> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
>> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
>> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
>> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
>> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
>> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
>> {noformat}
>> We identified the HFile that seems to be corrupted. Running the HFile tool
>> shows the following:
>> {noformat}
>> [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>> 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
>> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
>> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
>> 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
>> Scanning -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>> WARNING, previous row is greater then current row
>> filename -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>> previous -> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00
>> current ->
>> Exception in thread "main" java.nio.BufferUnderflowException
>>         at java.nio.Buffer.nextGetIndex(Buffer.java:489)
>>         at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347)
>>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856)
>>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768)
>>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362)
>>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262)
>>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539)
>>         at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802)
>> {noformat}
>> Turning on Java assertions shows the following:
>> {noformat}
>> Exception in thread "main" java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
>>         at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
>> {noformat}
>> This shows that the HFile is corrupted: the keys are out of order.
>> But the scanner cannot surface a meaningful error; instead it gets stuck
>> in an infinite loop here:
>> {code}
>> KeyValueHeap.generalizedSeek()
>> while ((scanner = heap.poll()) != null) {
>> }
>> {code}
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
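PS: on the 'SanityCheckCell' idea in the comment above, a minimal sketch of how the gating might look. The property name comes from that comment; the class, method, and comparator wiring are invented for illustration and are not HBase code:

import java.io.IOException;
import java.util.Comparator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;

// Sketch only: gates a per-cell order check behind a config property.
public class GatedCellSanityCheck {
    private final boolean sanityCheckCells;
    private final Comparator<Cell> comparator;
    private Cell prevCell;

    public GatedCellSanityCheck(Configuration conf, Comparator<Cell> comparator) {
        // Default false: no per-cell cost unless the user opts in.
        this.sanityCheckCells = conf.getBoolean("SanityCheckCell", false);
        this.comparator = comparator;
    }

    /** Called on each cell as it is read; throws instead of looping forever. */
    public void checkScanOrder(Cell cell) throws IOException {
        if (sanityCheckCells && prevCell != null
                && comparator.compare(prevCell, cell) > 0) {
            throw new IOException("Out-of-order cell; HFile is likely corrupt");
        }
        prevCell = cell;
    }
}

Defaulting the property to false keeps the comparison off the read path for people who already trust their FileSystem checksums.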
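PPS: for anyone trying to picture the hang itself, here is a small, self-contained model of the failure mode. It is deliberately not HBase's real KeyValueHeap code; it only shows why a seek loop that re-queues a scanner that never advances past the seek key will spin forever:

import java.util.Comparator;
import java.util.PriorityQueue;

public class SeekLoopDemo {
    interface KVScanner {
        String peek();              // current key
        boolean seekTo(String key); // position at the first key >= key
    }

    // Simulates a corrupted file: seekTo() reports success, but peek()
    // keeps returning a key that sorts before the seek target.
    static class CorruptScanner implements KVScanner {
        public String peek() { return "aaa"; }
        public boolean seekTo(String key) { return true; }
    }

    static void generalizedSeek(PriorityQueue<KVScanner> heap, String seekKey) {
        KVScanner scanner;
        int polls = 0;
        while ((scanner = heap.poll()) != null) { // the loop from the stack trace
            boolean seeked = scanner.seekTo(seekKey);
            if (seeked && scanner.peek().compareTo(seekKey) >= 0) {
                heap.add(scanner);                // correctly positioned; done
                return;
            }
            heap.add(scanner);                    // no progress, yet re-queued
            if (++polls > 5) {                    // guard so this demo terminates
                System.out.println("stuck: scanner never advances past " + seekKey);
                return;
            }
        }
    }

    public static void main(String[] args) {
        PriorityQueue<KVScanner> heap =
            new PriorityQueue<>(Comparator.comparing(KVScanner::peek));
        heap.add(new CorruptScanner());
        generalizedSeek(heap, "bbb");
    }
}

A check that throws, like the sketch above, would turn this silent spin into a diagnosable error.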
