[
https://issues.apache.org/jira/browse/HBASE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Victor Xu resolved HBASE-9646.
------------------------------
Resolution: Duplicate
Release Note:
Same as HBASE-9645.
Both are caused by the HRegion's updatesLock.
> SplitLogWorker throws IndexOutOfBoundsException while reading compressed hlog
> -----------------------------------------------------------------------------
>
> Key: HBASE-9646
> URL: https://issues.apache.org/jira/browse/HBASE-9646
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 0.94.10
> Environment: Linux 2.6.32-el5.x86_64
> Reporter: Victor Xu
>
> This case happened right after a regionserver
> halt ([HBASE-9645|https://issues.apache.org/jira/browse/HBASE-9645]).
> As we know, when an RS goes down, the master directs other live RSs to
> split the hlogs left behind by the dead RS. Then, after the splitting is
> complete, it reassigns the regions to other live RSs.
> The problem here is that one of the dead RS's hlog files could not be
> split. The split task was resubmitted by the master several times to
> different RSs, but all of them failed.
> I checked one of those RSs' regionserver logs and found a read error:
> {noformat}
> 2013-09-19 15:54:53,082 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker
> hadoop250.cm6,60020,1378886880990 acquired task
> /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730
> 2013-09-19 15:54:53,083 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
> hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730, length=1169560315
> 2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.util.FSHDFSUtils:
> Recovering file
> hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,084 INFO org.apache.hadoop.hbase.util.FSHDFSUtils:
> Finished lease recovery attempt for
> hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,166 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old
> edits file. It could be the result of a previous failed split attempt.
> Deleting hdfs://hadoopnnvip.cm6:9000/hbase/page_content_queue/1af5b36fd3695e18ef85ea6d768a3e45/recovered.edits/0000000080485091467, length=71240
> 2013-09-19 15:54:53,177 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old
> edits file. It could be the result of a previous failed split attempt.
> Deleting hdfs://hadoopnnvip.cm6:9000/hbase/offline_dashboard_queue/2514b6f61f05a2616f1e3dd893a56da6/recovered.edits/0000000080485091468, length=1242
> 2013-09-19 15:54:53,188 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old
> edits file. It could be the result of a previous failed split attempt.
> Deleting hdfs://hadoopnnvip.cm6:9000/hbase/taobao_auction_rowkey_queue/45ba2b931f94d4d453977ea16bccda6b/recovered.edits/0000000080485091472, length=704
> 2013-09-19 15:54:53,204 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old
> edits file. It could be the result of a previous failed split attempt.
> Deleting hdfs://hadoopnnvip.cm6:9000/hbase/shangpin_alt_taoke_doc_queue/aabc3dc0ab1c9c1ead1239b6be4234ed/recovered.edits/0000000080485091471, length=8998
> 2013-09-19 15:54:53,216 WARN
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old
> edits file. It could be the result of a previous failed split attempt.
> Deleting hdfs://hadoopnnvip.cm6:9000/hbase/link_rowkey/eb004b12a0e1d3f467cadb19a53ca29e/recovered.edits/0000000080485091462, length=612
> 2013-09-19 15:54:53,219 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Processed 13 edits
> across 5 regions threw away edits for 0 regions; log
> file=hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> is corrupted = false progress failed = false
> 2013-09-19 15:54:53,219 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker
> hadoop250.cm6,60020,1378886880990 done with task
> /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730
> in 137ms
> 2013-09-19 15:54:53,219 ERROR
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
> at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
> at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
> at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
> at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
> at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
> at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:111)
> at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
> at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:813)
> at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:437)
> at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
> at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
> at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
> at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
> at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
> at java.lang.Thread.run(Thread.java:662)
> 2013-09-19 15:54:53,219 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker
> hadoop250.cm6,60020,1378886880990 exiting
> {noformat}
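> The failing frames (Compressor.readCompressed -> LRUDictionary.getEntry) are the WAL
> compression read path. As far as I understand it, each compressed field of an HLogKey
> is written either as a new literal (which the reader then adds to its own dictionary)
> or as a short index referring back to a value the reader has already decoded, so the
> reader-side dictionary is only correct if every earlier entry in the file is read.
> Below is a simplified sketch of that idea; it is my own illustration, not the actual
> Compressor/LRUDictionary code, and the marker byte and field widths are assumptions:
> {code:java}
> // Simplified sketch of dictionary-based WAL field decoding; NOT the real HBase code.
> import java.io.DataInput;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> public class DictionaryReadSketch {
>   // Hypothetical marker meaning "a literal value follows instead of an index".
>   private static final byte NOT_IN_DICTIONARY = -1;
>
>   private final List<byte[]> dict = new ArrayList<byte[]>();
>
>   byte[] readCompressed(DataInput in) throws IOException {
>     byte status = in.readByte();
>     if (status == NOT_IN_DICTIONARY) {
>       // New value: read it and remember it so later entries can refer to it by index.
>       byte[] literal = new byte[in.readInt()];
>       in.readFully(literal);
>       dict.add(literal);
>       return literal;
>     }
>     // Back-reference: only valid if the reader has decoded every earlier entry that
>     // populated this slot. If one of those entries is missing from the file, the
>     // lookup fails just like "index (5) must be less than size (5)" above.
>     int index = in.readShort();
>     if (index < 0 || index >= dict.size()) {
>       throw new IndexOutOfBoundsException(
>           "index (" + index + ") must be less than size (" + dict.size() + ")");
>     }
>     return dict.get(index);
>   }
> }
> {code}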
> I then used the HLog dump tool to analyze the hlog file. Here is what I got:
> {noformat}
> $ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /user/hadoop/hbase_admin/logs/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 13/09/24 14:24:38 WARN conf.Configuration: fs.default.name is deprecated.
> Instead, use fs.defaultFS
> Sequence 80485091459 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table
> shangpin_alt_taoke_doc_queue
> Action:
> row: 00015:0000001379572773802:00001
> column: message:789
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091460 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table
> shangpin_alt_taoke_doc_queue
> Action:
> row: 00015:0000001379572773799:00001
> column: message:789
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091461 from region eb004b12a0e1d3f467cadb19a53ca29e in table
> link_rowkey
> Action:
> row: 191:1379572773794
> column: link_crawler:15acca08d58e374208eb603aaa4c58e8_1379572773791_345445
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091462 from region eb004b12a0e1d3f467cadb19a53ca29e in table
> link_rowkey
> Action:
> row: 191:1379572773793
> column: link_crawler:f8a52947d7a0e794204f118fc7c5fb5a_1379572773791_916104
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091463 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table
> page_content_queue
> Action:
> row: 00039:0000001379572773776:00001
> column: message:content
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091464 from region 2514b6f61f05a2616f1e3dd893a56da6 in table
> offline_dashboard_queue
> Action:
> row: 00082:0000001379572773772:00001
> column: message:parser
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091465 from region 2514b6f61f05a2616f1e3dd893a56da6 in table
> offline_dashboard_queue
> Action:
> row: 00082:0000001379572773770:00001
> column: message:parser
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091466 from region 45ba2b931f94d4d453977ea16bccda6b in table
> taobao_auction_rowkey_queue
> Action:
> row: 00095:0000001379572773768:00001
> column: message:taobao_auction
> at time: Thu Sep 19 14:39:33 CST 2013
> Action:
> row: 00095:0000001379572773768:00002
> column: message:taobao_auction
> at time: Thu Sep 19 14:39:33 CST 2013
> Action:
> row: 00095:0000001379572773768:00003
> column: message:taobao_auction
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091467 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table
> page_content_queue
> Action:
> row: 00039:0000001379572773766:00001
> column: message:content
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091468 from region 2514b6f61f05a2616f1e3dd893a56da6 in table
> offline_dashboard_queue
> Action:
> row: 00082:0000001379572773759:00001
> column: message:parser
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091469 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table
> shangpin_alt_taoke_doc_queue
> Action:
> row: 00015:0000001379572773756:00001
> column: message:789
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091471 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table
> shangpin_alt_taoke_doc_queue
> Action:
> row: 00015:0000001379572773755:00001
> column: message:789
> at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091472 from region 45ba2b931f94d4d453977ea16bccda6b in table
> taobao_auction_rowkey_queue
> Action:
> row: 00095:0000001379572773754:00001
> column: message:taobao_auction
> at time: Thu Sep 19 14:39:33 CST 2013
> Action:
> row: 00095:0000001379572773754:00002
> column: message:taobao_auction
> at time: Thu Sep 19 14:39:33 CST 2013
> Action:
> row: 00095:0000001379572773754:00003
> column: message:taobao_auction
> at time: Thu Sep 19 14:39:33 CST 2013
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
> at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
> at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
> at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
> at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
> at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
> at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:148)
> at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
> at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.processFile(HLogPrettyPrinter.java:242)
> at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.run(HLogPrettyPrinter.java:371)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.main(HLog.java:1935)
> {noformat}
> The exception was the same, and in the dump output we can see that
> {color:red}sequence id '*80485091470*'{color} is missing. That missing
> entry most likely corresponds to the missing index '*5*' in the exception.
> So I think there may be a bug in the HLog writing logic that left a 'gap'
> in the sequence of written entries.
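> To make the connection between the missing sequence id and the missing index concrete,
> here is a toy simulation (my own illustration, with plain strings and indexes standing
> in for the real dictionary entries): if the RS emitted a back-reference to dictionary
> slot 5 but the entry that introduced that slot (around sequence 80485091470) never
> reached the file, the reader rebuilds a dictionary with only 5 slots and the lookup
> fails with the same kind of out-of-bounds error.
> {code:java}
> // Toy simulation, not HBase code: strings stand in for dictionary entries, and the
> // sequence numbers only mirror the dump above.
> import java.util.ArrayList;
> import java.util.List;
>
> public class MissingEntrySimulation {
>   public static void main(String[] args) {
>     // The writer's dictionary had slots 0..5 filled, so a later entry back-references
>     // slot 5. Assume the entry that introduced slot 5 was never written to the file.
>     String[] literalsActuallyInFile = {"r0", "r1", "r2", "r3", "r4"};
>
>     // Reader side: the dictionary is rebuilt only from entries that are present.
>     List<String> readerDict = new ArrayList<String>();
>     for (String literal : literalsActuallyInFile) {
>       readerDict.add(literal);
>     }
>
>     int backReference = 5; // valid for the writer, invalid for the reader
>     // Throws IndexOutOfBoundsException: only 5 slots exist, analogous to
>     // "index (5) must be less than size (5)" in the stack traces above.
>     System.out.println(readerDict.get(backReference));
>   }
> }
> {code}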
--
This message was sent by Atlassian JIRA
(v6.1#6144)