[ https://issues.apache.org/jira/browse/HBASE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Xu resolved HBASE-9646.
------------------------------

      Resolution: Duplicate
    Release Note: 
Same as HBASE-9645: both issues are caused by HRegion's updatesLock. 

> SplitLogWorker throws IndexOutOfBoundsException while reading compressed hlog
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-9646
>                 URL: https://issues.apache.org/jira/browse/HBASE-9646
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.10
>         Environment: Linux 2.6.32-el5.x86_64
>            Reporter: Victor Xu
>
> This case happened right after a regionserver 
> halt ([HBASE-9645|https://issues.apache.org/jira/browse/HBASE-9645]).
> As we know, when a RS goes down, the master directs other live RSes to 
> split the hlogs left behind by the dead RS; once the splitting finishes, it 
> reassigns the regions to other live RSes. 
> The case described here is that one of the dead RS's hlog files could not 
> be split. The master resubmitted the splitting task to different RSes several 
> times, but every attempt failed.
> I checked one of those RSes' regionserver logs and found a read error:
> {noformat}
> 2013-09-19 15:54:53,082 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990 acquired task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730
> 2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730, length=1169560315
> 2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,084 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recovery attempt for hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,166 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/page_content_queue/1af5b36fd3695e18ef85ea6d768a3e45/recovered.edits/0000000080485091467, length=71240
> 2013-09-19 15:54:53,177 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/offline_dashboard_queue/2514b6f61f05a2616f1e3dd893a56da6/recovered.edits/0000000080485091468, length=1242
> 2013-09-19 15:54:53,188 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/taobao_auction_rowkey_queue/45ba2b931f94d4d453977ea16bccda6b/recovered.edits/0000000080485091472, length=704
> 2013-09-19 15:54:53,204 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/shangpin_alt_taoke_doc_queue/aabc3dc0ab1c9c1ead1239b6be4234ed/recovered.edits/0000000080485091471, length=8998
> 2013-09-19 15:54:53,216 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/link_rowkey/eb004b12a0e1d3f467cadb19a53ca29e/recovered.edits/0000000080485091462, length=612
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Processed 13 edits across 5 regions threw away edits for 0 regions; log file=hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730 is corrupted = false progress failed = false
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990 done with task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730 in 137ms
> 2013-09-19 15:54:53,219 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error 
> java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
>         at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:111)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:813)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:437)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker hadoop250.cm6,60020,1378886880990 exiting
> {noformat}
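> The stack trace points into the WAL dictionary compression path: 
> Compressor.readCompressed() reads a dictionary index from the stream and asks 
> LRUDictionary for the matching entry, and Guava's checkElementIndex rejects 
> index 5 because the reader-side dictionary only holds 5 entries (indices 0-4). 
> A minimal sketch of that failure mode (a simplified model for illustration, 
> not the actual HBase classes):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> 
> // Simplified stand-in for the reader side of
> // o.a.h.hbase.regionserver.wal.LRUDictionary (illustration only).
> public class WalDictReadSketch {
>   static final List<byte[]> dict = new ArrayList<byte[]>();
> 
>   // Mirrors the failing lookup: an out-of-range index means the reader
>   // never saw the entry the writer recorded for that slot.
>   static byte[] getEntry(short index) {
>     if (index < 0 || index >= dict.size()) {
>       throw new IndexOutOfBoundsException(
>           "index (" + index + ") must be less than size (" + dict.size() + ")");
>     }
>     return dict.get(index);
>   }
> 
>   public static void main(String[] args) {
>     // The reader has replayed 5 dictionary entries (indices 0..4)...
>     for (int i = 0; i < 5; i++) {
>       dict.add(("entry-" + i).getBytes());
>     }
>     // ...but the next compressed HLogKey references slot 5, which the
>     // reader never filled -> "index (5) must be less than size (5)".
>     getEntry((short) 5);
>   }
> }
> {code}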
> I then used the HLog dump tool to analyze the hlog file. Here is what I got:
> {noformat}
> $ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /user/hadoop/hbase_admin/logs/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 13/09/24 14:24:38 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
> Sequence 80485091459 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773802:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091460 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773799:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091461 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
>   Action:
>     row: 191:1379572773794
>     column: link_crawler:15acca08d58e374208eb603aaa4c58e8_1379572773791_345445
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091462 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
>   Action:
>     row: 191:1379572773793
>     column: link_crawler:f8a52947d7a0e794204f118fc7c5fb5a_1379572773791_916104
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091463 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
>   Action:
>     row: 00039:0000001379572773776:00001
>     column: message:content
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091464 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773772:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091465 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773770:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091466 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
>   Action:
>     row: 00095:0000001379572773768:00001
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773768:00002
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773768:00003
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091467 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
>   Action:
>     row: 00039:0000001379572773766:00001
>     column: message:content
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091468 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773759:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091469 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773756:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091471 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773755:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091472 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
>   Action:
>     row: 00095:0000001379572773754:00001
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773754:00002
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773754:00003
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
>       at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
>       at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
>       at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
>       at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
>       at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
>       at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:148)
>       at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
>       at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
>       at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
>       at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.processFile(HLogPrettyPrinter.java:242)
>       at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.run(HLogPrettyPrinter.java:371)
>       at org.apache.hadoop.hbase.regionserver.wal.HLog.main(HLog.java:1935)
> {noformat}
> The exception was the same, but the hlog detail shows that 
> {color:red}sequence id '*80485091470*'{color} is missing. That lost edit is 
> most likely what the missing index '*5*' in the exception corresponds to.
> So I suspect a bug in the HLog writing logic that leaves a 'gap' in the 
> sequence during the write phase. 
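> If that theory holds, the writer added a dictionary entry while appending the 
> sequence-80485091470 edit, but the edit itself never reached the file, so the 
> reader's replayed dictionary stays one slot behind the writer's and later 
> compressed references past the gap become unreadable. A hedged sketch of the 
> divergence (illustration only; these names are not the HBase API):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> 
> // Illustration of how one lost WAL append desynchronizes the writer's
> // and reader's compression dictionaries. Hypothetical names throughout.
> public class DictionaryGapSketch {
>   public static void main(String[] args) {
>     List<String> writerDict = new ArrayList<String>();
>     List<String> readerDict = new ArrayList<String>();
> 
>     for (int seq = 0; seq < 7; seq++) {
>       // Writer side: compressing a new value appends it to the dictionary
>       // and emits its index into the log record.
>       writerDict.add("value-" + seq);
>       int emittedIndex = writerDict.size() - 1;
> 
>       boolean appendPersisted = (seq != 5);  // seq 5 ~ the missing edit
>       if (appendPersisted) {
>         // Reader side: replaying the record rebuilds the same dictionary.
>         readerDict.add("value-" + seq);
>       }
>       System.out.printf("seq=%d writer emitted index %d, reader dict size %d%n",
>           seq, emittedIndex, readerDict.size());
>     }
>     // After the lost append the dictionaries are misaligned: writer index 6
>     // ("value-6") sits at reader index 5, and a reference to the writer's
>     // highest index is out of range on the reader -> IndexOutOfBoundsException.
>   }
> }
> {code}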



--
This message was sent by Atlassian JIRA
(v6.1#6144)
