[
https://issues.apache.org/jira/browse/HBASE-22761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506052#comment-17506052
]
Duo Zhang commented on HBASE-22761:
-----------------------------------
OK, I think this one is possible.
The key problem here is that, a fast DN fails first, and then a slow DN acks an
earlier packet. So we will first get a syncFailed at upper layer, and then a
syncCompleted.
Actually the syncFailed has a greater sequence id but we do not pass it to the
upper layer, and it is not practical to wait for the later syncCompleted, we
should try to recover ASAP.
Let me think about what is the suitable way to fix it. Maybe we could just
check whether the writer instance is still the same in syncCompleted? For
normal case, it is impossible that we still want to complete a request for the
previous writer? We need to make sure all the outcoming requests are finished
before rolling?
Thanks.
> Caught ArrayIndexOutOfBoundsException while processing event RS_LOG_REPLAY
> --------------------------------------------------------------------------
>
> Key: HBASE-22761
> URL: https://issues.apache.org/jira/browse/HBASE-22761
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.1
> Reporter: casuallc
> Priority: Major
> Attachments: tmp
>
>
> RegionServer exists when error happen
> {code:java}
> 2019-07-29 20:51:09,726 INFO [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> wal.WALSplitter: Processed 0 edits across 0 regions; edits skipped=0; log
> file=hdfs://cluster1/hbase/WALs/h2,16020,1564216856546-splitting/h2%2C16020%2C1564216856546.1564398538121,
> length=615233, corrupted=false, progress failed=false
> 2019-07-29 20:51:09,726 INFO [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> handler.WALSplitterHandler: Worker h1,16020,1564404572589 done with task
> org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@577da0d3
> in 84892ms. Status = null
> 2019-07-29 20:51:09,726 ERROR [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.ArrayIndexOutOfBoundsException: 16403
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at
> org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
> at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100)
> at
> org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2019-07-29 20:51:09,730 ERROR [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> regionserver.HRegionServer: ***** ABORTING region server
> h1,16020,1564404572589: Caught throwable while processing event RS_LOG_REPLAY
> *****
> java.lang.ArrayIndexOutOfBoundsException: 16403
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at
> org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
> at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100)
> at
> org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)