[ https://issues.apache.org/jira/browse/HBASE-22761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506039#comment-17506039 ]
Xiaolin Ha commented on HBASE-22761:
------------------------------------
Hi [~zhangduo], [~comnetwork], thanks for your replies.
I think there may be cases in which the new writer flushes WAL entries after
the first writer has already released the entry buffers. Here is a scenario:
# AsyncFSWAL#toWriteAppends=1,2,3,4,5,6,7,8,9,10...; writer1 has 3 channels to
the DNs: DN1, DN2 and DN3;
# writer1 syncs entries 1,2,3,4,5,6,7,8,9, whose total size reaches the
configured batch size, so toWriteAppends=10 and unackedAppends=1,2,3,4,5,6,7,8,9;
# DN1 completes the sync of all entries, so DN1 is removed from the
unfinishedReplicas of [1,2,3,4,5,6,7,8,9], leaving their unfinishedReplicas as [DN2, DN3];
# DN2 completes the sync of [1,2], leaving [1,2]'s unfinishedReplicas as [DN3];
DN3 then completes the sync of [1,2,3], leaving [3]'s unfinishedReplicas as [DN2];
# DN3 fails to sync [4] and calls FanoutOneBlockAsyncDFSOutput#failed;
concurrently, DN2 completes the sync of [3], so [3]'s unfinishedReplicas become [];
# [4] triggers AsyncFSWAL#syncFailed, and [3] triggers
AsyncFSWAL#syncCompleted;
# after AsyncFSWAL#syncFailed, AsyncFSWAL#toWriteAppends=3,4,5,6,7,8,9,10... and
unackedAppends=3,4,5,6,7,8,9, and a new writer2 is created;
# after AsyncFSWAL#syncCompleted, AsyncFSWAL#toWriteAppends=3,4,5,6,7,8,9,10...
and unackedAppends=4,5,6,7,8,9;
# writer2 flushes 3,4,5,6,7,8,9,10..., but the buffer of 3 has already been
released, so it may write dirty data;
# when the log splitter reads the dirty entry written by writer2, it fails.
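The interleaving above can be replayed deterministically in a simplified, single-threaded model. This is only a sketch under stated assumptions. WalReleaseRace, Entry, release(), syncCompleted(int), syncFailed() and flushWithNewWriter() are hypothetical stand-ins, not the real AsyncFSWAL or FanoutOneBlockAsyncDFSOutput code. A "released" buffer is simulated by overwriting a byte array. syncFailed() is assumed to re-queue the unacked entries without removing them from unackedAppends, as described in step 7.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the scenario above; NOT the real HBase AsyncFSWAL code.
class WalReleaseRace {

  // Stand-in for a WAL entry backed by a pooled buffer.
  static final class Entry {
    final int seq;
    final byte[] buf;
    boolean released;

    Entry(int seq) {
      this.seq = seq;
      this.buf = new byte[] {(byte) seq};
    }

    // Simulates returning the buffer to a pool by overwriting its contents.
    void release() {
      released = true;
      Arrays.fill(buf, (byte) 0xFF);
    }
  }

  final Deque<Entry> toWriteAppends = new ArrayDeque<>();
  final Deque<Entry> unackedAppends = new ArrayDeque<>();

  // Step 2: writer1 has sent 1..9; 10 is still waiting to be written.
  static WalReleaseRace afterWriter1Sync() {
    WalReleaseRace wal = new WalReleaseRace();
    for (int seq = 1; seq <= 9; seq++) {
      wal.unackedAppends.addLast(new Entry(seq));
    }
    wal.toWriteAppends.addLast(new Entry(10));
    return wal;
  }

  // Acks and releases every unacked entry whose seq is <= ackedUpTo.
  void syncCompleted(int ackedUpTo) {
    while (!unackedAppends.isEmpty() && unackedAppends.peekFirst().seq <= ackedUpTo) {
      unackedAppends.pollFirst().release();
    }
  }

  // Pushes unacked entries back to the head of toWriteAppends, keeping their order.
  // Assumption (step 7): they are NOT removed from unackedAppends here.
  void syncFailed() {
    Iterator<Entry> it = unackedAppends.descendingIterator();
    while (it.hasNext()) {
      toWriteAppends.addFirst(it.next());
    }
  }

  // writer2 drains toWriteAppends; returns the seqs whose buffers were already released.
  List<Integer> flushWithNewWriter() {
    List<Integer> dirty = new ArrayList<>();
    while (!toWriteAppends.isEmpty()) {
      Entry e = toWriteAppends.pollFirst();
      if (e.released) {
        dirty.add(e.seq);
      }
    }
    return dirty;
  }

  public static void main(String[] args) {
    WalReleaseRace wal = afterWriter1Sync();
    wal.syncCompleted(2); // steps 3-4: [1,2] are acked by all three DNs and released
    wal.syncFailed();     // steps 5-7: DN3 fails on [4]; 3..9 are re-queued for writer2
    wal.syncCompleted(3); // steps 6 and 8: the racing completion of [3] releases its buffer
    System.out.println("released entries flushed by writer2: " + wal.flushWithNewWriter());
  }
}
{code}

Running main should print "released entries flushed by writer2: [3]". Entry 3 is acked and released by the late syncCompleted while it is still queued for writer2. Replaying one fixed interleaving avoids real threads, which would make the race nondeterministic.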
> Caught ArrayIndexOutOfBoundsException while processing event RS_LOG_REPLAY
> --------------------------------------------------------------------------
>
> Key: HBASE-22761
> URL: https://issues.apache.org/jira/browse/HBASE-22761
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.1
> Reporter: casuallc
> Priority: Major
> Attachments: tmp
>
>
> RegionServer exits when the error happens:
> {code:java}
> 2019-07-29 20:51:09,726 INFO [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> wal.WALSplitter: Processed 0 edits across 0 regions; edits skipped=0; log
> file=hdfs://cluster1/hbase/WALs/h2,16020,1564216856546-splitting/h2%2C16020%2C1564216856546.1564398538121,
> length=615233, corrupted=false, progress failed=false
> 2019-07-29 20:51:09,726 INFO [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> handler.WALSplitterHandler: Worker h1,16020,1564404572589 done with task
> org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@577da0d3
> in 84892ms. Status = null
> 2019-07-29 20:51:09,726 ERROR [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.ArrayIndexOutOfBoundsException: 16403
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
> at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195)
> at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100)
> at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2019-07-29 20:51:09,730 ERROR [RS_LOG_REPLAY_OPS-regionserver/h1:16020-0]
> regionserver.HRegionServer: ***** ABORTING region server
> h1,16020,1564404572589: Caught throwable while processing event RS_LOG_REPLAY
> *****
> java.lang.ArrayIndexOutOfBoundsException: 16403
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
> at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:143)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:148)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:195)
> at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:100)
> at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>