[ 
https://issues.apache.org/jira/browse/HBASE-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760337#comment-16760337
 ] 

Bahram Chehrazy commented on HBASE-21837:
-----------------------------------------

I don't have a callstack directly caused by this, but it would be very 
similar. Whether the corruption existed in the input file or was created 
by this race condition during processing, it blows up in writeBuffer with 
a similar callstack.

> Potential race condition when WALSplitter writes the split results
> ------------------------------------------------------------------
>
>                 Key: HBASE-21837
>                 URL: https://issues.apache.org/jira/browse/HBASE-21837
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 3.0.0
>            Reporter: Bahram Chehrazy
>            Priority: Major
>
> When WALSplitter writes the split buffer, it calls 
> EntryBuffers.getChunkToWrite in WriterThread.doRun. But getChunkToWrite is 
> not thread safe and can return garbage when called in parallel. Later, when 
> the writer tries to write the chunk via writeBuffer, it can throw an 
> exception like this:
>  
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.RuntimeException: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
>     at org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
>     at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
>     at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349)
>     at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
>     at org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
>     at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113)
>     at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
>     at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
>     at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
>     at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
>     at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
>     at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)
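The hazard the report describes can be sketched outside HBase: when several writer threads pull chunks from a shared buffer and the select-and-remove step is not atomic, two threads can observe the same intermediate state and receive overlapping or partially built chunks. A minimal Java sketch of the fix pattern, with all class and method names here being illustrative stand-ins rather than the real HBase code, is to make the whole take operation synchronized:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a shared entry buffer drained by several
// writer threads. Not the real HBase EntryBuffers; the point is only
// the synchronization pattern.
public class EntryBuffersSketch {
    private final List<Integer> pending = new ArrayList<>();

    public EntryBuffersSketch(int entries) {
        for (int i = 0; i < entries; i++) pending.add(i);
    }

    // The fix pattern: selecting a chunk and removing it from the shared
    // state happen in one synchronized method, so no two writer threads
    // can ever be handed the same (or a half-built) chunk.
    public synchronized List<Integer> getChunkToWrite(int maxSize) {
        if (pending.isEmpty()) return null;
        int n = Math.min(maxSize, pending.size());
        List<Integer> chunk = new ArrayList<>(pending.subList(0, n));
        pending.subList(0, n).clear();
        return chunk;
    }

    public static void main(String[] args) throws InterruptedException {
        EntryBuffersSketch buffers = new EntryBuffersSketch(1000);
        List<Integer> seen =
            java.util.Collections.synchronizedList(new ArrayList<>());
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                List<Integer> chunk;
                while ((chunk = buffers.getChunkToWrite(7)) != null) {
                    seen.addAll(chunk);
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) w.join();
        // Every entry is written exactly once: none lost, none duplicated.
        System.out.println(seen.size() == 1000
            && seen.stream().distinct().count() == 1000);
    }
}
```

Without the synchronized keyword on getChunkToWrite, the size check, the subList copy, and the clear can interleave across threads, producing exactly the kind of torn chunk that later surfaces as a NegativeArraySizeException deep in the write path.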



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
