[ 
https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895390#comment-16895390
 ] 

Wellington Chevreuil commented on HBASE-22539:
----------------------------------------------

{quote}I guess the problem is that we release the ByteBuf too earlier...{quote}
Yep, just changed the title to reflect that, since we discarded the original 
suspicion around unsafe copy.

{quote}But seems the only way to release the ByteBuf is to finish the rpc 
call...{quote}
Hmm, the stack trace suggests the write is happening on a separate thread, the 
*ringbuffer* event handler. Maybe the rpc thread has already reached its 
endpoint by then, where the DBB is released? 

{noformat}
        at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueBytes(KeyValueUtil.java:555)
        at org.apache.hadoop.hbase.KeyValueUtil.isBufferValid(KeyValueUtil.java:532)
        at org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream.write(ByteBufferWriterOutputStream.java:99)
        at org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(ByteBufferUtils.java:451)
        at org.apache.hadoop.hbase.ByteBufferKeyValue.write(ByteBufferKeyValue.java:277)
        at org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:794)
        at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:382)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:54)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doAppend(FSHLog.java:302)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doAppend(FSHLog.java:67)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.append(AbstractFSWAL.java:918)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1082)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:973)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:881)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:129)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
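
To make the suspected sequence concrete, here is a minimal standalone sketch of the hazard, not HBase code. It assumes a pooled direct buffer is handed to a consumer thread by reference and is then reused by the producer before the consumer reads it. All names (EarlyReleaseSketch, appendAsync, the 8-byte payloads) are hypothetical, and the defensive copy is shown only for contrast, not as the proposed fix for this issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in HBase.
class EarlyReleaseSketch {

    // Overwrites the buffer content from offset 0 without moving its position.
    static void fill(ByteBuffer buf, String payload) {
        ByteBuffer view = buf.duplicate();
        view.clear();
        view.put(payload.getBytes(StandardCharsets.US_ASCII));
    }

    // Reads the full buffer content without moving its position.
    static String read(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        view.clear();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return new String(out, StandardCharsets.US_ASCII);
    }

    // Copies the content into a fresh on-heap buffer owned by the caller.
    static ByteBuffer copyOf(ByteBuffer src) {
        ByteBuffer view = src.duplicate();
        view.clear();
        ByteBuffer copy = ByteBuffer.allocate(view.remaining());
        copy.put(view);
        return copy;
    }

    // Returns what the consumer thread ends up "appending".
    static String appendAsync(boolean copyBeforeHandOff) {
        ByteBuffer pooled = ByteBuffer.allocateDirect(8);
        fill(pooled, "cell-AAA"); // payload of the first call

        ByteBuffer handedOff = copyBeforeHandOff ? copyOf(pooled) : pooled;
        CountDownLatch reused = new CountDownLatch(1);
        ExecutorService consumer = Executors.newSingleThreadExecutor();
        try {
            Future<String> appended = consumer.submit(() -> {
                reused.await(); // the consumer only gets to the entry late
                return read(handedOff);
            });
            // Producer side: the call finishes, the buffer goes back to the
            // pool and is reused for the next call's payload.
            fill(pooled, "cell-BBB");
            reused.countDown();
            return appended.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            consumer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("by reference: " + appendAsync(false));
        System.out.println("with copy:    " + appendAsync(true));
    }
}
```

With the buffer shared by reference, the consumer reads the second payload instead of the first one. That is the kind of mismatch a sanity check on the consumer side would flag, if the buffer is indeed released before the append runs.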

> Potential WAL corruption due to early DBBs re-use. 
> ---------------------------------------------------
>
>                 Key: HBASE-22539
>                 URL: https://issues.apache.org/jira/browse/HBASE-22539
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc, wal
>    Affects Versions: 2.1.1
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers' 
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom 
> modified jar with the extra sanity checks implemented by HBASE-21401 applied 
> at some code points, plus additional debugging messages, we believe it is 
> related to DirectByteBuffer usage, and to the Unsafe copy from offheap memory 
> to an on-heap array triggered 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
>  such as when writing into a non ByteBufferWriter type, as done 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details in the following comment.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
