[
https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900877#comment-16900877
]
Zheng Hu commented on HBASE-22539:
----------------------------------
I discussed this with [~Apache9] earlier: the current fix is a temporary solution
(mainly to get a quick fix released). We have a better solution in mind. In the
NettyServerRpcConnection#process method, we can wrap *Netty's ByteBuf* in
a ReferenceHandler, which will expose retain & release methods.
{code}
void process(final ByteBuf buf) throws IOException, InterruptedException {
  if (connectionHeaderRead) {
    // The NIO view may share memory with buf, so the release is deferred to callCleanup.
    this.callCleanup = buf::release;
    process(new SingleByteBuff(buf.nioBuffer()));
  } else {
    // The connection header is copied on-heap, so buf can be released right away.
    ByteBuffer connectionHeader = ByteBuffer.allocate(buf.readableBytes());
    buf.readBytes(connectionHeader);
    buf.release();
    process(connectionHeader);
  }
}
{code}
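To make the retain & release idea concrete, here is a minimal sketch of what such a handler could look like. The class shape, the refCnt() accessor and the Runnable releaser are assumptions made for illustration, not the actual patch; with Netty the releaser would presumably be something like buf::release.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; not the real HBase class. */
final class ReferenceHandler {
  // Starts at 1: the creator (the RPC connection) holds the first reference.
  private final AtomicInteger refCnt = new AtomicInteger(1);
  // Assumed hook that frees the underlying buffer, e.g. buf::release.
  private final Runnable releaser;

  ReferenceHandler(Runnable releaser) {
    this.releaser = releaser;
  }

  /** Adds one reference; fails if the buffer has already been freed. */
  ReferenceHandler retain() {
    for (;;) {
      int cur = refCnt.get();
      if (cur <= 0) {
        throw new IllegalStateException("already released");
      }
      if (refCnt.compareAndSet(cur, cur + 1)) {
        return this;
      }
    }
  }

  /** Drops one reference; returns true if this call freed the buffer. */
  boolean release() {
    for (;;) {
      int cur = refCnt.get();
      if (cur <= 0) {
        throw new IllegalStateException("already released");
      }
      if (refCnt.compareAndSet(cur, cur - 1)) {
        if (cur == 1) {
          releaser.run();
          return true;
        }
        return false;
      }
    }
  }

  int refCnt() {
    return refCnt.get();
  }
}
```

With this shape, each component that keeps the request bytes around calls retain() once and release() once, and the buffer goes back to the pool only after the last holder is done.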
In the write path, we will pass the ReferenceHandler to the WALEdit. Once the
WALEdit has been written to the OutputStream (that is, copied to a new memory area
that is independent of Netty's ByteBuf memory at that point), we will call the
ReferenceHandler to do the release.
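The ordering described above (copy first, release second) can be sketched as follows. PendingEdit is a hypothetical stand-in for a WALEdit whose payload still lives in the request buffer, and a plain Runnable stands in for the handler's release method; neither is taken from the HBase code base.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for an edit backed by a pooled RPC buffer. */
final class PendingEdit {
  private final ByteBuffer payload; // view over the pooled request buffer
  private final Runnable release;   // assumed callback that frees that buffer

  PendingEdit(ByteBuffer payload, Runnable release) {
    this.payload = payload;
    this.release = release;
  }

  /** Copies the payload into the stream, then gives the request buffer back. */
  void writeTo(ByteArrayOutputStream out) {
    try {
      // duplicate() leaves the original buffer's position untouched.
      ByteBuffer view = payload.duplicate();
      byte[] chunk = new byte[view.remaining()];
      view.get(chunk);
      out.write(chunk, 0, chunk.length);
    } finally {
      // Releasing here, after the copy attempt, means the pooled buffer is
      // never handed back while this edit still needs to read from it.
      release.run();
    }
  }
}
```

Because the release runs only after the copy, the pooled buffer can be reused for the next request without changing what was already written.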
We think SimpleRpcServer can be fixed in a similar way; there the
ReferenceHandler would probably maintain an *HBase ByteBuff*.
The solution should be good enough: it needs no extra buffer copying and seems
straightforward. It does need some code abstraction, though (especially given the
differences between branch-2.2 and branch-2.3 & master). I'm planning to provide a
draft patch (I'm still handling something internal, so it is delayed) and will post
it as soon as possible.
Thanks.
> WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used
> -------------------------------------------------------------------------
>
> Key: HBASE-22539
> URL: https://issues.apache.org/jira/browse/HBASE-22539
> Project: HBase
> Issue Type: Bug
> Components: rpc, wal
> Affects Versions: 2.2.0, 2.0.5, 2.1.5
> Reporter: Wellington Chevreuil
> Assignee: Duo Zhang
> Priority: Blocker
> Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22539-UT.patch, HBASE-22539.branch-2.001.patch
>
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customer's
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom
> modified jar with the extra sanity checks implemented by HBASE-21401 applied
> on some code points, plus additional debugging messages, we believe it is
> related to DirectByteBuffer usage, and Unsafe copy from offheap memory to
> on-heap array triggered
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
> such as when writing into a non ByteBufferWriter type, as done
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details on the following comment.
>