[
https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900827#comment-16900827
]
Wellington Chevreuil edited comment on HBASE-22539 at 8/6/19 10:25 AM:
-----------------------------------------------------------------------
[~Apache9], [~stack], I am reopening this to get further opinions on my earlier
comment, pasted below.
{noformat}
Overall, the PR solution is functionally correct.
However, it looks a bit overcomplicated: it introduces references to the RPC
layer ServerCall class in some wal package classes, and every WAL provider
implementation now needs to be aware of the newly introduced
ServerCall.releaseByWal callback method (this could be error prone, since new
WAL provider implementations may miss it). Furthermore, wouldn't it add an
unnecessary penalty for non-DBB calls (such as the ones from SimpleRpcServer
using on-heap BB)?{noformat}
If you agree with that, I can keep working on the patch I had previously
proposed on a separate jira.
> WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used
> -------------------------------------------------------------------------
>
> Key: HBASE-22539
> URL: https://issues.apache.org/jira/browse/HBASE-22539
> Project: HBase
> Issue Type: Bug
> Components: rpc, wal
> Affects Versions: 2.2.0, 2.0.5, 2.1.5
> Reporter: Wellington Chevreuil
> Assignee: Duo Zhang
> Priority: Blocker
> Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22539-UT.patch, HBASE-22539.branch-2.001.patch
>
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers'
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom
> modified jar with the extra sanity checks implemented by HBASE-21401 applied
> at some code points, plus additional debugging messages, we believe it is
> related to DirectByteBuffer usage and the Unsafe copy from off-heap memory to
> an on-heap array triggered
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
> such as when writing into a non-ByteBufferWriter type, as done
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details in the following comment.
>
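The early re-use hazard described in the summary can be illustrated with a minimal, single-threaded sketch. This is not HBase code: the class EarlyReuseDemo, its one-slot pool, and the method names are all hypothetical, and the "async WAL" is simulated by simply deferring the off-heap to on-heap copy until after the buffer has been handed back and filled by the next request. It assumes only java.nio.ByteBuffer semantics (a duplicate shares content with its source).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

public class EarlyReuseDemo {
    // Hypothetical stand-in for an RPC buffer pool; not an HBase class.
    private static final Deque<ByteBuffer> POOL = new ArrayDeque<>();

    // Take a buffer from the pool, or allocate a direct (off-heap) one.
    static ByteBuffer acquire() {
        ByteBuffer b = POOL.poll();
        if (b == null) {
            b = ByteBuffer.allocateDirect(16);
        }
        b.clear();
        return b;
    }

    // Hand the buffer back so a later request can re-use the same memory.
    static void release(ByteBuffer b) {
        POOL.push(b);
    }

    // Simulates an incoming request whose payload lands in a pooled buffer.
    static ByteBuffer fill(String payload) {
        ByteBuffer b = acquire();
        b.put(payload.getBytes(StandardCharsets.US_ASCII));
        b.flip();
        return b;
    }

    // Copies the readable bytes of the buffer into an on-heap array.
    static String copyToHeap(ByteBuffer view) {
        byte[] onHeap = new byte[view.remaining()];
        view.duplicate().get(onHeap);
        return new String(onHeap, StandardCharsets.US_ASCII);
    }

    // Unsafe ordering: the buffer is released before the deferred copy runs.
    public static String deferredCopyAfterReuse() {
        ByteBuffer first = fill("row-1");
        ByteBuffer pendingEdit = first.duplicate(); // still points at pooled memory
        release(first);                             // request completes early
        fill("row-2");                              // next request overwrites the memory
        return copyToHeap(pendingEdit);             // deferred copy sees the new bytes
    }

    // Safe ordering: copy to on-heap memory first, release afterwards.
    public static String copyBeforeRelease() {
        ByteBuffer first = fill("row-1");
        String copied = copyToHeap(first);
        release(first);
        fill("row-2");
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(deferredCopyAfterReuse()); // prints "row-2"
        System.out.println(copyBeforeRelease());      // prints "row-1"
    }
}
```

In the first method the edit that was meant to persist "row-1" ends up carrying the bytes of the following request, which is the general shape of corruption one would expect if a buffer is recycled before an asynchronous writer has consumed it. Whether the actual fix should delay the release or copy eagerly is the design question raised in the comment above; the sketch takes no position on that.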