[
https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901208#comment-16901208
]
Wellington Chevreuil commented on HBASE-22539:
----------------------------------------------
Thanks all for the responses and attention.
{quote}For me, I just missed Wellington Chevreuil's comments on jira.
{quote}
Yes, the discussion on the PR ended up diverging from the one on the jira.
Actually, this has been a common trend on other jiras too, and I believe
there's room to improve our communication process here.
{quote}Ping Wellington Chevreuil, what's your opinion? I'm going to release
2.1.6 now...
{quote}
I think it's ok to have this current solution as an interim fix, and include it
in an emergency release. It's proven to work with the added tests and I don't
see performance issues with it. My concerns highlighted previously are more
related to added complexity and further impacts on code maintainability.
{quote}I'd say force sync is not acceptable. The reason why we introduce the
ASYNC_WAL flag is for performance, and if we do a force sync then the flag is
useless.
{quote}
I do agree we'll be losing the ASYNC_WAL feature here, but the _release/retain_
solution would effectively lead to the same problem, as the call is only
released in the context of a complete sync. The current solution is even more
penalising, as the call will be retained even for on-heap BBs (in the case of a
SimpleRpcServer connection where the request is smaller than 10K).
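To make the trade-off above concrete, here is a minimal, single-threaded sketch of the two behaviours. It is only an illustration under stated assumptions, not the actual HBase code: the class name BufferReuseSketch, the Call class and the pendingWal queue are all hypothetical stand-ins for the RPC call, its pooled buffer and the deferred WAL sync.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;

public class BufferReuseSketch {

    /** Hypothetical stand-in for an RPC call that owns a pooled direct buffer. */
    static final class Call {
        final ByteBuffer buf;
        int refCount = 1; // one reference held by the RPC handler

        Call(ByteBuffer buf) { this.buf = buf; }

        void retain() { refCount++; }

        /** Returns true once nobody references the buffer any more. */
        boolean release() { return --refCount == 0; }
    }

    static void fill(ByteBuffer buf, String payload) {
        buf.clear();
        buf.put(payload.getBytes(StandardCharsets.US_ASCII));
        buf.flip();
    }

    static String read(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // leave the original position untouched
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return new String(out, StandardCharsets.US_ASCII);
    }

    /** The buffer is handed to the next request before the deferred sync reads it. */
    static String unsafeReuse() {
        ByteBuffer pooled = ByteBuffer.allocateDirect(16);
        Queue<ByteBuffer> pendingWal = new ArrayDeque<>();
        fill(pooled, "edit-1");
        pendingWal.add(pooled);           // async append: only a reference is queued
        fill(pooled, "edit-2");           // call already answered, buffer reused
        return read(pendingWal.remove()); // the deferred sync copies the wrong bytes
    }

    /** The call is retained for the WAL and only released once the sync has run. */
    static String retainUntilSync() {
        ByteBuffer pooled = ByteBuffer.allocateDirect(16);
        Call call = new Call(pooled);
        Queue<Call> pendingWal = new ArrayDeque<>();
        fill(pooled, "edit-1");
        call.retain();                     // extra reference held on behalf of the WAL
        pendingWal.add(call);
        boolean reusable = call.release(); // handler is done, WAL reference remains
        ByteBuffer next = reusable ? pooled : ByteBuffer.allocateDirect(16);
        fill(next, "edit-2");              // next request must use another buffer
        Call synced = pendingWal.remove();
        String seen = read(synced.buf);
        synced.release();                  // only now may the buffer be reused
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("unsafe reuse, sync sees: " + unsafeReuse());
        System.out.println("retain until sync, sync sees: " + retainUntilSync());
    }
}
```

In the second variant the buffer stays pinned until the sync has consumed it, which is the cost I mean above: the call is held for as long as a complete sync takes, regardless of the durability the client asked for.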
That said, I think [~Apache9] should go ahead with the release containing the
current solution; then we can close this Jira and continue discussing
alternative, potentially simpler/fancier solutions on another jira.
> WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used
> -------------------------------------------------------------------------
>
> Key: HBASE-22539
> URL: https://issues.apache.org/jira/browse/HBASE-22539
> Project: HBase
> Issue Type: Bug
> Components: rpc, wal
> Affects Versions: 2.2.0, 2.0.5, 2.1.5
> Reporter: Wellington Chevreuil
> Assignee: Duo Zhang
> Priority: Blocker
> Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22539-UT.patch, HBASE-22539.branch-2.001.patch
>
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers'
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom
> modified jar with the extra sanity checks implemented by HBASE-21401 applied
> on some code points, plus additional debugging messages, we believe it is
> related to DirectByteBuffer usage, and Unsafe copy from offheap memory to
> on-heap array triggered
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
> such as when writing into a non ByteBufferWriter type, as done
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details in the following comment.
>