[jira] [Commented] (HBASE-22539) WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used

Wellington Chevreuil (JIRA) Tue, 06 Aug 2019 09:06:19 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901208#comment-16901208
 ]


Wellington Chevreuil commented on HBASE-22539:
----------------------------------------------

Thanks all for the responses and attention.
{quote}For me, I just missed Wellington Chevreuil's comments on jira.
{quote}
 
 Yes, the PR ended up pulling the discussion away from the jira. This has been 
a common trend on other jiras too, and I believe there's room to improve our 
communication process here.

 
{quote}Ping Wellington Chevreuil, what's your opinion? I'm going to release 
2.1.6 now...
{quote}
I think it's ok to have the current solution as an interim fix, and to include 
it in an emergency release. It's proven to work with the added tests and I 
don't see performance issues with it. The concerns I highlighted previously are 
more related to the added complexity and its impact on code maintainability. 

 
{quote}I'd say force sync is not acceptable. The reason why we introduce the 
ASYNC_WAL flag is for performance, and if we do a force sync then the flag is 
useless.
{quote}

 I do agree we'd be losing the ASYNC_WAL feature here, but the _release/retain_ 
solution would effectively lead to the same problem, as the call is only 
released in the context of a completed sync. The current solution is even more 
penalising, as the call will be retained even for on-heap BBs (as with a 
SimpleRpcServer connection where the request is smaller than 10K).
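
 To make the trade-off above concrete, here is a minimal, single-threaded 
sketch of early buffer re-use versus a retain/release scheme. It is an 
illustration only, not the HBase implementation: Pool, Call, run and the 
16-byte buffer size are hypothetical names and values, and the "deferred WAL 
write" is simulated by copying the bytes only after the next request has 
arrived.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

class RetainReleaseSketch {

  /** Hypothetical pool that starts with one direct buffer; not an HBase class. */
  static final class Pool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    Pool() { free.add(ByteBuffer.allocateDirect(16)); }
    ByteBuffer take() {
      ByteBuffer b = free.poll();
      return b != null ? b : ByteBuffer.allocateDirect(16);
    }
    void giveBack(ByteBuffer b) { b.clear(); free.add(b); }
  }

  /** Hypothetical reference-counted request whose payload lives in a pooled buffer. */
  static final class Call {
    final ByteBuffer buf;
    private final Pool pool;
    private final AtomicInteger refs = new AtomicInteger(1);
    Call(Pool pool, String payload) {
      this.pool = pool;
      this.buf = pool.take();
      buf.put(payload.getBytes(StandardCharsets.UTF_8));
      buf.flip();
    }
    void retain() { refs.incrementAndGet(); }
    // The buffer goes back to the pool only when the last holder releases it.
    void release() { if (refs.decrementAndGet() == 0) pool.giveBack(buf); }
  }

  /** Returns the bytes the simulated deferred WAL write ends up copying for the first call. */
  static String run(boolean retainForWal) {
    Pool pool = new Pool();
    Call first = new Call(pool, "row-A");
    int len = first.buf.remaining();        // the WAL side remembers only a reference and a length
    if (retainForWal) first.retain();       // WAL side takes its own reference
    first.release();                        // RPC side is done once the client is answered
    Call second = new Call(pool, "row-B");  // next request arrives before the WAL copy happens

    byte[] onHeap = new byte[len];          // deferred off-heap to on-heap copy
    for (int i = 0; i < len; i++) onHeap[i] = first.buf.get(i);

    if (retainForWal) first.release();      // released only after the copy completed
    second.release();
    return new String(onHeap, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println("without retain: " + run(false));
    System.out.println("with retain:    " + run(true));
  }
}
```

 Without the extra reference the deferred copy reads the second request's 
bytes, which is the kind of corruption described in this issue. With it, the 
first buffer stays out of the pool until the copy is done, which is also why 
retaining the call until sync completes gives up part of what ASYNC_WAL is 
meant to provide.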

 

That said, I think [~Apache9] should go ahead with the release containing the 
current solution, then we can close this Jira and continue discussing 
alternative, potentially simpler/fancier solutions in another jira. 

> WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used
> -------------------------------------------------------------------------
>
>                 Key: HBASE-22539
>                 URL: https://issues.apache.org/jira/browse/HBASE-22539
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc, wal
>    Affects Versions: 2.2.0, 2.0.5, 2.1.5
>            Reporter: Wellington Chevreuil
>            Assignee: Duo Zhang
>            Priority: Blocker
>             Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
>         Attachments: HBASE-22539-UT.patch, HBASE-22539.branch-2.001.patch
>
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers' 
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom 
> modified jar with the extra sanity checks implemented by HBASE-21401 applied 
> at some code points, plus additional debugging messages, we believe it is 
> related to DirectByteBuffer usage, and Unsafe copy from offheap memory to 
> on-heap array triggered 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
>  such as when writing into a non ByteBufferWriter type, as done 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details in the following comment.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
