[ https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898148#comment-16898148 ]

Wellington Chevreuil edited comment on HBASE-22539 at 8/2/19 10:19 AM:
-----------------------------------------------------------------------

{quote}Oh the UT has some problem, you should change the loop number to 100 for 
the verifying loop...
{quote}
-Which problems did you have with the UT? I could consistently fail it without 
the fix, and consistently pass it with the fix in place. The UT is also passing 
in the patch build.-

I see, I had unintentionally changed the original test method. I can fix that
in the next commit.

Overall, the PR solution is functionally correct. However, it looks a bit
overcomplicated: it introduces references to the RPC layer _ServerCall_ class
in some _wal_ package classes, and every WAL provider implementation now needs
to be aware of the new _ServerCall.releaseByWal_ callback method (this could be
error-prone, as new WAL provider implementations may miss it). Furthermore,
wouldn't it add an unnecessary penalty for non-DBB calls (such as the ones from
SimpleRpcServer using on-heap BBs)?


was (Author: wchevreuil):
{quote}Oh the UT has some problem, you should change the loop number to 100 for 
the verifying loop...{quote}
Which problems did you have with the UT? I could consistently fail it without 
the fix, and consistently pass it with the fix in place. The UT is also passing 
in the patch build.

Meanwhile, I'll check the submitted PR.

> WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used
> -------------------------------------------------------------------------
>
>                 Key: HBASE-22539
>                 URL: https://issues.apache.org/jira/browse/HBASE-22539
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc, wal
>    Affects Versions: 2.2.0, 2.0.5, 2.1.5
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>             Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
>         Attachments: HBASE-22539-UT.patch, HBASE-22539.branch-2.001.patch
>
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers'
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom
> modified jar with the extra sanity checks implemented by HBASE-21401 applied
> at some code points, plus additional debugging messages, we believe it is
> related to DirectByteBuffer usage, more specifically to the Unsafe copy from
> off-heap memory to an on-heap array triggered
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
> such as when writing into a non-ByteBufferWriter type, as done
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details in the following comment.
>  
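To make the failure mode in the summary concrete, a minimal single-threaded Java sketch of early buffer reuse. It assumes a hypothetical LIFO pool of direct ByteBuffers and a queued ASYNC_WAL append that keeps only a reference to the request buffer and copies the bytes to the heap later. BufferPool, PendingAppend and simulate are illustrative names, not HBase classes, and the plain ByteBuffer.get call stands in for the Unsafe-based copy in ByteBufferUtils.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a WAL entry reading a pooled direct buffer after it was reused. */
public class EarlyReuseDemo {

    /** Minimal pool of direct (off-heap) buffers, reused LIFO. */
    static final class BufferPool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();

        ByteBuffer acquire() {
            ByteBuffer bb = free.poll();
            return bb != null ? bb : ByteBuffer.allocateDirect(16);
        }

        void release(ByteBuffer bb) {
            bb.clear();
            free.push(bb);
        }
    }

    /** An async WAL append that keeps a view of the buffer; bytes are copied only at write time. */
    static final class PendingAppend {
        private final ByteBuffer ref;

        PendingAppend(ByteBuffer ref) {
            this.ref = ref.duplicate(); // shares content with the pooled buffer
        }

        byte[] copyToHeap() {
            // Off-heap -> on-heap copy, done lazily by the "WAL writer".
            byte[] out = new byte[ref.remaining()];
            ref.duplicate().get(out);
            return out;
        }
    }

    static ByteBuffer fill(BufferPool pool, String payload) {
        ByteBuffer bb = pool.acquire();
        bb.put(payload.getBytes(StandardCharsets.US_ASCII));
        bb.flip();
        return bb;
    }

    /** Returns what the WAL ends up writing for the request carrying "row-A". */
    static String simulate(boolean releaseBeforeWalCopy) {
        BufferPool pool = new BufferPool();
        ByteBuffer reqA = fill(pool, "row-A");
        PendingAppend walEntry = new PendingAppend(reqA); // queued, not yet copied
        String written;
        if (releaseBeforeWalCopy) {
            pool.release(reqA);  // RPC completes and returns the buffer early
            fill(pool, "row-B"); // next call gets the same buffer and overwrites it
            written = new String(walEntry.copyToHeap(), StandardCharsets.US_ASCII);
        } else {
            written = new String(walEntry.copyToHeap(), StandardCharsets.US_ASCII);
            pool.release(reqA);  // released only after the WAL consumed the bytes
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println("release after WAL copy:  " + simulate(false));
        System.out.println("release before WAL copy: " + simulate(true));
    }
}
```

With the release deferred until after the copy, the WAL gets "row-A"; with the early release, it silently gets "row-B", the bytes of an unrelated request.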



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
