[jira] [Commented] (HBASE-22539) WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used

Zheng Hu (JIRA) Tue, 06 Aug 2019 03:43:12 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900877#comment-16900877
 ]


Zheng Hu commented on HBASE-22539:
----------------------------------

As discussed with [~Apache9] earlier, the current fix is a temporary solution 
(mainly for a quick fix and release). We have a better solution: in the 
NettyServerRpcConnection#process method, we can wrap *Netty's ByteBuf* in a 
ReferenceHandler, which will have retain & release methods. 
{code}
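  void process(final ByteBuf buf) throws IOException, InterruptedException {
    if (connectionHeaderRead) {
      // No copy: the request is processed over the Netty buffer's own memory,
      // and buf is only released when callCleanup runs.
      this.callCleanup = buf::release;
      process(new SingleByteBuff(buf.nioBuffer()));
    } else {
      // The connection header is copied on-heap, so buf can be released at once.
      ByteBuffer connectionHeader = ByteBuffer.allocate(buf.readableBytes());
      buf.readBytes(connectionHeader);
      buf.release();
      process(connectionHeader);
    }
  }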
{code}

In the write path, we will pass the ReferenceHandler to the WALEdit. Once the 
WALEdit has been written to the OutputStream (that is, copied to a new memory 
area that is independent of Netty's ByteBuf memory at that point), we will call 
the ReferenceHandler to do the release.
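
To make the retain & release idea concrete, here is a minimal sketch of what 
such a ReferenceHandler could look like. This is only an assumption for 
illustration, not the draft patch: the class body, the refCnt field and the 
Runnable-based releaser are all hypothetical, and only the name 
ReferenceHandler and the retain/release methods come from the proposal above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the proposed ReferenceHandler (not actual HBase code).
 * It counts the holders of a buffer and runs the releaser once the last holder
 * is done with it.
 */
final class ReferenceHandler {
  // Starts at 1: the creator (e.g. the RPC connection) holds the first reference.
  private final AtomicInteger refCnt = new AtomicInteger(1);
  // What to do when the last reference is gone, e.g. buf::release for a Netty ByteBuf.
  private final Runnable releaser;

  ReferenceHandler(Runnable releaser) {
    this.releaser = releaser;
  }

  /** Adds one holder, e.g. before a WALEdit is handed to the async WAL writer. */
  ReferenceHandler retain() {
    while (true) {
      int cur = refCnt.get();
      if (cur <= 0) {
        throw new IllegalStateException("buffer already released");
      }
      if (refCnt.compareAndSet(cur, cur + 1)) {
        return this;
      }
    }
  }

  /**
   * Drops one holder, e.g. after the WALEdit was copied to the OutputStream.
   * Returns true if this call released the underlying buffer.
   */
  boolean release() {
    while (true) {
      int cur = refCnt.get();
      if (cur <= 0) {
        throw new IllegalStateException("buffer already released");
      }
      if (refCnt.compareAndSet(cur, cur - 1)) {
        if (cur == 1) {
          releaser.run();
          return true;
        }
        return false;
      }
    }
  }
}
```

With something like this, process() could create new ReferenceHandler(buf::release) 
instead of assigning buf::release to callCleanup directly. The write path would 
call retain() before the WALEdit is queued and release() after the copy, so the 
RPC cleanup alone would no longer free memory the WAL writer still reads.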

We think SimpleRpcServer can be fixed in a similar way; there, the 
ReferenceHandler may maintain an *HBase ByteBuff*.

The solution should be good enough: no extra BB copying, and it seems 
straightforward. But it needs some code abstraction (especially given the 
differences between branch-2.2 and branch-2.3 & master). I'm planning to 
provide a draft patch (still handling something internal, so it is delayed) 
and will provide that patch as soon as possible.

Thanks.


> WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used
> -------------------------------------------------------------------------
>
>                 Key: HBASE-22539
>                 URL: https://issues.apache.org/jira/browse/HBASE-22539
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc, wal
>    Affects Versions: 2.2.0, 2.0.5, 2.1.5
>            Reporter: Wellington Chevreuil
>            Assignee: Duo Zhang
>            Priority: Blocker
>             Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
>         Attachments: HBASE-22539-UT.patch, HBASE-22539.branch-2.001.patch
>
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers' 
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom 
> modified jar with the extra sanity checks implemented by HBASE-21401 applied 
> on some code points, plus additional debugging messages, we believe it is 
> related to DirectByteBuffer usage, and Unsafe copy from offheap memory to 
> on-heap array triggered 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
>  such as when writing into a non ByteBufferWriter type, as done 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details on the following comment.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
