[jira] [Commented] (HBASE-24984) WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used with multi operation

Anoop Sam John (Jira) Thu, 15 Jul 2021 21:16:09 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381739#comment-17381739
 ]


Anoop Sam John commented on HBASE-24984:
----------------------------------------

The behaviour differs slightly between FSHLog and AsyncFSWAL.  In FSHLog, 
releaseByWAL() is called for an FSWALEntry once the append for it is 
completed (by the ring buffer append thread).
In AsyncFSWAL it is called after the sync is done for that FSWALEntry.
I believe the bug can still occur in that case.  I need to work more on the 
UT to reproduce it.
The sequence is as follows.  A multi call arrives with ASYNC_WAL Durability; 
assume it is split into 2 minibatches, so 2 FSWALEntry instances will be 
added for it.  Now assume the handler thread has added the 1st FSWALEntry to 
the ring buffer.
Before it adds the 2nd FSWALEntry, another write request arrives at the RS 
and is handled by another handler thread.  This write request uses the 
default Durability.  Say this handler adds its FSWALEntry to the queue.
Only then does the 1st thread get to add the FSWALEntry for the 2nd minibatch.
Now, as part of the sync op for the 2nd write request (the one with default 
Durability), the sync also completes for the very 1st FSWALEntry added to the 
queue (the one that is part of the multi op with ASYNC_WAL Durability).
This triggers releaseByWAL() on that FSWALEntry and, as per the current 
logic, releases the BB back to the pool.
When the next FSWALEntry is actually appended to the WAL, we can get the 
corruption issue if the BB has been reused by some other thread in the 
meantime.
So the bug applies to all combinations of RPC server and WAL impl, and to 
all versions.
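
To make the interleaving concrete, here is a minimal single-threaded sketch 
in Java.  It is not HBase code: PooledBuffer, eagerRelease, countedRelease and 
secondAppendSeesRecycledBuffer are hypothetical names made up for this 
illustration, and the release is modelled as happening after sync (the 
AsyncFSWAL-like ordering).  The counted variant is only one possible 
direction, similar in spirit to the reference counting raised in the issue 
description; it is not claimed to be the committed fix.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

public class EarlyReleaseSketch {

    /** Hypothetical stand-in for the pooled buffer (BB) backing one RPC request. */
    static final class PooledBuffer {
        private boolean inPool = false;
        void returnToPool() { inPool = true; }
        boolean isInPool() { return inPool; }
    }

    /** Eager policy: the first WAL entry to be released hands the buffer back. */
    static Runnable eagerRelease(PooledBuffer buf) {
        return buf::returnToPool;
    }

    /** Counted policy: the buffer goes back only after all holders have released. */
    static Runnable countedRelease(PooledBuffer buf, int holders) {
        AtomicInteger remaining = new AtomicInteger(holders);
        return () -> {
            if (remaining.decrementAndGet() == 0) {
                buf.returnToPool();
            }
        };
    }

    /**
     * Replays the interleaving from the comment and reports whether the buffer
     * was already back in the pool when the 2nd minibatch entry was appended.
     */
    static boolean secondAppendSeesRecycledBuffer(PooledBuffer buf, Runnable release) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add("multi#1"); // handler A, ASYNC_WAL, 1st minibatch
        queue.add("put");     // handler B, default Durability

        // Handler B's sync covers everything queued so far, including multi#1,
        // so the release callback fires for the multi call's first entry.
        while (!queue.isEmpty()) {
            String entry = queue.poll();
            if (entry.startsWith("multi")) {
                release.run();
            }
        }

        // Only now does handler A queue the 2nd minibatch, and the WAL appends it.
        boolean recycled = buf.isInPool();
        release.run(); // the 2nd entry is released after its own append
        return recycled;
    }

    public static void main(String[] args) {
        PooledBuffer eager = new PooledBuffer();
        System.out.println("eager release, buffer recycled early: "
            + secondAppendSeesRecycledBuffer(eager, eagerRelease(eager)));

        PooledBuffer counted = new PooledBuffer();
        System.out.println("counted release, buffer recycled early: "
            + secondAppendSeesRecycledBuffer(counted, countedRelease(counted, 2)));
    }
}
```

With the eager policy the buffer is already back in the pool when the 2nd 
entry is appended, which is the window in which another thread could reuse 
it.  With the counted policy it is returned only after both entries have 
been released.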

> WAL corruption due to early DBBs re-use when Durability.ASYNC_WAL is used 
> with multi operation
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24984
>                 URL: https://issues.apache.org/jira/browse/HBASE-24984
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc, wal
>    Affects Versions: 2.1.6
>            Reporter: Liu Junhong
>            Assignee: Gaurav Kanade
>            Priority: Critical
>             Fix For: 2.5.0, 2.3.6, 3.0.0-alpha-2, 2.4.5
>
>         Attachments: 
> 0001-HBASE-24984-WAL-corruption-due-to-early-DBBs-re-use-.patch
>
>
> After the bugfix HBASE-22539, when a client uses BufferedMutator or multiple 
> mutations, there will be one RpcCall and multiple FSWALEntry instances.  Once 
> the RpcCall finishes and one FSWALEntry calls release(), the remaining 
> FSWALEntries may trigger a RuntimeException or a segmentation fault.
> Should we use RefCnt instead of AtomicInteger for 
> org.apache.hadoop.hbase.ipc.ServerCall.reference?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
