[ 
https://issues.apache.org/jira/browse/RATIS-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851747#comment-17851747
 ] 

Tsz-wo Sze commented on RATIS-2094:
-----------------------------------

If 3.1.0 needs another RC, we may include this.

> TransactionContext's stateMachineLogEntry and stateMachineContext may cause 
> corruption
> --------------------------------------------------------------------------------------
>
>                 Key: RATIS-2094
>                 URL: https://issues.apache.org/jira/browse/RATIS-2094
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: Duong
>            Assignee: Duong
>            Priority: Major
>             Fix For: 3.1.1
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> stateMachineLogEntry and stateMachineContext are parsed/created from 
> RaftClientRequest or LogEntryProto and attached to TransactionContext in the 
> StateMachine.startTransaction methods.
> There are 2 variants of StateMachine.startTransaction:
> 1. startTransaction(RaftClientRequest): This is called only on the leader
> side. The result of this method is not cached and is passed temporarily
> alongside RaftClientRequest for further processing, for example by
> StateMachine.write.
> 2. startTransaction(LogEntryProto, RaftPeerRole): This is called on both
> the leader and follower sides. The result of this call is cached on the node:
>  * On leader: this is called right before applyTransaction to produce a 
> TransactionContext for StateMachine.applyTransaction.
>  * On follower: this is called when the appendEntries request is received. 
> The resulting TransactionContext is cached to be used by StateMachine.write 
> and then StateMachine.applyTransaction.
> The startTransaction methods are called with the RaftClientRequest or
> LogEntryProto parsed directly from the original zero-copy buffers. In turn,
> the stateMachineLogEntry and stateMachineContext (which are parsed/created
> from them) can contain data that refers directly to the original zero-copy
> buffer without an explicit reference count.
>  
> For the use case of stateMachineCache=false, this fortunately doesn't cause
> corruption because the LogEntries linked with the original buffers are cached
> in LogCache, and the cached LogEntries (always) outlive the cached
> TransactionContexts (?).
>  
> For the use case of stateMachineCache=true, this may cause corruption because
> the cached LogEntries are decoupled from the original buffers and it depends
> on stateMachineCache to determine when the original zero-copy buffer is
> released. One clear problem is with the TransactionContext created by
> startTransaction(LogEntryProto, RaftPeerRole) on the follower. It is created
> from the original LogEntries referring to the zero-copy buffers, then cached
> and used later, for example in applyTransaction. By the time it is used, the
> original buffer may already have been released.
>  
>  
>  
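The lifetime hazard described above can be illustrated with a minimal,
self-contained sketch. It does not use Ratis classes: RefCountedBuffer and the
three read* methods are hypothetical names made up for this illustration, and
overwriting the array with '#' only simulates a buffer pool recycling released
memory. The first method reads a cached view after the owner has released the
buffer; the other two show the usual remedies, copying the data or retaining a
counted reference before caching.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class ZeroCopyLifetimeSketch {
  /** Hypothetical stand-in for a pooled, reference-counted zero-copy buffer. */
  static final class RefCountedBuffer {
    private final byte[] data;
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCountedBuffer(String content) {
      this.data = content.getBytes(StandardCharsets.UTF_8);
    }

    /** Registers one more holder of this buffer. */
    RefCountedBuffer retain() {
      refCount.incrementAndGet();
      return this;
    }

    /** Drops one holder; when none remain, the memory is "recycled". */
    void release() {
      if (refCount.decrementAndGet() == 0) {
        // Simulate the pool reusing the memory for another request.
        Arrays.fill(data, (byte) '#');
      }
    }

    String read() {
      return new String(data, StandardCharsets.UTF_8);
    }
  }

  /** The hazard: a cached view is read after the buffer was released. */
  public static String readViewAfterRelease() {
    RefCountedBuffer buffer = new RefCountedBuffer("entry-1");
    RefCountedBuffer cachedView = buffer; // no retain(): the view is not counted
    buffer.release();                     // count reaches 0, memory is recycled
    return cachedView.read();             // reads recycled bytes
  }

  /** Remedy 1: copy the data out before the buffer is released. */
  public static String readCopyAfterRelease() {
    RefCountedBuffer buffer = new RefCountedBuffer("entry-1");
    String cachedCopy = buffer.read();    // independent copy of the bytes
    buffer.release();
    return cachedCopy;
  }

  /** Remedy 2: retain a counted reference for as long as the cache needs it. */
  public static String readRetainedAfterRelease() {
    RefCountedBuffer buffer = new RefCountedBuffer("entry-1");
    RefCountedBuffer cached = buffer.retain(); // count is now 2
    buffer.release();                          // count is 1, data stays intact
    String value = cached.read();
    cached.release();                          // last holder releases the buffer
    return value;
  }

  public static void main(String[] args) {
    System.out.println(readViewAfterRelease());     // prints "#######"
    System.out.println(readCopyAfterRelease());     // prints "entry-1"
    System.out.println(readRetainedAfterRelease()); // prints "entry-1"
  }
}
```

Copying trades the zero-copy benefit for safety, while retaining keeps the
zero-copy path but requires every cached holder to release exactly once.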



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
