[
https://issues.apache.org/jira/browse/RATIS-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851747#comment-17851747
]
Tsz-wo Sze commented on RATIS-2094:
-----------------------------------
If 3.1.0 needs another RC, we may include this.
> TransactionContext's stateMachineLogEntry and stateMachineContext may cause
> corruption
> --------------------------------------------------------------------------------------
>
> Key: RATIS-2094
> URL: https://issues.apache.org/jira/browse/RATIS-2094
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Duong
> Assignee: Duong
> Priority: Major
> Fix For: 3.1.1
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> stateMachineLogEntry and stateMachineContext are parsed/created from
> RaftClientRequest or LogEntryProto and attached to TransactionContext in the
> StateMachine.startTransaction methods.
> There are 2 variants of StateMachine.startTransaction:
> 1. startTransaction(RaftClientRequest): This is called only on the leader
> side. The result of this method is not cached and is passed temporarily
> alongside RaftClientRequest for further processing, for example used by
> StateMachine.write.
> 2. startTransaction(LogEntryProto, RaftPeerRole): This is called on both the
> leader and follower sides. The result of this call is cached on the node.
> * On leader: this is called right before applyTransaction to produce a
> TransactionContext for StateMachine.applyTransaction.
> * On follower: this is called when the appendEntries request is received.
> The resulting TransactionContext is cached to be used by StateMachine.write
> and then StateMachine.applyTransaction.
> The startTransaction methods are called with the RaftClientRequest or
> LogEntryProto parsed directly from the original zero-copy buffers. In turn,
> the stateMachineLogEntry and stateMachineContext (which are parsed/created
> from them) can contain data that refers directly to the original zero-copy
> buffers without an explicit reference count.
>
> For the use-case of stateMachineCache=false, this fortunately doesn't cause
> corruption because the LogEntries linked to the original buffers are cached
> in LogCache, and the cached LogEntries (always) outlive the cached
> TransactionContexts (?).
>
> For the use-case of stateMachineCache=true, this may cause corruption because
> the cached LogEntries are decoupled from the original buffers, and it depends
> on stateMachineCache to determine when the original zero-copy buffer is
> released. One
> clear problem is with TransactionContext created by
> startTransaction(LogEntryProto, RaftPeerRole) on the follower. It is created
> from the original LogEntries referring to the zero-copy buffers, then cached
> and used later, for example in applyTransaction. At the time it's used, the
> original buffer may have been released already.
>
>
>
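The hazard described above can be illustrated with a minimal, self-contained sketch. It does not use any Ratis or Netty API: RefCountedBuffer, retain(), release() and the "entry-1" payload are all hypothetical stand-ins, and the overwrite on release only simulates a buffer pool reusing the memory. The sketch shows a cached view that reads garbage after the buffer is released, and the same view staying valid when the cache holds its own reference.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class ZeroCopySketch {
  /** Hypothetical stand-in for a pooled zero-copy buffer; not a Ratis class. */
  static final class RefCountedBuffer {
    private final byte[] bytes;
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCountedBuffer(String content) {
      this.bytes = content.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns a view over the same bytes; no copy is made. */
    ByteBuffer view() {
      return ByteBuffer.wrap(bytes);
    }

    RefCountedBuffer retain() {
      refCount.incrementAndGet();
      return this;
    }

    void release() {
      if (refCount.decrementAndGet() == 0) {
        // Simulate the pool handing the memory to someone else.
        Arrays.fill(bytes, (byte) '#');
      }
    }
  }

  static String read(ByteBuffer view) {
    // duplicate() keeps the caller's position untouched.
    return StandardCharsets.UTF_8.decode(view.duplicate()).toString();
  }

  /** Caches a view without a reference: the data is gone once the owner releases. */
  static String unsafeCachedRead() {
    RefCountedBuffer buf = new RefCountedBuffer("entry-1");
    ByteBuffer cached = buf.view();   // analogous to a cached context
    buf.release();                    // original owner is done with the buffer
    return read(cached);              // reads recycled memory
  }

  /** Caches a view and holds a reference until the cached data has been used. */
  static String retainedCachedRead() {
    RefCountedBuffer buf = new RefCountedBuffer("entry-1");
    ByteBuffer cached = buf.retain().view();
    buf.release();                    // original owner releases; count is still 1
    String value = read(cached);      // data is still intact
    buf.release();                    // cache releases its own reference
    return value;
  }

  public static void main(String[] args) {
    System.out.println("without retain: " + unsafeCachedRead());
    System.out.println("with retain:    " + retainedCachedRead());
  }
}
```

Under these assumptions, the first call prints "#######" and the second prints "entry-1". Copying the bytes before caching would also avoid the stale read, at the cost of giving up zero-copy for the cached data.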