[
https://issues.apache.org/jira/browse/RATIS-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duong updated RATIS-2094:
-------------------------
Description:
stateMachineLogEntry and stateMachineContext are parsed/created from
RaftClientRequest or LogEntryProto and attached to TransactionContext in the
StateMachine.startTransaction methods.
There are 2 variants of StateMachine.startTransaction:
1. startTransaction(RaftClientRequest): This is called only on the leader side.
The result of this method is not cached; it is passed temporarily alongside
the RaftClientRequest for further processing, for example by
StateMachine.write.
2. startTransaction(LogEntryProto, RaftPeerRole): This is called on both the
leader and follower sides. The result of this call is cached on the node:
* On leader: this is called right before applyTransaction to produce a
TransactionContext for StateMachine.applyTransaction.
* On follower: this is called when the appendEntries request is received. The
resulting TransactionContext is cached to be used by StateMachine.write and
then StateMachine.applyTransaction.
The startTransaction methods are called with the RaftClientRequest or
LogEntryProto parsed directly from the original zero-copy buffers. In turn, the
stateMachineLogEntry and stateMachineContext (which are parsed/created from
them) can contain data that refers directly to the original zero-copy buffer
without an explicit reference count.
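To illustrate the hazard in isolation, here is a minimal sketch using plain
java.nio buffers. It is not Ratis code: ZeroCopyViewDemo, reuse and text are
hypothetical names, and reusing the pooled buffer stands in for the original
zero-copy buffer being released. A view that shares memory with the buffer
sees the new contents, while an explicit copy does not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Ratis.
final class ZeroCopyViewDemo {
  /** Decodes the remaining bytes as UTF-8 without moving the buffer's position. */
  static String text(ByteBuffer b) {
    return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
  }

  /** Simulates a pooled buffer being handed out again for the next request. */
  static void reuse(ByteBuffer pooled, String next) {
    pooled.clear();
    pooled.put(next.getBytes(StandardCharsets.UTF_8));
    pooled.flip();
  }

  /** Returns {view before reuse, view after reuse, copy after reuse}. */
  static String[] demo() {
    ByteBuffer pooled = ByteBuffer.allocate(16);
    reuse(pooled, "entry-1");

    // slice() shares the backing memory, like a parsed field that points
    // into the original buffer.
    ByteBuffer view = pooled.slice();

    // An explicit copy owns its own memory.
    ByteBuffer copy = ByteBuffer.allocate(view.remaining());
    copy.put(view.duplicate());
    copy.flip();

    String before = text(view);
    reuse(pooled, "entry-2"); // the "released" buffer is reused
    return new String[] {before, text(view), text(copy)};
  }

  public static void main(String[] args) {
    String[] r = demo();
    System.out.println("view before reuse: " + r[0]);
    System.out.println("view after reuse:  " + r[1]);
    System.out.println("copy after reuse:  " + r[2]);
  }
}
```

The view silently changes from entry-1 to entry-2 once the buffer is reused,
which is the kind of corruption a cached object holding such a view would
observe.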
For the use-case of stateMachineCache=false, this fortunately doesn't cause
corruption because the LogEntries linked with the original buffers are cached in
LogCache, and the cached LogEntries (always) outlive the cached
TransactionContexts (?).
For the use-case of stateMachineCache=true, this may cause corruption, because
the cached LogEntries are decoupled from the original buffers and the
stateMachineCache determines when the original zero-copy buffer is released.
One clear problem is with the TransactionContext created by
startTransaction(LogEntryProto, RaftPeerRole) on the follower. It is created
from the original LogEntries referring to the zero-copy buffers, then cached
and used later, for example in applyTransaction. By the time it is used, the
original buffer may have been released already.
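One way to picture what an explicit reference count would provide is the
sketch below. It is only an illustration under stated assumptions, not the
fix proposed for this issue and not Ratis API: CountedBuffer, CachedContext
and RefCountDemo are hypothetical names, and zeroing the array stands in for
returning memory to a pool. The cached context retains the buffer when it is
created, so the receive path can release its own reference without
invalidating the data that is applied later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical reference-counted buffer; not a Ratis class. */
final class CountedBuffer {
  private final byte[] data;
  private int refs = 1; // the creator holds the first reference

  CountedBuffer(String payload) {
    this.data = payload.getBytes(StandardCharsets.UTF_8);
  }

  synchronized CountedBuffer retain() {
    if (refs == 0) throw new IllegalStateException("retain after release");
    refs++;
    return this;
  }

  synchronized void release() {
    if (refs == 0) throw new IllegalStateException("already released");
    if (--refs == 0) {
      Arrays.fill(data, (byte) 0); // stand-in for returning memory to a pool
    }
  }

  synchronized String read() {
    if (refs == 0) throw new IllegalStateException("read after release");
    return new String(data, StandardCharsets.UTF_8);
  }

  synchronized boolean isReleased() {
    return refs == 0;
  }
}

/** Hypothetical stand-in for a TransactionContext cached on a follower. */
final class CachedContext implements AutoCloseable {
  private final CountedBuffer buffer;

  CachedContext(CountedBuffer buffer) {
    this.buffer = buffer.retain(); // keep the buffer alive while cached
  }

  String apply() {
    return buffer.read();
  }

  @Override
  public void close() {
    buffer.release();
  }
}

final class RefCountDemo {
  static String run() {
    CountedBuffer received = new CountedBuffer("entry-1");
    CachedContext ctx = new CachedContext(received); // count is now 2
    received.release();           // receive path is done; count is 1
    String applied = ctx.apply(); // data is still valid here
    ctx.close();                  // last reference dropped; count is 0
    return applied + ":" + received.isReleased();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

Without the retain() call in CachedContext, the release() on the receive path
would drop the count to zero and the later apply() would fail, which mirrors
the use-after-release described above.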
was:
stateMachineLogEntry and stateMachineContext are parsed/created from
RaftClientRequest or LogEntryProto and attached to TransactionContext in the
StateMachine.startTransaction methods.
There are 2 variants of StateMachine.startTransaction:
1. startTransaction(RaftClientRequest): This is called only on the leader side.
The result of this method is not cached and is passed temporarily alongside
RaftClientRequest for further processing, for example.
2. startTransaction(LogEntryProto, RaftPeerRole): this is called on both leader
and follower side. The result of this call is cached on the node
* On leader: this is called right before applyTransaction to produce a
TransactionContext for StateMachine.applyTransaction.
* On follower: this is called when the appendEntries request is received. The
result is cached to be used by StateMachine.write and then
StateMachine.applyTransaction.
The startTransaction methods are called with the RaftClientRequest or
LogEntryProto parsed directly from the original zero-copy buffers. In turn, the
stateMachineLogEntry and stateMachineContext (which are parsed/created from
them) will have data that refers directly to the original zero-copy buffer
without an explicit reference count. The fact that TransactionContext is cached
makes it worse.
> TransactionContext's stateMachineLogEntry and stateMachineContext may cause
> corruption
> --------------------------------------------------------------------------------------
>
> Key: RATIS-2094
> URL: https://issues.apache.org/jira/browse/RATIS-2094
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Duong
> Assignee: Duong
> Priority: Major