[
https://issues.apache.org/jira/browse/IGNITE-19227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-19227:
---------------------------------------
Description:
According to
[https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast],
we might need to wait for schema availability when fetching a schema. If
such waits happen inside a PartitionListener, JRaft threads might be blocked
for a noticeable amount of time (maybe even seconds). We should avoid this.
h3. In RW transactions
When a primary node is going to process a request, it waits until it has all
the schema versions for the corresponding timestamp Top (beginTs or commitTs),
i.e. until MS SafeTime >= Top. {*}The wait happens outside of JRaft threads{*}.
Then it obtains the global schema revision SR of the latest schema update that
is not later than Top. It then builds a command (putting that SR inside) and
submits it to RAFT.
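The primary-side steps above could be sketched roughly as follows. This is only an illustration, not Ignite code: the class PrimarySchemaWaitSketch, the Command type, the revisionByUpdateTs map and the safeTimeReached future are all hypothetical names. The sketch assumes the future is completed by a thread that is not a JRaft thread, so the continuation never runs on one.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.CompletableFuture;

final class PrimarySchemaWaitSketch {
    /** Hypothetical command that carries the schema revision (SR) it was built against. */
    static final class Command {
        final String payload;
        final long schemaRevision;

        Command(String payload, long schemaRevision) {
            this.payload = payload;
            this.schemaRevision = schemaRevision;
        }
    }

    /** Returns the SR of the latest schema update whose timestamp is not later than opTs. */
    static long revisionAt(NavigableMap<Long, Long> revisionByUpdateTs, long opTs) {
        Map.Entry<Long, Long> latest = revisionByUpdateTs.floorEntry(opTs);
        if (latest == null) {
            throw new IllegalStateException("No schema update at or before " + opTs);
        }
        return latest.getValue();
    }

    /**
     * Builds the command only after the (hypothetical) SafeTime future completes.
     * The calling thread is not blocked; the command is built in the continuation.
     */
    static CompletableFuture<Command> buildCommand(
            CompletableFuture<Void> safeTimeReached,
            NavigableMap<Long, Long> revisionByUpdateTs,
            long opTs,
            String payload) {
        return safeTimeReached.thenApply(
                unused -> new Command(payload, revisionAt(revisionByUpdateTs, opTs)));
    }
}
```

The returned future would then be chained with the actual submission to RAFT.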
When an AppendEntriesRequest is built, the Replicator inspects all the entries
it includes, extracts the SR from each of them, takes the maximum (MSR, for
‘max schema revision’) and puts it in the AppendEntriesRequest.
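As a minimal sketch of the MSR computation (the class name and the NO_SCHEMA_REVISION sentinel are assumptions made for this example; it also assumes that real schema revisions are non-negative and that entries without an SR are represented by the sentinel):

```java
final class MaxSchemaRevisionSketch {
    /** Hypothetical marker for entries that carry no schema revision. */
    static final long NO_SCHEMA_REVISION = -1L;

    /** Returns the maximum SR among the entries, or NO_SCHEMA_REVISION if none carries one. */
    static long maxSchemaRevision(long[] entryRevisions) {
        long msr = NO_SCHEMA_REVISION;
        for (long sr : entryRevisions) {
            msr = Math.max(msr, sr);
        }
        return msr;
    }
}
```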
When the request is processed by a follower/learner, it compares the MSR from
the request with its locally known MSR (in the Catalog). If the request’s MSR >
local MSR, then the request is rejected (with reason EBUSY). The leader will
retry it after some time. As an optimization, we might wait for some time in
the hope that the local MSR catches up with the request’s MSR.
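The follower-side check might look like the sketch below. The Verdict enum and the method name are hypothetical; in JRaft the rejection would be expressed through its own response/error machinery, which is not modelled here, and the optional short wait is left out.

```java
final class FollowerSchemaCheckSketch {
    /** Hypothetical outcome of the check; REJECT_EBUSY stands for a rejection with reason EBUSY. */
    enum Verdict { ACCEPT, REJECT_EBUSY }

    /** Rejects the request if it needs a schema revision the local Catalog does not have yet. */
    static Verdict check(long requestMsr, long localMsr) {
        return requestMsr > localMsr ? Verdict.REJECT_EBUSY : Verdict.ACCEPT;
    }
}
```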
As we need an additional field in AppendEntriesRequest that will only be used
by partition groups, we could add a generic container for properties to this
interface, like Map<String, Object> extras().
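One possible shape of such a container, as a sketch only: the RequestWithExtras interface stands in for AppendEntriesRequest, and the key name is an assumption made for this example.

```java
import java.util.Map;

final class ExtrasSketch {
    /** Hypothetical key under which partition groups would store the MSR. */
    static final String MSR_KEY = "maxSchemaRevision";

    /** Stand-in for a request type that exposes a generic property container. */
    interface RequestWithExtras {
        Map<String, Object> extras();
    }

    /** Reads the MSR from the extras, falling back to defaultValue if it is absent. */
    static long msrOf(RequestWithExtras request, long defaultValue) {
        Object value = request.extras().get(MSR_KEY);
        return value instanceof Long ? (Long) value : defaultValue;
    }
}
```

Groups that do not care about schemas would simply leave the map empty.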
To extract the SR from a command, we could deserialize it completely, but
this does a lot of unnecessary work. Instead, we might serialize commands
that carry an SR in a special way (putting the SR in the very first bytes of
the message) to make its retrieval efficient.
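To illustrate the idea, here is a sketch that writes the SR as a fixed 8-byte big-endian prefix and reads it back without touching the rest of the message. The layout and the names are assumptions for this example, not the actual Ignite serialization format.

```java
import java.nio.ByteBuffer;

final class SchemaRevisionPrefixSketch {
    /** Prepends the SR (8 bytes, big-endian) to the already serialized command body. */
    static byte[] serialize(long schemaRevision, byte[] body) {
        return ByteBuffer.allocate(Long.BYTES + body.length)
                .putLong(schemaRevision)
                .put(body)
                .array();
    }

    /** Reads the SR from the first 8 bytes without deserializing the command body. */
    static long peekSchemaRevision(byte[] message) {
        if (message.length < Long.BYTES) {
            throw new IllegalArgumentException("Message is too short to contain an SR prefix");
        }
        return ByteBuffer.wrap(message).getLong(0);
    }
}
```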
As the primary has already made sure that it has the schema versions needed to
execute the command, no waits will be needed on the primary node while
executing the RAFT command.
As secondaries/learners refuse AppendEntries requests that they cannot execute
without waiting, they will not have to wait at all in JRaft threads.
The RAFT leader might not be collocated with the primary. For this case, we
can add the same validation for ActionRequests: pass the required SR inside an
ActionRequest, validate it in ActionRequestProcessor and reject requests whose
SR is above the local MSR.
h3. In RO transactions
When processing an RO transaction, we just wait for MS SafeTime. This happens
outside of RAFT, so no special measures are needed.
was:
According to
[https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast],
we might need to wait for schema availability when fetching a schema. If
such waits happen inside a PartitionListener, JRaft threads might be blocked
for a noticeable amount of time (maybe even seconds). We should avoid this.
For RW transactions, we can fetch the schema needed by the operation on the
primary replica before submitting a command to RAFT, so that the possible
wait happens in a user's thread.
For RO transactions, this is not a problem because we don't use RAFT for RO
transactions.
> Wait for schema availability out of JRaft threads
> -------------------------------------------------
>
> Key: IGNITE-19227
> URL: https://issues.apache.org/jira/browse/IGNITE-19227
> Project: Ignite
> Issue Type: Improvement
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
>