[
https://issues.apache.org/jira/browse/IGNITE-19227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-19227:
---------------------------------------
Summary: Wait for schema availability out of JRaft threads (was: Wait for
schema awailability out of JRaft threads)
> Wait for schema availability out of JRaft threads
> -------------------------------------------------
>
> Key: IGNITE-19227
> URL: https://issues.apache.org/jira/browse/IGNITE-19227
> Project: Ignite
> Issue Type: Improvement
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: iep-98, ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> According to
> [https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast]
> , we might need to wait for schema availability when fetching a schema. If
> such waits happen inside a PartitionListener, JRaft threads might be blocked
> for a noticeable amount of time (maybe even seconds). We should avoid this.
> h3. In RW transactions
> When a primary node is about to process a request, it waits until it has all
> the schema versions for the corresponding timestamp Top (beginTs or commitTs),
> i.e. until MS SafeTime >= Top. {*}The wait happens outside of JRaft
> threads{*}. Then it obtains the global schema revision SR of the latest
> schema update that is not later than the corresponding timestamp. It then
> builds a command (putting that SR inside) and submits it to RAFT.
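The primary-side flow described above could be sketched roughly as follows. This is a minimal illustration, not the Ignite implementation: PrimaryCommandBuilder, Command, safeTimeWaiter and schemaRevisionAt are hypothetical names, and the two hooks stand in for whatever components track MS SafeTime and the Catalog.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongFunction;
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch; none of these names are taken from the Ignite code base. */
final class PrimaryCommandBuilder {
    /** Hypothetical RAFT command that carries the schema revision (SR) it was built against. */
    static final class Command {
        final long schemaRevision;
        final byte[] payload;

        Command(long schemaRevision, byte[] payload) {
            this.schemaRevision = schemaRevision;
            this.payload = payload;
        }
    }

    /** Assumed hook: returns a future that completes once MS SafeTime >= the given timestamp. */
    private final LongFunction<CompletableFuture<Void>> safeTimeWaiter;

    /** Assumed hook: returns the SR of the latest schema update not later than the given timestamp. */
    private final LongUnaryOperator schemaRevisionAt;

    PrimaryCommandBuilder(
            LongFunction<CompletableFuture<Void>> safeTimeWaiter,
            LongUnaryOperator schemaRevisionAt) {
        this.safeTimeWaiter = safeTimeWaiter;
        this.schemaRevisionAt = schemaRevisionAt;
    }

    /** Builds the command only after the safe-time future completes; the calling thread is not blocked. */
    CompletableFuture<Command> build(long opTs, byte[] payload) {
        return safeTimeWaiter.apply(opTs)
                .thenApply(ignored -> new Command(schemaRevisionAt.applyAsLong(opTs), payload));
    }
}
```

The returned future would then be chained to the RAFT submission, so the wait never occupies a JRaft thread.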
> When an AppendEntriesRequest is built, the Replicator inspects all the entries
> included in it, extracts the SR from each of them, takes their maximum (as MSR,
> for ‘max schema revision’) and puts it in the AppendEntriesRequest.
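The MSR computation is just a maximum over the SRs of the batched entries. A minimal sketch, assuming the SRs have already been extracted into a list; MaxSchemaRevision and NO_SCHEMA_REVISION are hypothetical names:

```java
import java.util.List;

/** Hypothetical helper; not part of JRaft or Ignite. */
final class MaxSchemaRevision {
    /** Assumed sentinel for a batch in which no entry carries a schema revision. */
    static final long NO_SCHEMA_REVISION = -1L;

    /** Returns the largest SR among the entries, or the sentinel for an empty batch. */
    static long of(List<Long> entrySchemaRevisions) {
        long msr = NO_SCHEMA_REVISION;
        for (long sr : entrySchemaRevisions) {
            msr = Math.max(msr, sr);
        }
        return msr;
    }
}
```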
> When the request is processed by a follower/learner, it compares the MSR from
> the request with its locally known MSR (in the Catalog). If the request’s MSR
> is greater than the local MSR, the request is rejected (with reason EBUSY).
> The leader will retry it after some time. As an optimization, we might wait
> for some time in the hope that the local MSR catches up with the request’s MSR.
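The follower-side decision reduces to a single comparison. A minimal sketch, assuming both revisions are available as longs; FollowerSchemaCheck and Verdict are hypothetical names, and the optional bounded wait mentioned above is left out:

```java
/** Hypothetical sketch of the follower/learner check; not the JRaft API. */
final class FollowerSchemaCheck {
    /** Hypothetical outcome; REJECT_EBUSY stands for a rejection with reason EBUSY. */
    enum Verdict { ACCEPT, REJECT_EBUSY }

    /** Rejects the request when it needs a schema revision this node has not seen yet. */
    static Verdict check(long requestMsr, long localMsr) {
        return requestMsr > localMsr ? Verdict.REJECT_EBUSY : Verdict.ACCEPT;
    }
}
```

A request whose MSR equals the local MSR is accepted, because every schema version it needs is already known locally.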
> As we need an additional field in AppendEntriesRequest that will only be used
> by partition groups, we could add a generic container for properties to this
> interface, like Map<String, Object> extras().
> To extract the SR from a command, we might just deserialize it completely,
> but this requires a lot of work that is not necessary. We might serialize
> commands that carry an SR in a special way (putting the SR in the very first
> bytes of the message) to make its retrieval efficient.
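One way to picture the "SR in the very first bytes" idea is a fixed 8-byte prefix in front of the serialized command. This is only an assumed layout for illustration (the issue does not specify an encoding), and SchemaRevisionPrefix is a hypothetical name:

```java
import java.nio.ByteBuffer;

/** Hypothetical wire layout: 8-byte big-endian SR followed by the serialized command body. */
final class SchemaRevisionPrefix {
    /** Prepends the schema revision to an already serialized command body. */
    static byte[] encode(long schemaRevision, byte[] body) {
        return ByteBuffer.allocate(Long.BYTES + body.length)
                .putLong(schemaRevision)
                .put(body)
                .array();
    }

    /** Reads the schema revision from the prefix without deserializing the body. */
    static long peekSchemaRevision(byte[] message) {
        return ByteBuffer.wrap(message).getLong();
    }
}
```

With such a layout, extracting the SR reads only the first 8 bytes of each entry, no matter how large the command body is.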
> As the primary has already made sure that it has the schema versions needed
> to execute the command, no waits will be needed on the primary node while
> executing the RAFT command.
> As secondaries/learners reject AppendEntries requests which they cannot
> execute without waiting, they will not have to wait at all in JRaft threads.
> The RAFT leader might not be collocated with the primary. To cover this case,
> we can add the same validation for ActionRequests: pass the required SR
> inside an ActionRequest, validate it in ActionRequestProcessor and reject
> requests whose SR is above the local MSR.
> h3. In RO transactions
> When processing an RO transaction, we just wait for MS SafeTime. This happens
> outside of RAFT, so no special measures are needed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)