[
https://issues.apache.org/jira/browse/IGNITE-19271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-19271:
---------------------------------------
Labels: iep-98 ignite-3 (was: ignite-3)
> Persist revision-safeTime mapping in meta-storage
> -------------------------------------------------
>
> Key: IGNITE-19271
> URL: https://issues.apache.org/jira/browse/IGNITE-19271
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Assignee: Semyon Danilov
> Priority: Major
> Labels: iep-98, ignite-3
> Fix For: 3.0.0-beta2
>
>
> IEP-98 states:
> {code:java}
> When creating a message M telling the cluster about a schema update
> activation moment, choose the message timestamp Tm (moving safeTime forward)
> equal to Now, but assign Tu (activation moment) contained in that M to be
> Tm+DD {code}
> This is hard to achieve.
> h3. Problem
> We need {{Tu==Tm+DD}}. Right now, with what we have in IGNITE-19028, this is
> not straightforward, because there are too many actors:
> * There's a _client_, which chooses Tu, because it's the only actor that
> can affect message content.
> * There's a meta-storage _lease-holder_, or _leader_, which chooses
> Tm.
> * There's everybody else, who expects a correspondence between Tu and Tm.
> The first two actors are important, because they have independent clocks but
> must coordinate the same event. This is impossible with the described protocol.
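To make the clock problem concrete, here is a minimal Java sketch. It assumes the client computes Tu from its own clock while the leader stamps Tm from a different clock. The class name, method names, timestamps, and the 250 ms skew are all hypothetical and only illustrate the arithmetic.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustration only: shows how clock skew breaks Tu == Tm + DD. */
final class ClockSkewExample {
    /** Tu as the client would pick it: its own clock plus DD. */
    static Instant clientTu(Instant clientNow, Duration dd) {
        return clientNow.plus(dd);
    }

    /** How far (Tu - Tm) ends up from the required DD. */
    static Duration error(Instant clientNow, Instant leaderNow, Duration dd) {
        Instant tu = clientTu(clientNow, dd);
        Instant tm = leaderNow; // chosen independently by the leader
        return Duration.between(tm, tu).minus(dd);
    }

    public static void main(String[] args) {
        Instant leaderNow = Instant.parse("2023-04-12T10:10:00Z"); // hypothetical
        Instant clientNow = leaderNow.plusMillis(250);             // hypothetical skew
        // Any non-zero result means Tu != Tm + DD.
        System.out.println(error(clientNow, leaderNow, Duration.ofSeconds(30)));
    }
}
```

The error equals the skew between the two clocks, so it is zero only if both actors read exactly the same time.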
> h3. Discussion
> Let's consider these two solutions:
> # Client generates Tm.
> # Meta-storage generates Tu.
> Option 1 is out of the question: at any given moment there must be only a
> single node that's responsible for the linear order of time in messages.
> What about option 2? Since meta-storage doesn't know anything about command
> semantics, it can't really generate any data. So this solution doesn't work
> either.
> h3. Solution
> A combined solution could be the following:
> * Client sends DD as part of the command (this is not a constant; users _can_
> configure it if they really want to)
> * Meta-storage generates {{Tm}}
> * Every node, upon receiving the update, calculates {{Tu}}
> This would work if nodes were never restarted. There's one
> problem that needs to be solved: recovering the values of {{Tm}} from the
> (old) data upon node restart.
> This can be achieved by persisting safeTime along with the revision as part of
> the metadata, which can be retrieved through the meta-storage service API.
> In other words:
> 1. Client sends
> {code:java}
> schema.latest = 5
> schema.5.data = ...
> schema.5.dd = 30s{code}
> 2. Lease-holder adds metadata to the command:
> {code:java}
> safeTime = 10:10:00
> {code}
> 3. Meta-storage listener writes the data:
> {code:java}
> revision = 33
> schema.latest = 5
> schema.5.data = ...
> schema.5.dd = 30s
> revision.33.safeTime = 10:10:00{code}
>
> How to read {{Tu}}:
> * read {{schema.5.dd}};
> * read its revision, which is 33;
> * read the timestamp of revision 33 via a specialized API;
> * add the two values together.
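The write and read steps above can be sketched with a toy in-memory store. This is not the Ignite meta-storage API: `ToyMetaStorage` and all of its methods are hypothetical, and encoding DD as an ISO-8601 duration string (`PT30S`) is an assumption made so the example can parse it.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for the meta-storage; not the Ignite API. */
final class ToyMetaStorage {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> keyRevision = new HashMap<>();
    private final Map<Long, Instant> revisionSafeTime = new HashMap<>();
    private long revision;

    /** Applies one command: bumps the revision and stores the leader's safeTime for it. */
    long apply(Map<String, String> command, Instant safeTime) {
        revision++;
        for (Map.Entry<String, String> e : command.entrySet()) {
            values.put(e.getKey(), e.getValue());
            keyRevision.put(e.getKey(), revision);
        }
        revisionSafeTime.put(revision, safeTime); // the "revision.N.safeTime" entry
        return revision;
    }

    String get(String key) {
        return values.get(key);
    }

    long revisionOf(String key) {
        return keyRevision.get(key);
    }

    Instant safeTimeOf(long rev) {
        return revisionSafeTime.get(rev);
    }

    /** Tu = safeTime of the revision that wrote schema.N.dd, plus DD. */
    Instant activationMoment(int schemaVersion) {
        String ddKey = "schema." + schemaVersion + ".dd";
        Duration dd = Duration.parse(get(ddKey)); // assumed ISO-8601, e.g. PT30S
        return safeTimeOf(revisionOf(ddKey)).plus(dd);
    }
}
```

Because Tu is derived from persisted data only, every node computes the same value, including after a restart, as long as the safeTime entry for that revision still exists.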
> h3. Implications and restrictions
> There's a cleanup process in the meta-storage. It will eventually remove any
> "revision.x.safeTime" values once the corresponding revision becomes obsolete.
> However, we should somehow preserve the timestamps of revisions that are used by
> schemas. Such behaviour can be achieved if components can reserve a
> revision, so that meta-storage can't compact it until the reservation has been
> revoked.
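One possible shape for such a reservation mechanism is sketched below in Java. The class and method names are hypothetical, and reference counting is an assumption about how several components might share one reservation; the issue itself does not prescribe a design.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: compaction that skips reserved revisions. */
final class RevisionReservations {
    private final TreeMap<Long, Instant> safeTimes = new TreeMap<>(); // revision -> safeTime
    private final Map<Long, Integer> reservations = new HashMap<>();  // revision -> holders

    void record(long revision, Instant safeTime) {
        safeTimes.put(revision, safeTime);
    }

    /** A component (e.g. the schema manager) declares it still needs this revision. */
    void reserve(long revision) {
        reservations.merge(revision, 1, Integer::sum);
    }

    /** Revokes one reservation; the entry disappears when no holders remain. */
    void release(long revision) {
        reservations.computeIfPresent(revision, (r, n) -> n > 1 ? n - 1 : null);
    }

    /** Drops safeTime entries for all revisions <= upTo, except reserved ones. */
    void compact(long upTo) {
        safeTimes.headMap(upTo, true).keySet()
                .removeIf(r -> !reservations.containsKey(r));
    }

    boolean hasSafeTime(long revision) {
        return safeTimes.containsKey(revision);
    }
}
```

With this shape, a reserved revision survives any number of compactions and is removed by the first compaction that runs after its last reservation is revoked.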
--
This message was sent by Atlassian Jira
(v8.20.10#820010)