[ 
https://issues.apache.org/jira/browse/IGNITE-19271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Semyon Danilov updated IGNITE-19271:
------------------------------------
    Description: 
IEP-98 states:
{quote}When creating a message M telling the cluster about a schema update activation 
moment, choose the message timestamp Tm (moving safeTime forward) equal to Now, 
but assign Tu (activation moment) contained in that M to be Tm+DD{quote}
This is hard to achieve.
h3. Problem

We need {{Tu == Tm + DD}}. With what IGNITE-19028 gives us today, this is not 
straightforward, because too many actors are involved:
 * The {_}client{_} chooses Tu, because it is the only actor that can affect 
the message content.
 * The meta-storage {_}lease-holder{_} (or {_}leader{_}) chooses Tm.
 * Every other node expects Tu and Tm to correspond.

The first two actors are the important ones: they have independent clocks, but 
must agree on the same event. This is impossible with the described protocol.
h3. Discussion

Let's consider these two solutions:
 # Client generates Tm.
 # Meta-storage generates Tu.

Option 1 is out of the question: at any given moment, only a single node may be 
responsible for the linear order of time in messages.

What about option 2? Since the meta-storage knows nothing about command 
semantics, it cannot generate any command data, so this solution does not work 
either.
h3. Solution

A combined solution could be the following:
 * The client sends DD as part of the command (DD is not a constant; the user 
_can_ configure it if they really want to).
 * The meta-storage generates {{Tm}}.
 * Every node, upon receiving the update, calculates {{Tu}}.

This would work if nodes were never restarted. One problem remains to be 
solved: recovering the values of {{Tm}} from the (old) data upon node restart.

This can be achieved by persisting safeTime along with the revision as part of 
the metadata, so that it can be retrieved through the meta-storage service API.

In other words:

1. The client sends:
{code:java}
schema.latest   = 5
schema.5.data   = ...
schema.5.dd     = 30s{code}
2. The lease-holder adds metadata to the command:
{code:java}
safeTime = 10:10:00
{code}
3. The meta-storage listener writes the data:
{code:java}
revision = 33
    schema.latest = 5
    schema.5.data = ...
    schema.5.dd   = 30s

revision.33.safeTime = 10:10:00{code}
 

To read {{Tu}}:
 * read {{schema.5.dd}};
 * read the revision of that entry, which is 33;
 * read the timestamp of revision 33 via a specialized API;
 * add the timestamp and the DD value together.
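
The read path above can be sketched in Java as follows. This is only an illustration under stated assumptions: {{ActivationTimeSketch}} is a hypothetical in-memory stand-in, not the actual meta-storage API; safeTime is modelled as a {{LocalTime}} rather than a real cluster timestamp; and DD is assumed to be stored as a string such as "30s".

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the meta-storage; not the Ignite API. */
final class ActivationTimeSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> revisionByKey = new HashMap<>();
    private final Map<Long, LocalTime> safeTimeByRevision = new HashMap<>();

    /** Step 3: the listener writes the keys and persists the revision's safeTime (Tm). */
    void apply(long revision, LocalTime safeTime, Map<String, String> updates) {
        updates.forEach((key, value) -> {
            values.put(key, value);
            revisionByKey.put(key, revision);
        });
        safeTimeByRevision.put(revision, safeTime);
    }

    /** Tu = Tm + DD, where Tm is recovered from the revision that wrote the "dd" key. */
    LocalTime activationTime(int schemaVersion) {
        String ddKey = "schema." + schemaVersion + ".dd";
        // Assumption: DD is stored as "<seconds>s", e.g. "30s" becomes PT30S.
        Duration dd = Duration.parse("PT" + values.get(ddKey).toUpperCase());
        long revision = revisionByKey.get(ddKey);
        LocalTime tm = safeTimeByRevision.get(revision);
        return tm.plus(dd);
    }

    public static void main(String[] args) {
        ActivationTimeSketch metaStorage = new ActivationTimeSketch();
        metaStorage.apply(33, LocalTime.of(10, 10, 0), Map.of(
                "schema.latest", "5",
                "schema.5.data", "...",
                "schema.5.dd", "30s"));
        System.out.println(metaStorage.activationTime(5));
    }
}
```

Because {{Tu}} is derived from persisted data only, a restarted node arrives at the same value as every other node without consulting its own clock.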

h3. Implications and restrictions

The meta-storage has a cleanup process. It will eventually remove 
"revision.x.safeTime" values once the corresponding revisions become obsolete.

However, the timestamps of revisions that are used by schemas must be 
preserved. This can be achieved if components can reserve a revision, so that 
the meta-storage cannot compact it until the reservation has been revoked.
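
One possible shape of such a reservation mechanism is sketched below. The class and method names ({{ReservableSafeTimes}}, {{reserve}}, {{revoke}}, {{compact}}) are hypothetical and only illustrate the idea; the counting of reservations per revision is also an assumption, made so that several components can reserve the same revision independently.

```java
import java.time.LocalTime;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a revision-to-safeTime mapping with reservations. */
final class ReservableSafeTimes {
    private final TreeMap<Long, LocalTime> safeTimeByRevision = new TreeMap<>();
    private final Map<Long, Integer> reservations = new HashMap<>();

    void put(long revision, LocalTime safeTime) {
        safeTimeByRevision.put(revision, safeTime);
    }

    /** A component declares that it still needs the timestamp of this revision. */
    void reserve(long revision) {
        reservations.merge(revision, 1, Integer::sum);
    }

    /** Drops one reservation; the entry disappears when the count reaches zero. */
    void revoke(long revision) {
        reservations.computeIfPresent(revision, (rev, count) -> count > 1 ? count - 1 : null);
    }

    /** Removes the safeTime of every unreserved revision up to and including the given one. */
    void compact(long upToRevision) {
        safeTimeByRevision.headMap(upToRevision, true).keySet()
                .removeIf(rev -> !reservations.containsKey(rev));
    }

    /** Returns the stored safeTime, or null if it was compacted or never written. */
    LocalTime safeTime(long revision) {
        return safeTimeByRevision.get(revision);
    }
}
```

With this shape, a schema component would reserve revision 33 for as long as schema version 5 is in use, and revoke the reservation once that schema version is no longer needed.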

  was:
IEP-98 states:
{code:java}
When creating a message M telling the cluster about a schema update activation 
moment, choose the message timestamp Tm (moving safeTime forward) equal to Now, 
but assign Tu (activation moment) contained in that M to be Tm+DD {code}
This is hard to achieve.
h3. Problem

We need {{{}Tu==Tm+DD{}}}. Right now, with what we have in IGNITE-19028, it's 
not straightforward. This is because we have too many actors:
 * There's a {_}client{_}, that chooses Tu, because it's the only actor that 
can affect message content.
 * There's a meta-storage {_}lease-holder{_}, or {_}leader{_}, that chooses Tm.
 * There's everybody else, who expect a correspondence between Tu and Tm.

First two actors are important, because they have independent clocks, but must 
coordinate the same event. This is impossible with described protocol.
h3. Discussion

Let's consider these two solutions:
 # Client generates Tm.
 # Meta-storage generates Tu.

Option 1 is out of question, there must be only a single node at any given 
moment in time, that's responsible for the linear order of time in messages.

What about option 2? Since meta-storage doesn't know anything about commands 
semantics, it can't really generate any data. So this solution doesn't work 
either.
h3. Solution

Combined solution could be the following:
 * Client sends DD as part of the command (this is not a constant, user _can_ 
configure it, if they really feel like doing it)
 * Meta-storage generates {{Tm}}
 * Every node, upon receiving the update, calculates {{Tu}}

This could work, if nodes would have never been restarted. There's one problem 
that needs to be solved: recovering the values of {{Tm}} from the (old) data 
upon node restart.

This can be achieved by persisting safeTime along with revision as a part of 
metadata, that can be retrieved back through the meta-storage service API.

In other words:

1. Client sends
{code:java}
schema.latest   = 5
schema.5.data   = ...
schema.5.dd     = 30s{code}
2. Lease-holder adds meta-data to the command:
{code:java}
safeTime = 10:10
{code}
3. Meta-storage listener writes the data:
{code:java}
revision = 33
    schema.latest = 5
    schema.5.data = ...
    schema.5.dd   = 30s

revision.33.safeTime = 10:10:00{code}
 

How can you read {{{}Tu{}}}:
 * read "{{{}schema.5.dd"{}}};
 * read its revision, it's 33;
 * read a timestamp of revision 33 via specialized API;
 * add two values together.

h3. Implications and restrictions

There's a cleanup process in the meta-storage. It will eventually remove any 
"revision.x.safeTime" values, because corresponding revision became obsolete.

But, we should somehow preserve timestamps of revisions that are used y 
schemas. Such behavior can be achieved, if components can reserve a revision, 
and meta-storage can't compact it unless the reservation has been revoked.


> Persist revision-safeTime mapping in meta-storage
> -------------------------------------------------
>
>                 Key: IGNITE-19271
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19271
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
