[ 
https://issues.apache.org/jira/browse/IGNITE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-19267:
-----------------------------------
    Description: 
According to IEP-91, we can delete old data once it becomes older than a 
certain threshold. At the moment, we can consider this threshold to be shared 
between different tables, but unique to each individual node. It's called the 
Low Watermark (LW).

The value is chosen as follows:
 * There's the _data availability time_, which can be configured by the user. 
This is a cluster configuration. It has a value of, for example, 45 minutes. 
Valid values: {{[0, +INF)}}.
 * There's a _GC frequency_, which is also a cluster configuration. For 
example, 5 minutes. Its range of valid values should be stricter.
 * The last LW value is persisted in Vault.
 * With these example values, every 5 minutes we assign a new 
{{lwCandidate = now() - 45min - maxClockSkew}}.
 ** If there are no running transactions with a timestamp below 
{{lwCandidate}}, we promote the candidate to the real LW value.
 ** Otherwise, we trigger GC with the timestamp of the oldest running 
transaction (promoting LW to that timestamp), and raise the bar every time (*) 
such a transaction completes. Eventually, we will reach the point where there 
are no running transactions with a timestamp below {{lwCandidate}}.
 ** (*) It's not necessary to do it every time. But once the timestamp of the 
oldest RO transaction is above or equal to {{lwCandidate}}, we must guarantee 
its promotion. Everything else is an optimization.
 * If there's a new RO transaction with a timestamp below {{lwCandidate}}, we 
fail it.

A promoted LW value can never decrease, no matter what. All data below LW is 
considered invalid, possibly broken, and completely invisible to the user.
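
The rules above can be sketched roughly as follows. This is only an 
illustration, not the proposed implementation: the class and method names are 
hypothetical, timestamps are plain milliseconds rather than hybrid timestamps, 
Vault persistence is reduced to a comment, and keeping {{lwCandidate}} itself 
monotonic is an extra assumption that the description does not state.

```java
import java.util.TreeMap;

/**
 * Illustrative sketch only. All names here are hypothetical and are not
 * Ignite's API. Timestamps are plain milliseconds.
 */
final class LowWatermarkSketch {
    private final long dataAvailabilityMs;
    private final long maxClockSkewMs;

    /** Running RO transactions: read timestamp -> number of transactions using it. */
    private final TreeMap<Long, Integer> runningRo = new TreeMap<>();

    /** Latest candidate. Kept monotonic here, which is an assumption of this sketch. */
    private long lwCandidate = Long.MIN_VALUE;

    /** Promoted LW. A real node would restore it from Vault and persist every change. */
    private long lowWatermark = Long.MIN_VALUE;

    LowWatermarkSketch(long dataAvailabilityMs, long maxClockSkewMs) {
        this.dataAvailabilityMs = dataAvailabilityMs;
        this.maxClockSkewMs = maxClockSkewMs;
    }

    /** Called once per GC-frequency interval. Returns the LW after the promotion attempt. */
    long onTick(long nowMs) {
        long candidate = nowMs - dataAvailabilityMs - maxClockSkewMs;
        lwCandidate = Math.max(lwCandidate, candidate);
        return promote();
    }

    /** Registers a new RO transaction, or returns false if it must be failed. */
    boolean tryBeginReadOnly(long readTs) {
        if (readTs < lwCandidate) {
            return false; // The data it wants to read may already be collected.
        }
        runningRo.merge(readTs, 1, Integer::sum);
        return true;
    }

    /** Unregisters a finished RO transaction and raises the bar if possible. */
    long onReadOnlyFinished(long readTs) {
        runningRo.computeIfPresent(readTs, (ts, n) -> n == 1 ? null : n - 1);
        return promote();
    }

    long lowWatermark() {
        return lowWatermark;
    }

    /** Promotes LW to the candidate, capped by the oldest running RO transaction. */
    private long promote() {
        long target = runningRo.isEmpty()
                ? lwCandidate
                : Math.min(lwCandidate, runningRo.firstKey());
        lowWatermark = Math.max(lowWatermark, target); // LW never decreases.
        return lowWatermark;
    }
}
```

For example, with a data availability time of 45 minutes (2,700,000 ms) and an 
assumed clock skew of 500 ms, a tick at {{now = 10,000,000}} produces a 
candidate of 7,299,500. Here promotion is retried whenever an RO transaction 
finishes, which is more often than strictly required by the note above.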

  was:
According to IEP-91, we can delete old data once it becomes older than a 
certain threshold. At the moment, we can consider this threshold to be shared 
between different tables, but unique to each individual node. It's called the 
Low Watermark (LW).

The value is chosen as follows:
 * There's the _data availability time_, which can be configured by the user. 
This is a cluster configuration. It has a value of, for example, 45 minutes. 
Valid values: {{[0, +INF)}}.
 * There's a _GC frequency_, which is also a cluster configuration. For 
example, 5 minutes. Its range of valid values should be stricter.
 * The last LW value is persisted in Vault.
 * With these example values, every 5 minutes we assign a new 
{{lwCandidate = now() - 45min - maxClockSkew}}.
 ** If there are no running transactions with a timestamp below 
{{lwCandidate}}, we promote the candidate to the real LW value.
 ** Otherwise, we trigger GC with the timestamp of the oldest running 
transaction (promoting LW to that timestamp), and raise the bar every time (*) 
such a transaction completes. Eventually, we will reach the point where there 
are running transactions with a timestamp below {{lwCandidate}}.
 ** (*) It's not necessary to do it every time. But once the timestamp of the 
oldest RO transaction is above or equal to {{lwCandidate}}, we must guarantee 
its promotion. Everything else is an optimization.
 * If there's a new RO transaction with a timestamp below {{lwCandidate}}, we 
fail it.

A promoted LW value can never decrease, no matter what. All data below LW is 
considered invalid, possibly broken, and completely invisible to the user.


> Implement local Low Watermark propagation
> -----------------------------------------
>
>                 Key: IGNITE-19267
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19267
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> According to IEP-91, we can delete old data once it becomes older than a 
> certain threshold. At the moment, we can consider this threshold to be 
> shared between different tables, but unique to each individual node. It's 
> called the Low Watermark (LW).
> The value is chosen as follows:
>  * There's the _data availability time_, which can be configured by the 
> user. This is a cluster configuration. It has a value of, for example, 45 
> minutes. Valid values: {{[0, +INF)}}.
>  * There's a _GC frequency_, which is also a cluster configuration. For 
> example, 5 minutes. Its range of valid values should be stricter.
>  * The last LW value is persisted in Vault.
>  * With these example values, every 5 minutes we assign a new 
> {{lwCandidate = now() - 45min - maxClockSkew}}.
>  ** If there are no running transactions with a timestamp below 
> {{lwCandidate}}, we promote the candidate to the real LW value.
>  ** Otherwise, we trigger GC with the timestamp of the oldest running 
> transaction (promoting LW to that timestamp), and raise the bar every time 
> (*) such a transaction completes. Eventually, we will reach the point where 
> there are no running transactions with a timestamp below {{lwCandidate}}.
>  ** (*) It's not necessary to do it every time. But once the timestamp of 
> the oldest RO transaction is above or equal to {{lwCandidate}}, we must 
> guarantee its promotion. Everything else is an optimization.
>  * If there's a new RO transaction with a timestamp below {{lwCandidate}}, 
> we fail it.
> A promoted LW value can never decrease, no matter what. All data below LW is 
> considered invalid, possibly broken, and completely invisible to the user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
