[ https://issues.apache.org/jira/browse/IGNITE-19267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Bessonov updated IGNITE-19267:
-----------------------------------
Description:
According to IEP-91, we can delete old data once it becomes older than a
certain threshold. At the moment, we can consider this threshold to be shared
between different tables, but unique to each individual node. It's called the
Low Watermark (LW).
The way the value is chosen is the following:
* There's the _data availability time_, which can be configured by the
user. This is a cluster configuration. It has a value of, for example, 45
minutes. Valid values: {{[0, +INF)}}.
* There's a _GC frequency_, which is also a cluster configuration. For
example, 5 minutes. The range of valid values should be stricter.
* The last LW value is persisted in Vault.
* Every 5 minutes, we assign a new {{lwCandidate = now() - 45min -
maxClockSkew}}.
** If there are no running transactions with a timestamp below
{{lwCandidate}}, we promote the candidate to a real LW value.
** Otherwise, we trigger GC with the timestamp of the oldest running
transaction (promoting LW to that timestamp), and raise the bar every time*
such a transaction completes. Eventually, we will reach the point where there
are no running transactions with a timestamp below {{lwCandidate}}.
* It's not necessary to do it every time. But once the timestamp of the
oldest RO transaction is above or equal to {{lwCandidate}}, we must
guarantee its promotion. Everything else is an optimization.
* If there's a new RO transaction with a timestamp below {{lwCandidate}}, we
fail it.
A promoted LW value can never decrease, no matter what. All data below the LW
is considered invalid, possibly broken, and completely invisible to the user.
was:
According to the IEP-91, we can delete old data, once it becomes older than the
certain threshold. At the moment, we can consider this threshold to be shared
between different tables, but be unique on individual nodes. It's called a Low
Watermark (LW).
The way the value is chosen is the following:
* There's the {_}data availability time{_}, that can be configured by the
user. This is a cluster configuration. It has a value of, for example, 45
minutes. Valid values - {{{}[0, +INF){}}}.
* There's a {_}GC frequency{_}, that's also a cluster configuration. For
example, 5 minuter. Range of valid values should be more strict.
* Last LW value is persisted in Vault.
* Every 5 minutes, we assign a new "lwCandidate{{{} = now() - 45min -
maxClockSkew"{}}}.
** If there are no running transactions with timestamp below
{{{}lwCandidate{}}}, we promote the candidate into a real LW value.
** Otherwise, we trigger GC with timestamp of the transaction with the oldest
timestamp (and promoting LW to that timestamp), and raising the bar every time*
that transaction is completed. Eventually, we will reach the point where there
are running transactions with timestamp below {{{}lwCandidate{}}}.
* it's not necessary to do it every time. But, once the timestamp of the oldest
RO transaction is above or equal to {{{}lwCandidate{}}}, we must guarantee its
promotion. Everything else is optimization.
* If there's a new RO transaction with timestamp below {{{}lwCandidate{}}}, we
fail it.
Promoted LW value cannot become smaller no matter what. All data below LW is
considered to be invalid, maybe broken and completely invisible to user.
> Implement local Low Watermark propagation
> -----------------------------------------
>
> Key: IGNITE-19267
> URL: https://issues.apache.org/jira/browse/IGNITE-19267
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> According to IEP-91, we can delete old data once it becomes older than a
> certain threshold. At the moment, we can consider this threshold to be
> shared between different tables, but unique to each individual node. It's
> called the Low Watermark (LW).
> The way the value is chosen is the following:
> * There's the _data availability time_, which can be configured by the
> user. This is a cluster configuration. It has a value of, for example, 45
> minutes. Valid values: {{[0, +INF)}}.
> * There's a _GC frequency_, which is also a cluster configuration. For
> example, 5 minutes. The range of valid values should be stricter.
> * The last LW value is persisted in Vault.
> * Every 5 minutes, we assign a new {{lwCandidate = now() - 45min -
> maxClockSkew}}.
> ** If there are no running transactions with a timestamp below
> {{lwCandidate}}, we promote the candidate to a real LW value.
> ** Otherwise, we trigger GC with the timestamp of the oldest running
> transaction (promoting LW to that timestamp), and raise the bar every
> time* such a transaction completes. Eventually, we will reach the point
> where there are no running transactions with a timestamp below
> {{lwCandidate}}.
> * It's not necessary to do it every time. But once the timestamp of the
> oldest RO transaction is above or equal to {{lwCandidate}}, we must
> guarantee its promotion. Everything else is an optimization.
> * If there's a new RO transaction with a timestamp below {{lwCandidate}},
> we fail it.
> A promoted LW value can never decrease, no matter what. All data below the
> LW is considered invalid, possibly broken, and completely invisible to the
> user.
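
The selection rules in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not Ignite's actual implementation: the class name LowWatermarkSketch and all of its methods are hypothetical, timestamps are plain millisecond longs rather than hybrid timestamps, and persisting the value to Vault and triggering the actual GC are left out.

```java
import java.util.TreeMap;

/** Hypothetical sketch of the LW rules; none of these names are Ignite API. */
final class LowWatermarkSketch {
    private final long dataAvailabilityMs;
    private final long maxClockSkewMs;

    /** Running RO transactions: timestamp -> number of transactions with it. */
    private final TreeMap<Long, Integer> activeRoTx = new TreeMap<>();

    private long lwCandidate;
    private long lowWatermark;

    LowWatermarkSketch(long dataAvailabilityMs, long maxClockSkewMs) {
        this.dataAvailabilityMs = dataAvailabilityMs;
        this.maxClockSkewMs = maxClockSkewMs;
    }

    /** Called once per GC period: lwCandidate = now - availability - skew. */
    synchronized void onGcTick(long nowMs) {
        long computed = nowMs - dataAvailabilityMs - maxClockSkewMs;
        // Assumption: the candidate itself is also kept monotonic.
        lwCandidate = Math.max(lwCandidate, computed);
        promote();
    }

    /** Returns false (the transaction fails) if it starts below the candidate. */
    synchronized boolean beginReadOnly(long txTimestampMs) {
        if (txTimestampMs < lwCandidate) {
            return false;
        }
        activeRoTx.merge(txTimestampMs, 1, Integer::sum);
        return true;
    }

    /** Raises the bar when a running RO transaction completes. */
    synchronized void finishReadOnly(long txTimestampMs) {
        activeRoTx.computeIfPresent(txTimestampMs, (ts, n) -> n == 1 ? null : n - 1);
        promote();
    }

    /** Promotes LW up to the candidate, capped by the oldest running RO tx. */
    private void promote() {
        long target = activeRoTx.isEmpty()
                ? lwCandidate
                : Math.min(lwCandidate, activeRoTx.firstKey());
        // A promoted LW value never decreases.
        lowWatermark = Math.max(lowWatermark, target);
    }

    synchronized long lowWatermark() {
        return lowWatermark;
    }
}
```

In this sketch the bar is raised on every transaction completion, which the description marks as optional; the only hard requirement it keeps is that the candidate is promoted once no running RO transaction has a timestamp below it.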