[ 
https://issues.apache.org/jira/browse/IGNITE-23413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895644#comment-17895644
 ] 

Ivan Bessonov edited comment on IGNITE-23413 at 11/11/24 2:00 PM:
------------------------------------------------------------------

What I think needs to be done.

We should fix (pin) the current local meta-storage revision and use it for all 
further reads. Out of all the timestamps mentioned below, we must choose the 
minimal one.
 * For zones that have pending or planned assignments, we should read those 
assignments locally; each individual assignment stores a timestamp.
 * For zones that only have stable assignments:
 ** The current implementation of data nodes timers uses the "latest" catalog 
version to determine the timestamp for future pending assignments. Until 
https://issues.apache.org/jira/browse/IGNITE-22723 is implemented, we should 
do the same and make sure that the latest catalog version stays available. We 
can use the timestamp of the current meta-storage revision for that.
 * For zones that have no assignments at all:
 ** Either the zone has already been deleted and its assignments dropped, in 
which case we don't need to do anything.
 ** Or we have not yet saved the initial stable assignments. Let's imagine 
that:
- the zone was created in catalog version 10;
- the zone was updated in catalog version 12 (why not);
- the current catalog version is 15, and the zone exists in it.
In these circumstances, I propose returning the timestamp of catalog version 
*11*, the latest version that still has the "initial" zone parameters.
There is a possible side effect: if some code tries to read the revision 
that corresponds to version 10, it will get a compaction exception. It would 
then need to retry the read on the oldest known revision, which will still 
have the same version of the distribution zone.
 * If there are no pending or planned assignments at all, we can return the 
timestamp associated with that meta-storage revision.
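
The selection rules above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the class, the {{ZoneState}} type and all of its fields are hypothetical and do not come from the Ignite code base, and each zone is assumed to be already read at the fixed meta-storage revision.

```java
import java.util.List;

/** Hypothetical sketch; none of these names exist in Ignite. */
final class RebalanceMinTimeSketch {
    /** Simplified view of one zone, as read at the fixed meta-storage revision. */
    static final class ZoneState {
        /** Timestamps stored in the zone's pending and planned assignments. */
        final List<Long> pendingOrPlannedTimestamps;
        final boolean hasStableAssignments;
        final boolean existsInLatestCatalog;
        /** Timestamp of the latest catalog version that still has the "initial" zone parameters. */
        final long lastInitialParamsTimestamp;

        ZoneState(List<Long> pendingOrPlannedTimestamps, boolean hasStableAssignments,
                boolean existsInLatestCatalog, long lastInitialParamsTimestamp) {
            this.pendingOrPlannedTimestamps = pendingOrPlannedTimestamps;
            this.hasStableAssignments = hasStableAssignments;
            this.existsInLatestCatalog = existsInLatestCatalog;
            this.lastInitialParamsTimestamp = lastInitialParamsTimestamp;
        }
    }

    /** Returns the minimum over all per-zone timestamps, bounded above by the revision timestamp. */
    static long minimumRequiredTime(List<ZoneState> zones, long revisionTimestamp) {
        // With no pending or planned assignments anywhere, the revision timestamp is returned.
        long min = revisionTimestamp;

        for (ZoneState zone : zones) {
            if (!zone.pendingOrPlannedTimestamps.isEmpty()) {
                // Pending or planned assignments: use the timestamps stored in them.
                for (long ts : zone.pendingOrPlannedTimestamps) {
                    min = Math.min(min, ts);
                }
            } else if (zone.hasStableAssignments) {
                // Stable assignments only: the timestamp of the current revision is enough.
                min = Math.min(min, revisionTimestamp);
            } else if (zone.existsInLatestCatalog) {
                // Initial stable assignments are not saved yet.
                min = Math.min(min, zone.lastInitialParamsTimestamp);
            }
            // Otherwise the zone is already deleted and contributes nothing.
        }

        return min;
    }
}
```

In the example with versions 10, 12 and 15, {{lastInitialParamsTimestamp}} would be the timestamp of catalog version 11.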


was (Author: ibessonov):
What I think needs to be done:
 * We should fix current local meta-storage version and use it for further 
reads.
 * We should read all pending and planned assignments; each of them stores a 
timestamp. We calculate the minimal one.
 * If there are no pending and planned assignments, we might return the 
timestamp that's associated with meta-storage revision.

This approach almost works, but some situations have nuances. These are the 
cases where we are in the middle of some operation. Let's examine them:
 * Zone is created/altered, but assignments are not yet saved. This is the case 
when the assignments' timestamp is below the zone's latest timestamp.
 ** For ALTER, returning an older timestamp from assignments is not a problem, 
it'll eventually become more recent.
 ** For CREATE, we should probably determine if assignments are not yet saved, 
and use the timestamp from catalog.
 * A list of data nodes is updated, but assignments are not yet re-calculated 
because of timeout.
 ** Current code uses "latest" catalog state when it transforms data nodes into 
assignments, so it is safe to use timestamp of latest catalog version.
There's a Jira that aims to fix it: 
https://issues.apache.org/jira/browse/IGNITE-22723. This means that current 
approach might not work in the future.

Anyway, considering all of the above, there's one situation that we must keep 
in mind:
 * DZ is updated at catalog version 15.
 * Assignments are calculated for the same exact catalog version.
 * Nothing changes for a long time. Several days, for example.
 * During that time, catalog version increases and becomes 75, for example.

If nothing changes, we should be able to remove versions 15-74, because DZ 
settings from versions 15 and 75 are identical.
It seems like the proposed algorithm works exactly like we need.
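
To illustrate the scenario above, here is a minimal sketch of how a compaction procedure might turn the returned timestamp into the earliest catalog version that has to be kept. The class, the method and the map of activation timestamps are hypothetical and are not part of the Ignite API.

```java
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical sketch; these names do not exist in Ignite. */
final class CompactionBoundSketch {
    /**
     * Returns the catalog version that is active at the given time, that is,
     * the earliest version that must survive compaction, or -1 if the time
     * precedes every known version.
     *
     * @param versionByActivationTime catalog versions keyed by activation timestamp (assumed layout)
     * @param minimumRequiredTime timestamp reported as required by rebalance
     */
    static int earliestRequiredVersion(
            NavigableMap<Long, Integer> versionByActivationTime,
            long minimumRequiredTime
    ) {
        // floorEntry finds the version with the greatest activation time <= the given time.
        Map.Entry<Long, Integer> entry = versionByActivationTime.floorEntry(minimumRequiredTime);

        return entry == null ? -1 : entry.getValue();
    }
}
```

If versions 15 through 75 are registered and the returned timestamp is not older than the activation time of version 75, the method yields 75, so versions 15-74 become candidates for removal.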

> Catalog compaction. Component to determine minimum catalog version required 
> by rebalance.
> -----------------------------------------------------------------------------------------
>
>                 Key: IGNITE-23413
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23413
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Pavel Pereslegin
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> Each rebalance procedure uses a specific catalog version: it "holds" the 
> timestamp corresponding to the latest (at the moment the rebalance starts) 
> version of the catalog.
> To be able to safely perform catalog compaction, we need to design and 
> implement a component that can determine the minimum version required for 
> active rebalances (to avoid deleting this version during compaction).
> {code:java}
> interface RebalanceMinimumRequiredTimeProvider {
>     /**
>      * Returns the minimum time required for rebalance,
>      * or current timestamp if there are no active 
>      * rebalances and there is a guarantee that all rebalances
>      * launched in the future will use catalog version 
>      * corresponding to the current time or greater.
>      */
>     long minimumRequiredTime();
> }
> {code}
> The component can be either global or local (whichever is easier to 
> implement). This means that the compaction procedure can call the component 
> on all nodes in the cluster and calculate the minimum.
> The component must be able to track rebalances that may be triggered during 
> "replay" of the metastorage raft log.
> The component should return only monotonically increasing values.
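
One possible way to satisfy the monotonicity requirement is to wrap the actual computation and remember the largest value returned so far. The sketch below assumes that "monotonically increasing" means "never decreasing"; the wrapper class and the {{LongSupplier}} delegate are hypothetical, and only the interface comes from the description above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Interface from the issue description, repeated here to keep the sketch self-contained. */
interface RebalanceMinimumRequiredTimeProvider {
    long minimumRequiredTime();
}

/** Hypothetical wrapper that never returns a value smaller than a previously returned one. */
final class MonotonicTimeProvider implements RebalanceMinimumRequiredTimeProvider {
    /** The actual computation; an assumption standing in for the real component. */
    private final LongSupplier delegate;

    /** Largest value returned so far. */
    private final AtomicLong lastReturned = new AtomicLong(Long.MIN_VALUE);

    MonotonicTimeProvider(LongSupplier delegate) {
        this.delegate = delegate;
    }

    @Override
    public long minimumRequiredTime() {
        // Atomically keep the maximum of the stored value and the freshly computed one.
        return lastReturned.accumulateAndGet(delegate.getAsLong(), Math::max);
    }
}
```

Note that clamping only hides a regression in the computed value; whether that is acceptable depends on why the underlying value went backwards.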



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
