[
https://issues.apache.org/jira/browse/IGNITE-23413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895644#comment-17895644
]
Ivan Bessonov edited comment on IGNITE-23413 at 11/11/24 2:00 PM:
------------------------------------------------------------------
What I think needs to be done.
We should fix the current local meta-storage revision and use it for all
further reads.
Out of all timestamps that I mention further in the comment, we must choose the
minimal one.
* For zones that have pending or planned assignments, we should read them
locally; there's a timestamp stored in each individual assignment.
* For zones that only have stable assignments:
** The current implementation of data nodes timers uses the "latest" catalog
version to determine the timestamp for future pending assignments. Until
https://issues.apache.org/jira/browse/IGNITE-22723 is implemented, we should
do the same and make sure that the latest catalog version is available. We can
use the timestamp of the current meta-storage revision for that.
* For zones that have no assignments at all:
** Either the zone is already deleted and its assignments are dropped, in
which case we don't need to do anything.
** Or we have not yet saved the initial stable assignments. Let's imagine the
following:
- the zone was created in catalog version 10;
- the zone was updated in catalog version 12 (why not);
- the current catalog version is 15, and the zone exists in it.
In these circumstances, I propose returning the timestamp of catalog version
*11*, the latest available timestamp at which the zone still has its "initial"
parameters.
There is a possible side effect: if some code tries to read the revision that
corresponds to version 10, it will get a compaction exception. It would then
need to retry the read on the oldest known revision, which will still have the
same version of the distribution zone.
* If there are no pending or planned assignments, we might return the
timestamp that's associated with the meta-storage revision.
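The rules above can be sketched roughly as follows. This is only an illustration under assumptions: ZoneSnapshot, its fields and both method names are hypothetical and are not existing Ignite classes. It assumes that everything was already read at one fixed meta-storage revision, and that the "initial" parameters are valid up to the version right before the first zone update.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical snapshot of one zone, read at a single fixed meta-storage revision. */
final class ZoneSnapshot {
    /** Timestamps stored in the zone's pending and planned assignments (may be empty). */
    final List<Long> pendingOrPlannedTimestamps;
    /** Whether stable assignments have already been saved for the zone. */
    final boolean hasStableAssignments;
    /** Whether the zone exists in the latest catalog version. */
    final boolean existsInLatestCatalog;
    /** Timestamp of the latest catalog version that still has the "initial" zone parameters. */
    final long initialParamsTimestamp;

    ZoneSnapshot(List<Long> pendingOrPlannedTimestamps, boolean hasStableAssignments,
            boolean existsInLatestCatalog, long initialParamsTimestamp) {
        this.pendingOrPlannedTimestamps = pendingOrPlannedTimestamps;
        this.hasStableAssignments = hasStableAssignments;
        this.existsInLatestCatalog = existsInLatestCatalog;
        this.initialParamsTimestamp = initialParamsTimestamp;
    }
}

final class MinimumRequiredTimeSketch {
    /** Chooses the minimal timestamp among all candidates described in the comment. */
    static long minimumRequiredTime(long revisionTimestamp, List<ZoneSnapshot> zones) {
        // Default: nothing is pending or planned, so the revision timestamp is enough.
        long min = revisionTimestamp;

        for (ZoneSnapshot zone : zones) {
            if (!zone.pendingOrPlannedTimestamps.isEmpty()) {
                // Pending or planned assignments carry their own timestamps.
                for (long ts : zone.pendingOrPlannedTimestamps) {
                    min = Math.min(min, ts);
                }
            } else if (zone.hasStableAssignments) {
                // Only stable assignments: the latest catalog version must stay available.
                min = Math.min(min, revisionTimestamp);
            } else if (zone.existsInLatestCatalog) {
                // Initial stable assignments are not saved yet.
                min = Math.min(min, zone.initialParamsTimestamp);
            }
            // Otherwise the zone is deleted and its assignments are dropped: nothing to do.
        }

        return min;
    }

    /** Latest catalog version that still has the zone parameters it was created with. */
    static int lastVersionWithInitialParams(int latestVersion, OptionalInt firstUpdateVersion) {
        return firstUpdateVersion.isPresent() ? firstUpdateVersion.getAsInt() - 1 : latestVersion;
    }
}
```

For the example above (created on 10, updated on 12, current version 15), lastVersionWithInitialParams(15, OptionalInt.of(12)) gives 11.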
was (Author: ibessonov):
What I think needs to be done:
* We should fix the current local meta-storage revision and use it for further
reads.
* We should read all pending and planned assignments; there's a timestamp
stored in each of them. We calculate the minimal one.
* If there are no pending or planned assignments, we might return the
timestamp that's associated with the meta-storage revision.
This approach almost works, but there are situations with nuances. These are
the cases where we're in the middle of some operation. Let's examine them:
* Zone is created/altered, but assignments are not yet saved. This is the case
when the assignments timestamp is below the zone's latest timestamp.
** For ALTER, returning an older timestamp from assignments is not a problem;
it'll eventually become more recent.
** For CREATE, we should probably detect that assignments are not yet saved,
and use the timestamp from the catalog.
* A list of data nodes is updated, but assignments are not yet re-calculated
because of the timeout.
** Current code uses the "latest" catalog state when it transforms data nodes
into assignments, so it is safe to use the timestamp of the latest catalog
version. There's a Jira that aims to fix it:
https://issues.apache.org/jira/browse/IGNITE-22723. This means that the current
approach might not work in the future.
Anyway, considering all of the above, there's one situation that we must keep
in mind:
* DZ is updated at catalog version 15.
* Assignments are calculated for the same exact catalog version.
* Nothing changes for a long time. Several days, for example.
* During that time, catalog version increases and becomes 75, for example.
If nothing changes, we should be able to remove versions 15-74, because DZ
settings from versions 15 and 75 are identical.
It seems like the proposed algorithm works exactly like we need.
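To illustrate the 15-74 example, here is a minimal sketch of how a minimum required time could be turned into a compaction boundary. It is an assumption, not the actual compaction code: the class, the method and the map of catalog versions keyed by activation timestamp are all hypothetical, and it assumes that compaction may drop every version below the one that is active at the returned time.

```java
import java.util.Map;
import java.util.NavigableMap;

final class CompactionBoundarySketch {
    /**
     * Returns the earliest catalog version that must be kept, that is the version
     * active at the given time. Versions below it are candidates for removal.
     */
    static int earliestRequiredVersion(
            NavigableMap<Long, Integer> versionByActivationTime,
            long minimumRequiredTime
    ) {
        if (versionByActivationTime.isEmpty()) {
            throw new IllegalArgumentException("No catalog versions are known");
        }

        // Greatest activation time that is less than or equal to the required time.
        Map.Entry<Long, Integer> active = versionByActivationTime.floorEntry(minimumRequiredTime);

        if (active == null) {
            // The required time predates every known version: keep everything.
            return versionByActivationTime.firstEntry().getValue();
        }

        return active.getValue();
    }
}
```

If nothing is pending and the revision timestamp falls into version 75, the boundary is 75 and versions 15-74 fall below it.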
> Catalog compaction. Component to determine minimum catalog version required
> by rebalance.
> -----------------------------------------------------------------------------------------
>
> Key: IGNITE-23413
> URL: https://issues.apache.org/jira/browse/IGNITE-23413
> Project: Ignite
> Issue Type: Improvement
> Reporter: Pavel Pereslegin
> Assignee: Ivan Bessonov
> Priority: Major
> Labels: ignite-3
>
> Each rebalance procedure uses a specific catalog version: it "holds" the
> timestamp corresponding to the latest (at the moment the rebalance starts)
> version of the catalog.
> To be able to safely perform catalog compaction, we need to design and
> implement a component that can determine the minimum version required for
> active rebalances (to avoid deleting this version during compaction).
> {code:java}
> interface RebalanceMinimumRequiredTimeProvider {
>     /**
>      * Returns the minimum time required for rebalance,
>      * or current timestamp if there are no active
>      * rebalances and there is a guarantee that all rebalances
>      * launched in the future will use catalog version
>      * corresponding to the current time or greater.
>      */
>     long minimumRequiredTime();
> }
> {code}
> The component can be either global or local (whichever is easier to
> implement). This means that the compaction procedure can call the component
> on all nodes in the cluster and calculate the minimum.
> The component must be able to track rebalances that may be triggered during
> "replay" of the metastorage raft log.
> The component should return only monotonically increasing values.
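One possible way to satisfy the monotonicity requirement is a thin wrapper that never returns less than it has returned before. This is a sketch under assumptions: MonotonicTimeProvider is a hypothetical name, the interface is repeated only to keep the block self-contained, and "monotonically increasing" is read as "never decreasing".

```java
import java.util.concurrent.atomic.AtomicLong;

/** Same shape as the interface in the issue description. */
interface RebalanceMinimumRequiredTimeProvider {
    long minimumRequiredTime();
}

/** Hypothetical wrapper that clamps the delegate's result to the largest value seen so far. */
final class MonotonicTimeProvider implements RebalanceMinimumRequiredTimeProvider {
    private final RebalanceMinimumRequiredTimeProvider delegate;

    /** Largest value returned so far. */
    private final AtomicLong lastReturned = new AtomicLong(Long.MIN_VALUE);

    MonotonicTimeProvider(RebalanceMinimumRequiredTimeProvider delegate) {
        this.delegate = delegate;
    }

    @Override
    public long minimumRequiredTime() {
        // Atomically keep the maximum of the previous result and the fresh one.
        return lastReturned.accumulateAndGet(delegate.minimumRequiredTime(), Math::max);
    }
}
```

Note that clamping only hides a lower value from the caller. Whether that is acceptable depends on why the underlying value went down, for example during "replay" of the metastorage raft log.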
--
This message was sent by Atlassian Jira
(v8.20.10#820010)