[ https://issues.apache.org/jira/browse/IGNITE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirill Tkalenko reassigned IGNITE-18595:
----------------------------------------
Assignee: Kirill Tkalenko
> Implement index build process during the full state transfer
> ------------------------------------------------------------
>
> Key: IGNITE-18595
> URL: https://issues.apache.org/jira/browse/IGNITE-18595
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Assignee: Kirill Tkalenko
> Priority: Major
> Labels: ignite-3
>
> During a full state transfer, there is no source of information about the
> schema versions associated with individual inserts. The core idea of the full
> rebalance is that all versions of all rows are sent, while indexes are rebuilt
> locally on the consumer. This is unfortunate. Why, you may ask?
> Imagine the following situation:
> * time T1: table A with index X is created
> * time T2: user uploads the data
> * time T3: user drops index X
> * time T4: “clean” node N enters topology and downloads data via full
> rebalance procedure
> * time T5: N becomes a leader and receives (already running) RO transactions
> with a timestamp T such that T2 < T < T3
> Ideally, index X should be available for timestamp T. If the index is already
> available, it can’t suddenly become unavailable without an explicit rebuild
> request from the user (I guess).
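The timeline above can be modeled with a minimal sketch. It assumes each index has a creation timestamp and an optional drop timestamp. The names `IndexLifecycle`, `visible_at` and `indexes_to_build` are hypothetical and do not come from Ignite. The sketch also adds an inference that the issue does not state: a freshly rebalanced node would need to build every index that has not been dropped, plus any dropped index that a still-running RO transaction can observe.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class IndexLifecycle:
    """Hypothetical record of when an index was created and, optionally, dropped."""
    name: str
    created_at: int
    dropped_at: Optional[int] = None

    def visible_at(self, ts: int) -> bool:
        # An RO transaction reading at `ts` expects the index if it had been
        # created by `ts` and had not been dropped yet at `ts`.
        return self.created_at <= ts and (self.dropped_at is None or ts < self.dropped_at)


def indexes_to_build(indexes: Iterable[IndexLifecycle],
                     active_ro_timestamps: Iterable[int]) -> List[str]:
    """Hypothetical: indexes that are not dropped, plus dropped ones that are
    still visible to at least one running RO transaction."""
    timestamps = list(active_ro_timestamps)
    return sorted(
        ix.name
        for ix in indexes
        if ix.dropped_at is None or any(ix.visible_at(ts) for ts in timestamps)
    )


# The scenario above with T1=10, T2=20, T3=30 and a running RO transaction at T=25.
x = IndexLifecycle("X", created_at=10, dropped_at=30)
print(x.visible_at(25))             # True
print(indexes_to_build([x], [25]))  # ['X']
```

Under this model, once no transaction older than T3 is running, index X no longer has to be built on the new node.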
> The LATEST schema version at the moment of rebalance must be known. That’s
> unavoidable and makes total sense. The first idea that comes to mind is to
> update all Registered and Available indexes. A situation in which an index has
> more indexed rows than it requires is correct: scan queries only return
> indexed rows that match the corresponding value in the partition MV store. The
> real problem would be having less data than required.
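The asymmetry between extra and missing index entries can be illustrated with a minimal sketch. It assumes a simplified model in which `mv_store` maps a row id to the row version visible at the scan timestamp, and the index is a plain list of `(key, row_id)` entries. The function `index_scan` is hypothetical, and a real index would be sorted or hashed rather than scanned linearly. Every candidate row is re-checked against the MV store, so an extra entry is harmless. A missing entry, however, means the row is never found.

```python
from typing import Dict, List, Tuple


# Hypothetical model: `mv_store` maps a row id to the row version visible at the
# scan timestamp (column -> value); the index is a list of (key, row_id) entries.
def index_scan(index_entries: List[Tuple[object, int]],
               mv_store: Dict[int, dict],
               column: str,
               key: object) -> List[int]:
    result = []
    for entry_key, row_id in index_entries:
        if entry_key != key:
            continue
        row = mv_store.get(row_id)
        # Every candidate is re-checked against the MV store, so an extra
        # (stale) entry whose row is gone or no longer matches is skipped.
        if row is not None and row.get(column) == key:
            result.append(row_id)
    return result


mv_store = {1: {"A": "x"}, 2: {"A": "y"}}
# ("x", 2) is stale (row 2 now has A = "y"), and row 3 does not exist.
entries = [("x", 1), ("x", 2), ("y", 2), ("x", 3)]
print(index_scan(entries, mv_store, "A", "x"))  # [1]
```

Conversely, if the entry `("x", 1)` were missing, row 1 would silently disappear from the scan result. No re-check against the MV store can recover it.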
> The approach described in the paragraph above is not quite correct. Consider
> the BinaryRow version: it defines the set of columns in the table at the
> moment of the update. Not all row versions are compatible with all indexes.
> For example, you cannot put data into an index if a column has been deleted.
> On the other hand, you can put data into the index if a column has not yet
> been created (assuming it has a default value). In both cases the column is
> missing from the row version, but the outcome is very different.
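The distinction between a deleted column and a not-yet-created column can be expressed as a per-row check. The following is a minimal sketch under several assumptions. Schemas are modeled as a mapping from version number to a set of column names, and column names are never reused after a drop. A column that is absent from the row's schema but present in an earlier schema is treated as deleted. The function `can_index_row` is hypothetical.

```python
from typing import Dict, Iterable, Set


def can_index_row(row_version: int,
                  index_columns: Iterable[str],
                  schemas: Dict[int, Set[str]]) -> bool:
    """Hypothetical check: may a row written with `row_version` go into an index?"""
    row_columns = schemas[row_version]
    for column in index_columns:
        if column in row_columns:
            continue
        # Missing column: was it dropped (present in an earlier schema) or
        # not created yet (only present in later schemas, if any)?
        dropped = any(column in cols for v, cols in schemas.items() if v < row_version)
        if dropped:
            return False  # no value for a deleted column: cannot be indexed
        # Not created yet: assume the column's default value is indexed instead.
    return True


schemas = {1: {"PK", "A"}, 2: {"PK", "A", "B"}, 3: {"PK", "B"}}
print(can_index_row(3, ["A"], schemas))  # False: A was deleted in version 3
print(can_index_row(1, ["B"], schemas))  # True: B did not exist yet in version 1
```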
> This fact has some implications. The set of indexes to be updated depends on
> the version of each particular row. I propose calculating it as the set of
> all indexes from a {_}maximal continuous range of db schemas{_} that (if not
> empty) starts with the earliest known schema, where _all schemas in the range
> have all indexed columns_ existing in the table.
> For example, there’s a table T:
> |DB schema version|Table columns|
> |1|PK, A|
> |2|PK, A, B|
> |3 (LATEST)|PK, B|
>
> In such a configuration, the ranges would be:
> |Index columns|Schemas range|
> |A|[1 ... 2]|
> |B|[1 ... 3]|
> |A, B|[1 ... 2]|
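The range calculation can be sketched as follows, reproducing the table above. This is one reading of the definition, stated here as an assumption. An indexed column that has not been created yet counts as present, since its default value is used. An indexed column that appeared earlier and is now absent has been dropped, and it ends the range. This reading is what gives column B the range [1 ... 3] even though B is absent from schema 1. The functions `schema_range` and `indexes_for_row` are hypothetical, and column names are assumed never to be reused.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def schema_range(index_columns: Iterable[str],
                 schemas: Dict[int, Set[str]]) -> Optional[Tuple[int, int]]:
    """Maximal continuous range of schema versions, starting at the earliest one,
    whose rows may be put into an index over `index_columns` (None if empty)."""
    columns = set(index_columns)
    versions = sorted(schemas)
    seen: Set[str] = set()  # indexed columns that appeared in an earlier schema
    last_ok = None
    for v in versions:
        present = schemas[v]
        if seen - present:
            # An indexed column existed before and is gone now, i.e. it was
            # dropped: rows of this version cannot be indexed, so the range ends.
            break
        seen |= columns & present
        last_ok = v
    return None if last_ok is None else (versions[0], last_ok)


def indexes_for_row(row_version: int,
                    indexes: Dict[str, List[str]],
                    schemas: Dict[int, Set[str]]) -> List[str]:
    """Names of the indexes whose schema range contains the row's version."""
    result = []
    for name, index_columns in indexes.items():
        r = schema_range(index_columns, schemas)
        if r is not None and r[0] <= row_version <= r[1]:
            result.append(name)
    return sorted(result)


schemas = {1: {"PK", "A"}, 2: {"PK", "A", "B"}, 3: {"PK", "B"}}
print(schema_range(["A"], schemas))       # (1, 2)
print(schema_range(["B"], schemas))       # (1, 3)
print(schema_range(["A", "B"], schemas))  # (1, 2)
```

With hypothetical indexes `IDX_A`, `IDX_B` and `IDX_AB` over the columns above, a row written with schema version 3 would only go into `IDX_B`, while a row of version 1 would go into all three.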