[ 
https://issues.apache.org/jira/browse/IGNITE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko reassigned IGNITE-18595:
----------------------------------------

    Assignee:     (was: Kirill Tkalenko)

> Implement index build process during the full state transfer
> ------------------------------------------------------------
>
>                 Key: IGNITE-18595
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18595
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> During the full state transfer there is no source of information about the 
> schema versions associated with individual inserts. The core idea of the full 
> rebalance is that all versions of all rows are sent, while indexes are 
> rebuilt locally on the consumer. This is unfortunate. To see why, imagine 
> the following situation:
>  * time T1: table A with index X is created
>  * time T2: user uploads the data
>  * time T3: user drops index X
>  * time T4: “clean” node N enters topology and downloads data via full 
> rebalance procedure
>  * time T5: N becomes a leader and receives (already running) RO transactions 
> with a timestamp T such that T2 < T < T3
> Ideally, index X should be available for timestamp T. If the index is already 
> available, it can’t suddenly become unavailable without an explicit rebuild 
> request from the user (I guess).
> The LATEST schema version at the moment of rebalance must be known. That’s 
> unavoidable and makes total sense. The first idea that comes to mind is 
> updating all Registered and Available indexes. A situation where an index has 
> more indexed rows than it requires is acceptable, because scan queries only 
> return indexed rows that match the corresponding value in the partition MV 
> store. The real problem would be having less data than required.
> The approach as described in the paragraph above is not quite correct. 
> Consider the BinaryRow version: it defines the set of columns in the table 
> at the moment of the update. Not all row versions are 
> compatible with all indexes. For example, you cannot put data into an index 
> if a column has been deleted. On the other hand, you can put data in the 
> index if a column has not yet been created (assuming it has a default value). 
> In both cases the column is missing from the row version, but the outcome is 
> very different.
> This fact has some implications. The set of indexes to be updated depends on 
> the row version of every particular row. I propose calculating it as the set 
> of all indexes from a _maximal continuous range of db schemas_ that (if 
> not empty) starts with the earliest known schema and in which _all schemas 
> have all indexed columns_ existing in the table.
> For example, there’s a table T:
> |DB schema version|Table columns|
> |1|PK, A|
> |2|PK, A, B|
> |3 (LATEST)|PK, B|
>  
> In such a configuration, the ranges would be:
> |Index columns|Schemas range|
> |A|[1 ... 2]|
> |B|[1 ... 3]|
> |A, B|[1 ... 2]|
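
The range rule quoted above can be illustrated with a minimal Python sketch. It is not Ignite code: the function name `index_schema_range`, the dict-based schema representation, and the `EXAMPLE_SCHEMAS` constant are hypothetical. It assumes one reading of the proposal, namely that a column absent from a schema counts as "not yet created" if no earlier known schema contained it (compatible, assuming a default value) and as "deleted" otherwise (incompatible). A column that is dropped and later re-created is not handled.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical representation of the example table T:
# schema version -> set of table columns in that version.
EXAMPLE_SCHEMAS: Dict[int, Set[str]] = {
    1: {"PK", "A"},
    2: {"PK", "A", "B"},
    3: {"PK", "B"},  # LATEST
}


def index_schema_range(
    schemas: Dict[int, Set[str]],
    index_columns: Sequence[str],
) -> Optional[Tuple[int, int]]:
    """Return (first, last) versions of the maximal continuous schema range
    for an index, starting at the earliest known schema, or None if empty."""
    seen: Set[str] = set()  # columns that existed in any schema visited so far
    first: Optional[int] = None
    last: Optional[int] = None

    for version in sorted(schemas):
        columns = schemas[version]
        seen |= columns

        # Assumption: an indexed column that was seen before but is absent now
        # has been deleted, so this schema (and everything after it) is out.
        # An indexed column never seen so far is treated as not yet created.
        if any(c in seen and c not in columns for c in index_columns):
            break

        if first is None:
            first = version
        last = version

    if first is None or last is None:
        return None
    return (first, last)
```

Calling `index_schema_range(EXAMPLE_SCHEMAS, ["A"])`, then with `["B"]` and `["A", "B"]`, yields the same three ranges as the table above. How a row version is then matched against these ranges is left out here, since the description does not spell it out.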



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
