[jira] [Updated] (IGNITE-17084) Native rebalance for RocksDB partitions

Ivan Bessonov (Jira) Thu, 02 Jun 2022 09:16:08 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ivan Bessonov updated IGNITE-17084:
-----------------------------------
    Description: 
General idea of full rebalance is described in 
https://issues.apache.org/jira/browse/IGNITE-17083

For persistent storages, there's an option to avoid copy-on-write rebalance 
algorithms if it's desired. Intuitively, it's a preferable option. Each storage 
chooses its own format.

In this case, RocksDB allows consistent db iteration using a "Snapshot" 
feature. The idea is very simple:
 * Take a RocksDB snapshot.
 * Iterate through partition data.
 * Iterate through indexes.
 * Release the snapshot.

There must be a common "infrastructure" or a framework to stream native 
rebalance snapshots. Data format should be as simple as possible.
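
To make "as simple as possible" concrete, here is a minimal sketch of one 
possible stream format: each entry is a length-prefixed key followed by a 
length-prefixed value, and a marker closes the stream. The class name, the 
END_OF_STREAM marker and the layout itself are assumptions made for 
illustration only; nothing here is an agreed Ignite format or API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical framing for a native rebalance stream; not an Ignite API. */
class SnapshotStreamSketch {
    /** Marker written after the last entry (an assumption of this sketch). */
    static final int END_OF_STREAM = -1;

    /** Encodes entries as: key length, key bytes, value length, value bytes. */
    static byte[] encode(List<byte[][]> entries) {
        int size = Integer.BYTES; // room for the end-of-stream marker
        for (byte[][] e : entries) {
            size += 2 * Integer.BYTES + e[0].length + e[1].length;
        }

        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[][] e : entries) {
            buf.putInt(e[0].length).put(e[0]);
            buf.putInt(e[1].length).put(e[1]);
        }
        buf.putInt(END_OF_STREAM);
        return buf.array();
    }

    /** Decodes a stream produced by encode() back into {key, value} pairs. */
    static List<byte[][]> decode(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);
        List<byte[][]> entries = new ArrayList<>();

        int keyLen;
        while ((keyLen = buf.getInt()) != END_OF_STREAM) {
            byte[] key = new byte[keyLen];
            buf.get(key);
            byte[] value = new byte[buf.getInt()];
            buf.get(value);
            entries.add(new byte[][] {key, value});
        }
        return entries;
    }

    public static void main(String[] args) {
        List<byte[][]> rows = new ArrayList<>();
        rows.add(new byte[][] {
            "key".getBytes(StandardCharsets.UTF_8),
            "value".getBytes(StandardCharsets.UTF_8)
        });

        byte[] stream = encode(rows);
        System.out.println("entries decoded: " + decode(stream).size());
    }
}
```

The sender would feed such an encoder from the snapshot iterator, and the 
receiver would write decoded pairs straight into its own storage. A real 
format would probably also need a version header and a checksum.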

NOTE: of course, it has to be mentioned that this approach might lead to 
inefficient storage space usage. What I mean is that "previous" versions of 
values, in RocksDB terms, must stay on the device as long as they're visible 
from any snapshot. It can be a problem in theory, but in practice full 
rebalance isn't expected to occur often, and even then we don't expect that 
users will rewrite the entire partition data within the span of a single 
rebalance.
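
The space concern above can be illustrated with a toy in-memory model. It 
does not use RocksDB and doesn't mirror its API; the class and method names 
are hypothetical. Each write gets a sequence number, a snapshot is simply the 
sequence number at the time it was taken, and compaction may drop a version 
only if no live snapshot can still see it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of snapshot-visible versions; not the RocksDB API. */
class VersionRetentionSketch {
    /** Per-key history: write sequence number -> value. */
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();
    /** Sequence numbers of snapshots that have not been released yet. */
    private final List<Long> liveSnapshots = new ArrayList<>();
    private long seq = 0;

    void put(String key, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(++seq, value);
    }

    long takeSnapshot() {
        liveSnapshots.add(seq);
        return seq;
    }

    void releaseSnapshot(long snapshot) {
        liveSnapshots.remove(Long.valueOf(snapshot));
    }

    /** Reads the value as it was when the given snapshot was taken. */
    String get(String key, long snapshot) {
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> e = history.floorEntry(snapshot);
        return e == null ? null : e.getValue();
    }

    /** Drops invisible versions; returns how many versions remain stored. */
    int compact() {
        int kept = 0;
        for (TreeMap<Long, String> history : versions.values()) {
            Set<Long> visible = new HashSet<>();
            visible.add(history.lastKey()); // the latest version always stays
            for (long snapshot : liveSnapshots) {
                Long seen = history.floorKey(snapshot);
                if (seen != null) {
                    visible.add(seen); // still readable through this snapshot
                }
            }
            history.keySet().retainAll(visible);
            kept += history.size();
        }
        return kept;
    }
}
```

In this model, a key overwritten twice during a rebalance keeps two versions 
(the one the snapshot sees and the latest one) until the snapshot is 
released, after which compaction keeps only the latest.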
h2. Possible problems

Given that "raw" data is sent, including SQL indexes, all incomplete indexes 
will be sent incomplete. Maybe we should also send a build state for each 
index so that the receiving side could continue from the right place, not from 
the beginning.

This problem will be resolved in the future. Currently we don't have indexes 
implemented.

  was:
General idea of full rebalance is described in 
https://issues.apache.org/jira/browse/IGNITE-17083

For persistent storages, there's an option to avoid copy-on-write rebalance 
algorithms if it's desired. Intuitively, it's a preferable option. Each storage 
chooses its own format.

In this case, RocksDB allows consistent db iteration using a "Snapshot" 
feature. The idea is very simple:
 * Take a RocksDB snapshot.
 * Iterate through partition data.
 * Iterate through indexes.
 * Release the snapshot.

There must be a common "infrastructure" or a framework to stream native 
rebalance snapshots. Data format should be as simple as possible.
h2. Possible problems

Given that "raw" data is sent, including SQL indexes, all incomplete indexes 
will be sent incomplete. Maybe we should also send a build state for each 
index so that the receiving side could continue from the right place, not from 
the beginning.

This problem will be resolved in the future. Currently we don't have indexes 
implemented.


> Native rebalance for RocksDB partitions
> ---------------------------------------
>
>                 Key: IGNITE-17084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17084
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> General idea of full rebalance is described in 
> https://issues.apache.org/jira/browse/IGNITE-17083
> For persistent storages, there's an option to avoid copy-on-write rebalance 
> algorithms if it's desired. Intuitively, it's a preferable option. Each 
> storage chooses its own format.
> In this case, RocksDB allows consistent db iteration using a "Snapshot" 
> feature. The idea is very simple:
>  * Take a RocksDB snapshot.
>  * Iterate through partition data.
>  * Iterate through indexes.
>  * Release the snapshot.
> There must be a common "infrastructure" or a framework to stream native 
> rebalance snapshots. Data format should be as simple as possible.
> NOTE: of course, it has to be mentioned that this approach might lead to 
> inefficient storage space usage. What I mean is that "previous" versions of 
> values, in RocksDB terms, must stay on the device as long as they're visible 
> from any snapshot. It can be a problem in theory, but in practice full 
> rebalance isn't expected to occur often, and even then we don't expect that 
> users will rewrite the entire partition data within the span of a single 
> rebalance.
> h2. Possible problems
> Given that "raw" data is sent, including SQL indexes, all incomplete indexes 
> will be sent incomplete. Maybe we should also send a build state for each 
> index so that the receiving side could continue from the right place, not 
> from the beginning.
> This problem will be resolved in the future. Currently we don't have indexes 
> implemented.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
