[
https://issues.apache.org/jira/browse/IGNITE-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ilya Lantukh updated IGNITE-8020:
---------------------------------
Description:
The existing rebalancing protocol is suitable for in-memory data storage, but
for data persisted in files it is sub-optimal and requires many unnecessary
steps. Efforts to optimize it showed that the protocol needs to be completely
reworked: instead of sending batches (SupplyMessages) of cache entries, it is
possible to send data files directly.
The algorithm should look like this:
1. The demander node sends requests with the required partition IDs (as it
does now).
2. The supplier node receives the request and performs a checkpoint.
3. After the checkpoint is done, the supplier sends the files with the
demanded partitions using the low-level NIO API.
4. During steps 2-3, the demander node should work in a special mode: it
should temporarily store all incoming updates in such a way that they can be
quickly applied later.
5. After the files are transferred, the demander applies the updates stored at
step 4.
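To illustrate step 3, here is a minimal sketch of sending a partition file
through java.nio's FileChannel.transferTo, which lets the OS copy file bytes to
the target channel without staging them in a user-space buffer where the
platform supports it. PartitionFileSender and send() are hypothetical names
used only for illustration and are not part of Ignite. The sketch assumes a
blocking target channel and a file that is not modified during the transfer
(which is what the checkpoint in step 2 is meant to make possible).

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration only; not an Ignite class. */
final class PartitionFileSender {
    private PartitionFileSender() {
    }

    /**
     * Sends the whole file to the target channel and returns the number of
     * bytes sent. Assumes the target channel is in blocking mode.
     */
    static long send(Path partFile, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(partFile, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;

            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                long n = src.transferTo(pos, size - pos, target);

                // Zero bytes at or past the current end of file means the file shrank.
                if (n == 0 && pos >= src.size())
                    throw new IOException("File was truncated during transfer: " + partFile);

                pos += n;
            }

            return pos;
        }
    }
}
```

In a real supplier the target would be a SocketChannel connected to the
demander; any WritableByteChannel works for experimenting locally.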
The tricky part here is switching the demander node's work modes while
avoiding all possible race conditions. Also, the aforementioned algorithm
should be extended to transfer or rebuild query indexes.
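One way to picture the mode switch from steps 4-5 is the sketch below. It is
not the proposed implementation. DemanderPartition and its methods are
hypothetical names, the partition is modelled as an in-memory map rather than
a page store, and removals and indexes are left out. The point is that the
"buffer or apply" decision and the switch itself take the same lock, so an
update can never be appended to the buffer after the buffer has been drained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a demander-side partition; not an Ignite class. */
final class DemanderPartition {
    private final Map<String, String> store = new HashMap<>();
    private final Deque<String[]> buffered = new ArrayDeque<>();
    private boolean buffering = true;

    /** Incoming update: buffered while files are in flight, applied directly afterwards. */
    synchronized void onUpdate(String key, String val) {
        if (buffering)
            buffered.add(new String[] {key, val});
        else
            store.put(key, val);
    }

    /**
     * Called once the partition data has arrived. Loads the transferred state,
     * replays the buffered updates in arrival order, then leaves buffering mode.
     * Assumption: replaying a put that is already in the file is harmless.
     */
    synchronized void onFilesTransferred(Map<String, String> transferred) {
        if (!buffering)
            throw new IllegalStateException("Already switched to normal mode");

        store.putAll(transferred);

        for (String[] u; (u = buffered.poll()) != null; )
            store.put(u[0], u[1]);

        buffering = false;
    }

    synchronized String get(String key) {
        return store.get(key);
    }
}
```

A single coarse lock keeps the sketch obviously race-free at the cost of
blocking updates during the replay; a real implementation would likely need
something finer-grained.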
> Rebalancing for persistent caches should transfer file store over network
> instead of using existing supply/demand protocol
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-8020
> URL: https://issues.apache.org/jira/browse/IGNITE-8020
> Project: Ignite
> Issue Type: Improvement
> Components: persistence
> Reporter: Ilya Lantukh
> Assignee: Ilya Lantukh
> Priority: Major