[ 
https://issues.apache.org/jira/browse/IGNITE-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244076#comment-15244076
 ] 

Anton Vinogradov commented on IGNITE-2864:
------------------------------------------

Partially fixed the issues found on review.

The main problem is conflict resolution when a local store is used.
This problem exists in the current implementation too.
I think it should be solved differently from what was decided on review.

Env:
Each node has a local store.
The local store contains primary and backup partitions.
On node failure, the store can be used to restore entries.

Problem:
The cluster has partially failed, and the number of failed nodes > backups.

Initial solution:
Restart the failed nodes and load entries from the local stores after restart.
Resolve conflicts during rebalancing and on user requests.

Cons:
A lot of changes are required. It is difficult to cover all cases.

Better solution:
A topology validator should be used to prevent work with inconsistent data.
Recovery steps:
1) Deny all user requests (admins should do that).
2) Restart all failed nodes.
3) Load all data from all available local stores on a stable topology (after
the final rebalancing has finished).
    All conflicts will be resolved by the conflict resolver in this case,
correct?
    All entries will be restored since we have backups in the local stores
(provided the number of lost stores <= backups).
4) Allow user requests.
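
To make the recovery steps above concrete, here is a minimal plain-Java sketch. It is a toy model, not Ignite API: the class name, the Versioned type, and the "highest version wins" rule are all assumptions made for illustration. The real conflict resolver may use a different rule, and the gate below simply assumes that requests are allowed only once every expected node is back.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the recovery steps; every name here is hypothetical, not Ignite API. */
final class LocalStoreRecoverySketch {
    /** A stored value tagged with an update version (assumed to grow with each update). */
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Steps 1 and 4: user requests are allowed only when all expected nodes are alive. */
    static boolean topologyValid(Set<String> expectedNodes, Set<String> aliveNodes) {
        return aliveNodes.containsAll(expectedNodes);
    }

    /** Step 3: merge all available local stores, keeping the highest version per key. */
    static Map<String, Versioned> mergeStores(Collection<Map<String, Versioned>> stores) {
        Map<String, Versioned> merged = new HashMap<>();
        for (Map<String, Versioned> store : stores) {
            // Assumed conflict rule: the entry with the greater version wins.
            store.forEach((key, val) ->
                merged.merge(key, val, (a, b) -> a.version >= b.version ? a : b));
        }
        return merged;
    }
}
```

In this model a key survives as long as at least one store holding it is available, which corresponds to the "lost stores <= backups" condition in step 3.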

Thoughts?

> Need update local store from primary and backups
> ------------------------------------------------
>
>                 Key: IGNITE-2864
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2864
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>            Reporter: Semen Boikov
>            Assignee: Anton Vinogradov
>             Fix For: 1.6
>
>
> Currently the cache local store is updated only from primary nodes, which means
> that data can be lost if the primary node is not restarted after a crash. We need
> to fix this and update the store from primaries and backups if the store is local
> (for both tx and atomic caches).
> This test should work:
> - cache with 1 backup, two server nodes
> - execute cache put for key K
> - stop both nodes
> - restart only node which was backup for K
> - load data from local store, update for K should be restored
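
The test scenario quoted above can be modelled with a small plain-Java simulation. This is only a sketch under stated assumptions: the Node class, its maps, and the writeStoreOnBackup flag are hypothetical stand-ins and do not use the Ignite API. It shows why K is restored only when the backup also writes to its local store.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy simulation of the quoted test; all names are hypothetical, not Ignite API. */
final class BackupStoreScenarioSketch {
    /** A node with an in-memory cache and a persistent local store (plain maps here). */
    static final class Node {
        final Map<String, String> memory = new HashMap<>();
        final Map<String, String> localStore = new HashMap<>();

        /** Stopping a node loses its in-memory state; the local store survives. */
        void stop() {
            memory.clear();
        }

        /** After restart, entries are loaded back from the local store. */
        void loadFromStore() {
            memory.putAll(localStore);
        }
    }

    /** Applies a put on primary and backup; the flag selects whether the backup writes its store. */
    static void put(Node primary, Node backup, String key, String val, boolean writeStoreOnBackup) {
        primary.memory.put(key, val);
        primary.localStore.put(key, val);
        backup.memory.put(key, val);
        if (writeStoreOnBackup)
            backup.localStore.put(key, val);
    }

    /** Runs the scenario and returns the value of K seen on the restarted backup node. */
    static String run(boolean writeStoreOnBackup) {
        Node primary = new Node();
        Node backup = new Node();
        put(primary, backup, "K", "v1", writeStoreOnBackup);
        primary.stop();
        backup.stop();
        backup.loadFromStore(); // only the node that was backup for K is restarted
        return backup.memory.get("K");
    }
}
```

Calling run(false) models the behaviour described in the issue (the store is updated only on the primary, so the update for K is lost), while run(true) models the requested behaviour.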



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
