[jira] [Updated] (IGNITE-27109) IgniteCache#putAll may silently lose entries while any node is leaving the cluster

Mikhail Petrov (Jira) Wed, 21 Jan 2026 02:33:24 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mikhail Petrov updated IGNITE-27109:
------------------------------------
    Description: 
An IgniteCache#putAll call may succeed even though some of the specified
entries are not stored in the cache. This can happen for ATOMIC caches when a
node leaves the cluster while IgniteCache#putAll is executing. Although putAll
is allowed to fail partially for atomic caches, the user should still get a
CachePartialUpdateException in that case.

The problem is reproduced by the ReliabilityTest.testFailover test. Cache
configuration: ATOMIC, REPLICATED, FULL_SYNC.
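
The ticket does not say how ReliabilityTest.testFailover detects the loss. A
check of roughly this kind would expose it after putAll returns without an
exception: compare what was written with what the cache returns. The sketch
below is plain Java with no Ignite dependency; LostEntryCheck and lostKeys are
hypothetical names, and in a real test the "observed" map would presumably come
from a read such as IgniteCache#getAll.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not part of Ignite or of the test named in the ticket. */
public class LostEntryCheck {
    /** Returns the keys whose written value is missing from, or differs in, the observed contents. */
    static <K extends Comparable<K>, V> Set<K> lostKeys(Map<K, V> written, Map<K, V> observed) {
        Set<K> lost = new TreeSet<>();

        for (Map.Entry<K, V> e : written.entrySet()) {
            if (!Objects.equals(e.getValue(), observed.get(e.getKey())))
                lost.add(e.getKey());
        }

        return lost;
    }

    public static void main(String[] args) {
        // Entries passed to putAll (assumed data for illustration).
        Map<Integer, String> written = new HashMap<>();
        for (int i = 1; i <= 4; i++)
            written.put(i, "v" + i);

        // Entries actually found in the cache afterwards.
        Map<Integer, String> observed = new HashMap<>();
        observed.put(1, "v1");
        observed.put(2, "v2");

        System.out.println(lostKeys(written, observed)); // prints [3, 4]
    }
}
```

A non-empty result after a putAll that threw no exception is exactly the
silent loss described here.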

See: 
https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8360567487297938069&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E

Explanation:




Consider a cluster with 3 nodes: node0, node1 and node2.

1. node0 accepts the putAll request, maps all keys to their primary nodes and
sends a GridNearAtomicFullUpdateRequest to node1 and node2.
2. node1 starts processing cache entries. Halfway through, node1 receives a
stop signal (Ignite#close). All remaining attempts to process cache entries
fail with an exception - see
IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#invoke and
IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#operationCancelledException.
3. node1 manages to send a GridDhtAtomicUpdateRequest with the processed
entries to the backups - node2 and node0.
4. node1 fails to send a GridNearAtomicUpdateResponse with the failed keys to
node0 because NIO has already been stopped. This message tells the "near" node
that some keys could not be processed and that the operation should be
terminated with an exception.
5. node0 and node2 process the entries from the GridDhtAtomicUpdateRequests
and send GridDhtAtomicNearResponses to node0.
6. node1 is removed from the cluster.
7. node0 gets the event that node1 (the primary node for some keys) left the
cluster, but it has already received GridDhtAtomicNearResponses from all
backups. So node0 does nothing and eventually completes the putAll operation
successfully, and the entries that node1 failed to process are silently lost.
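
The steps above can be replayed in a toy model. The sketch below is plain Java
and deliberately simplified: PutAllLossSketch, Outcome and simulate are
hypothetical names, none of them Ignite classes, and the completion rule in the
last step is only an approximation of the behaviour described in step 7, not
the actual near-node implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the scenario; none of these types exist in Ignite. */
public class PutAllLossSketch {
    /** What the caller is told, plus the keys that never reached the cache. */
    static final class Outcome {
        final boolean reportedSuccess;
        final Set<String> lostKeys;

        Outcome(boolean reportedSuccess, Set<String> lostKeys) {
            this.reportedSuccess = reportedSuccess;
            this.lostKeys = lostKeys;
        }
    }

    /**
     * @param keys Keys for which node1 is the primary.
     * @param processedBeforeStop Number of keys node1 applies before it is stopped.
     * @param primaryResponseDelivered Whether node1's response with the failed keys reaches node0.
     */
    static Outcome simulate(List<String> keys, int processedBeforeStop, boolean primaryResponseDelivered) {
        // Step 2: node1 applies the first keys; the rest fail after the stop signal.
        List<String> processed = keys.subList(0, processedBeforeStop);
        List<String> failed = keys.subList(processedBeforeStop, keys.size());

        // Step 3: the update request sent to the backups carries only the processed entries.
        Set<String> storedOnBackups = new HashSet<>(processed);

        // Step 4: the near node learns about failed keys only if the response is delivered.
        Set<String> failedKeysKnownToNear = new HashSet<>();
        if (primaryResponseDelivered)
            failedKeysKnownToNear.addAll(failed);

        // Step 5: both backups acknowledge to the near node.
        Set<String> pendingBackupAcks = new HashSet<>(Arrays.asList("node0", "node2"));
        pendingBackupAcks.remove("node0");
        pendingBackupAcks.remove("node2");

        // Steps 6-7: node1 leaves. With every backup ack in and no failed keys
        // known, the near node sees nothing left to wait for and reports success.
        boolean reportedSuccess = pendingBackupAcks.isEmpty() && failedKeysKnownToNear.isEmpty();

        Set<String> lostKeys = new TreeSet<>(keys);
        lostKeys.removeAll(storedOnBackups);

        return new Outcome(reportedSuccess, lostKeys);
    }

    public static void main(String[] args) {
        Outcome o = simulate(Arrays.asList("k1", "k2", "k3", "k4"), 2, false);

        System.out.println("success=" + o.reportedSuccess + ", lost=" + o.lostKeys);
        // prints: success=true, lost=[k3, k4]
    }
}
```

Passing true for primaryResponseDelivered makes the same run report a failure,
which is the outcome the ticket expects in both cases.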






  was:
An IgniteCache#putAll call may succeed even though some of the specified
entries are not stored in the cache. This can happen for ATOMIC caches when a
node leaves the cluster while IgniteCache#putAll is executing. Although putAll
is allowed to fail partially for atomic caches, the user should still get a
CachePartialUpdateException in that case.

The problem is reproduced by the ReliabilityTest.testFailover test. Cache
configuration: ATOMIC, REPLICATED, FULL_SYNC.

See: 
https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8360567487297938069&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E

Explanation:




Consider a cluster with 3 nodes: node0, node1 and node2.

1. node0 accepts the putAll request, maps all keys to their primary nodes and
sends a GridNearAtomicFullUpdateRequest to node1 and node2.
2. node1 starts processing cache entries. Halfway through, node1 receives a
stop signal (Ignite#close). All remaining attempts to process cache entries
fail with an exception - see
IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#invoke and
IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#operationCancelledException.
3. node1 manages to send a GridDhtAtomicUpdateRequest with all processed
entries to the backups - node2 and node0.
4. node1 fails to send a GridNearAtomicUpdateResponse with the failed keys to
node0 because NIO has already been stopped. This message tells the "near" node
that some keys could not be processed and that the operation should be
terminated with an exception.
5. node0 and node2 process the entries from the GridDhtAtomicUpdateRequests
and send GridDhtAtomicNearResponses to node0.
6. node1 is removed from the cluster.
7. node0 gets the event that node1 (the primary node for some keys) left the
cluster, but it has already received GridDhtAtomicNearResponses from all
backups. So node0 does nothing and eventually completes the putAll operation.







> IgniteCache#putAll may silently lose entries while any node is leaving the 
> cluster
> ----------------------------------------------------------------------------------
>
>                 Key: IGNITE-27109
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27109
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Petrov
>            Assignee: Mikhail Petrov
>            Priority: Major
>              Labels: ise



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
