Alexey Goncharuk created IGNITE-5528:
----------------------------------------
Summary: IS_EVICT_DISABLED flag is not cleared when cache store
throws an exception
Key: IGNITE-5528
URL: https://issues.apache.org/jira/browse/IGNITE-5528
Project: Ignite
Issue Type: Bug
Components: cache
Affects Versions: 1.7
Reporter: Alexey Goncharuk
Fix For: 2.2
Below is an observation from a live system:
On a large cluster with occasional topology changes, there is a sporadic hang
that manifests as a "Failed to evict partition" message for one of the
caches that has a cache store enabled. I managed to take a heap dump and found
that on the hanging node there was a single entry with the IS_EVICT_DISABLED
flag set, while no other threads were performing a store load operation.
Earlier in the logs I saw that the cache store had thrown a
CacheLoaderException due to an interrupted connection to a database.
Currently, the flag is set before the cache store load and it is cleared after
the load.
It looks like when the store throws an exception, the clearing step is skipped,
so the flag leaks and the entry cannot be cleared from the partition. As a
result, on the next topology change, partition exchange freezes with a
"Failed to wait for partition eviction" error message.
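
The suspected failure mode can be illustrated with a minimal, self-contained
sketch. This is not Ignite's actual code: the Entry class, the flag value, and
the loadLeaky/loadSafe methods are hypothetical names made up for illustration,
and IllegalStateException stands in for CacheLoaderException to avoid an
external dependency. It only shows why a set/load/clear sequence without
try/finally leaves the flag set when the load throws, and how clearing it in a
finally block would avoid that.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; names and structure are assumptions, not Ignite internals. */
public class EvictFlagSketch {
    /** Hypothetical bit value, named after the flag in this report. */
    static final int IS_EVICT_DISABLED = 0x01;

    /** Hypothetical stand-in for a cache entry that carries a flags field. */
    static class Entry {
        private int flags;

        synchronized boolean evictDisabled() { return (flags & IS_EVICT_DISABLED) != 0; }
        synchronized void setFlag(int f) { flags |= f; }
        synchronized void clearFlag(int f) { flags &= ~f; }
    }

    /** Pattern described in the report: the flag is cleared only if the load succeeds. */
    static Object loadLeaky(Entry e, Callable<Object> storeLoad) throws Exception {
        e.setFlag(IS_EVICT_DISABLED);
        Object val = storeLoad.call(); // If this throws, the clear below never runs.
        e.clearFlag(IS_EVICT_DISABLED);
        return val;
    }

    /** One possible fix: clear the flag in a finally block so it runs on failure too. */
    static Object loadSafe(Entry e, Callable<Object> storeLoad) throws Exception {
        e.setFlag(IS_EVICT_DISABLED);
        try {
            return storeLoad.call();
        }
        finally {
            e.clearFlag(IS_EVICT_DISABLED);
        }
    }

    /** Runs a failing store load and reports whether the flag is still set afterwards. */
    static boolean flagLeaks(boolean safe) {
        Entry e = new Entry();
        // Simulated store failure, standing in for an interrupted database connection.
        Callable<Object> failing = () -> { throw new IllegalStateException("connection interrupted"); };
        try {
            if (safe)
                loadSafe(e, failing);
            else
                loadLeaky(e, failing);
        }
        catch (Exception ignored) {
            // The exception itself is expected; only the flag state matters here.
        }
        return e.evictDisabled();
    }

    public static void main(String[] args) {
        System.out.println("leaky: flag still set = " + flagLeaks(false));
        System.out.println("safe:  flag still set = " + flagLeaks(true));
    }
}
```

In this sketch the leaky variant leaves the flag set after the failed load,
which corresponds to the single stuck entry seen in the heap dump, while the
try/finally variant clears it regardless of the outcome.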
Attached is a test that reproduces this issue (note that the message appears
after one minute).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)