[ 
https://issues.apache.org/jira/browse/IGNITE-12938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100788#comment-17100788
 ] 

Alexey Scherbakov commented on IGNITE-12938:
--------------------------------------------

[~zstan]

I've left comments in the PR.

The main problem with the fix is that it is still possible for a check run 
on a non-idle grid to be falsely reported as run on an idle one.
Currently we compare counters (actually, the low watermarks of the tracking 
counters) before and after partition processing to understand whether any 
updates were made.
But this comparison only tells us that the low watermark didn't change in 
between.
This does not work because:

1. The initial counters used for the comparison are not synchronized between 
partitions, so the comparison result can differ because of that.
2. For tracking counters it's possible to have out-of-order updates, which are 
currently ignored.

I think there is a way to make this work reliably.

We should add the starting counter value to PartitionKey and compare it on the 
reduce side. This will fix 1.
Out-of-order updates must be taken into account in the before and after check. 
This will fix 2.
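
To make the two fixes concrete, here is a minimal sketch of one possible reading of the proposal. It is not Ignite code: CounterState, PartitionReport and isIdle are hypothetical names used only for illustration, and only the idea of carrying the starting counter to the reduce side and including out-of-order updates in the before/after comparison comes from the comment above. Treating different starting counters across copies of a partition as "not idle" is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class IdleCheckSketch {
    /** Hypothetical counter state: low watermark plus updates applied above it out of order. */
    static final class CounterState {
        final long lwm;
        final Set<Long> outOfOrder;

        CounterState(long lwm, Long... outOfOrder) {
            this.lwm = lwm;
            this.outOfOrder = new TreeSet<>(Arrays.asList(outOfOrder));
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof CounterState))
                return false;

            CounterState other = (CounterState)o;

            // Fix 2: out-of-order updates take part in the comparison, not only the low watermark.
            return lwm == other.lwm && outOfOrder.equals(other.outOfOrder);
        }

        @Override public int hashCode() {
            return Objects.hash(lwm, outOfOrder);
        }
    }

    /** Hypothetical per-node result for one partition copy, sent to the reducer. */
    static final class PartitionReport {
        final int partId;
        final CounterState before; // State captured before partition processing.
        final CounterState after;  // State captured after partition processing.

        PartitionReport(int partId, CounterState before, CounterState after) {
            this.partId = partId;
            this.before = before;
            this.after = after;
        }
    }

    /** Reduce side: returns true only if no report shows signs of concurrent updates. */
    static boolean isIdle(List<PartitionReport> reports) {
        Map<Integer, CounterState> startByPart = new HashMap<>();

        for (PartitionReport r : reports) {
            // The state of a single copy changed while it was being processed.
            if (!r.before.equals(r.after))
                return false;

            // Fix 1 (assumed reading): all copies of a partition must report the same starting state.
            CounterState first = startByPart.putIfAbsent(r.partId, r.before);

            if (first != null && !first.equals(r.before))
                return false;
        }

        return true;
    }
}
```

With this sketch, a report whose low watermark stays at 5 but gains an out-of-order update 7 during processing is rejected, which a low-watermark-only comparison would miss.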


> control.sh utility commands: IdleVerify and ValidateIndexes use eventual 
> payload check.
> ---------------------------------------------------------------------------------------
>
>                 Key: IGNITE-12938
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12938
>             Project: Ignite
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 2.8
>            Reporter: Stanilovsky Evgeny
>            Assignee: Stanilovsky Evgeny
>            Priority: Major
>             Fix For: 2.9
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The "--cache idle_verify" and "--cache validate_indexes" commands of the 
> *control.sh* utility use an eventual payload check during execution. As a 
> result, they can run concurrently with an active payload without triggering 
> errors such as "Checkpoint with dirty pages started! Cluster not idle". 
> Additionally, the current functionality misses the check on caches without 
> persistence. Remove the old functionality from PageMemory and move it to 
> update counters usage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
