[ 
https://issues.apache.org/jira/browse/IGNITE-28395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-28395:
----------------------------------
    Description: 
Lease Updater fires an invoke to Meta storage asynchronously every 500 ms 
without waiting for the previous one to complete. This produces multiple 
concurrent invocations that carry the same expected lease state: only one wins 
the CAS, and the rest fail with `Lease update invocation failed because of 
outdated lease data on this node`. As a side effect, roughly once per minute 
the lease expires before it is renewed.
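
The contention can be illustrated with a minimal sketch. It is not Ignite code: 
an `AtomicLong` stands in for the lease record in storage, and `LeaseCasDemo` 
and `countWinners` are hypothetical names. Every invocation is built from the 
same snapshot, so only the first compare-and-set can succeed.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; an AtomicLong stands in for the stored lease. */
class LeaseCasDemo {
    /** Returns how many of n invocations built from one snapshot win the CAS. */
    static int countWinners(int n) {
        AtomicLong storedExpiration = new AtomicLong(1_000L);
        // All invocations were prepared before any of them completed,
        // so they all expect the same stored value.
        long expected = storedExpiration.get();
        int winners = 0;
        for (int i = 0; i < n; i++) {
            long renewed = expected + 500L * (i + 1);
            // Succeeds only while the stored value still equals the snapshot.
            if (storedExpiration.compareAndSet(expected, renewed)) {
                winners++;
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        System.out.println("winners out of 5: " + countWinners(5));
    }
}
```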

Simply reading fresh data from storage before each invoke does not help: 
previous invocations are already in-flight and will complete after the read, 
making the freshly-read state outdated by the time the new invoke reaches 
storage.

*Fix*
Track the in-flight invoke as a future. On each tick, if the previous future is 
not complete, block with `future.get(timeout)` before reading from the lease 
tracker and firing the next invoke. This guarantees that at most one invoke is 
in flight at any time and that the lease state is read only after the previous 
update has landed. The timeout should be well below the lease duration so that 
renewal still happens under a degraded network.
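
A rough sketch of the proposed tick logic follows. It is not the actual patch: 
`SerializedLeaseUpdater`, `leaseTracker`, `invoke` and `timeoutMs` are 
hypothetical names, and the storage call is modelled as a function that returns 
a `CompletableFuture`. The issue does not say what should happen when the wait 
times out; this sketch assumes the tick is skipped so that the 
at-most-one-in-flight property still holds.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch; names are not taken from the Ignite code base. */
class SerializedLeaseUpdater<L> {
    private final Supplier<L> leaseTracker;                        // reads current lease state
    private final Function<L, CompletableFuture<Boolean>> invoke;  // async CAS against storage
    private final long timeoutMs;                                  // well below lease duration
    private CompletableFuture<Boolean> inFlight = CompletableFuture.completedFuture(true);

    SerializedLeaseUpdater(Supplier<L> leaseTracker,
                           Function<L, CompletableFuture<Boolean>> invoke,
                           long timeoutMs) {
        this.leaseTracker = leaseTracker;
        this.invoke = invoke;
        this.timeoutMs = timeoutMs;
    }

    /** One updater tick, called from a single thread. Returns true if an invoke was fired. */
    boolean tick() {
        if (!inFlight.isDone()) {
            try {
                inFlight.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                return false; // assumption: skip the tick, previous invoke is still pending
            } catch (ExecutionException e) {
                // Previous invoke failed; fall through and retry with fresh state.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // The previous update has landed (or failed), so this read is not stale.
        L fresh = leaseTracker.get();
        inFlight = invoke.apply(fresh);
        return true;
    }

    public static void main(String[] args) {
        SerializedLeaseUpdater<Long> updater = new SerializedLeaseUpdater<>(
                () -> 1_000L,
                state -> CompletableFuture.completedFuture(true),
                100L);
        System.out.println("fired: " + updater.tick());
    }
}
```

Skipping on timeout trades a delayed renewal for strict serialization, which is 
why the timeout has to stay well below the lease duration.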

> Lease Updater accumulates concurrent in-flight invocations causing constant 
> CAS failures
> ----------------------------------------------------------------------------------------
>
>                 Key: IGNITE-28395
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28395
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
