[
https://issues.apache.org/jira/browse/IGNITE-28395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Chudov updated IGNITE-28395:
----------------------------------
Description:
The Lease Updater fires an invoke to Meta storage asynchronously every 500 ms
without waiting for the previous one to complete. This produces multiple
concurrent invocations with the same expected lease state: only one wins the
CAS, and the rest fail with
{code:java}
Lease update invocation failed because of outdated lease data on this node{code}
As a result, roughly once per minute the lease expires before renewal.
Simply reading fresh data from storage before each invoke does not help:
previous invocations are already in-flight and will complete after the read,
making the freshly-read state outdated by the time the new invoke reaches
storage.
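The failure mode can be reproduced in miniature. The sketch below is not
Ignite code: `StaleCasSketch` and the string lease values are hypothetical, and
an `AtomicReference` stands in for the conditional invoke against Meta storage.
Every simulated invocation carries the same expected state, so only the first
CAS can succeed.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for concurrent invokes that share one expected lease state. */
final class StaleCasSketch {
    /** Returns how many of the given invocations win the CAS. */
    static int countSuccessful(int invocations) {
        // Assumption: an AtomicReference models the lease record in Meta storage.
        AtomicReference<String> storage = new AtomicReference<>("lease-v1");

        // Every invocation was built from the same snapshot of the lease state.
        String expected = storage.get();

        int successful = 0;
        for (int i = 0; i < invocations; i++) {
            // The first CAS replaces the value; later ones see outdated data and fail.
            if (storage.compareAndSet(expected, "lease-v2-" + i)) {
                successful++;
            }
        }
        return successful;
    }

    public static void main(String[] args) {
        System.out.println("successful invokes: " + countSuccessful(5));
    }
}
```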
*Fix*
Track the in-flight invoke as a future. On each tick, if the previous future is
not yet complete, block with `future.get(timeout)` before reading from the
lease tracker and firing the next invoke. This guarantees that at most one
invoke is in flight at any time and that the lease state is read only after the
previous update has landed. The timeout should be around leaseInterval/2; after
that, the leases will most likely expire anyway.
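A minimal sketch of this gating, not the actual Lease Updater code. The class
name, the `tick` method and the `Boolean` result type are hypothetical. Skipping
the tick when the wait times out is also an assumption, since the description
does not say what should happen in that case.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical sketch: allows at most one in-flight invoke at any time. */
final class SingleInFlightUpdater {
    /** Assumed to be about half of the lease interval. */
    private final long waitTimeoutMillis;

    /** Future of the last fired invoke; starts out completed so the first tick fires. */
    private CompletableFuture<Boolean> inFlight = CompletableFuture.completedFuture(true);

    SingleInFlightUpdater(long leaseIntervalMillis) {
        this.waitTimeoutMillis = leaseIntervalMillis / 2;
    }

    /**
     * Called on every tick. Returns true if a new invoke was fired, false if the
     * tick was skipped because the previous invoke was still in flight.
     */
    synchronized boolean tick(Supplier<CompletableFuture<Boolean>> invoke) {
        if (!inFlight.isDone()) {
            try {
                inFlight.get(waitTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Assumption: skip this tick rather than stack a second invoke.
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException | CancellationException e) {
                // The previous invoke has finished, although unsuccessfully; fire a new one.
            }
        }

        // The lease state would be read here, after the previous invoke has landed.
        inFlight = invoke.get();
        return true;
    }

    public static void main(String[] args) {
        SingleInFlightUpdater updater = new SingleInFlightUpdater(1000);
        CompletableFuture<Boolean> pending = new CompletableFuture<>();
        System.out.println("first tick fired: " + updater.tick(() -> pending));
        pending.complete(true);
        System.out.println("second tick fired: "
                + updater.tick(() -> CompletableFuture.completedFuture(true)));
    }
}
```

Blocking inside the tick keeps the guarantee simple, at the cost of occupying
the updater thread for up to the timeout.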
Also, there may be a lag between the future's completion and the lease map
update in the lease tracker, so the lease map may still be stale. To avoid
this, a successful invoke can return the leases it has written. If the invoke
fails, the map from the lease tracker should be used.
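The choice of lease source could look roughly as follows. This is an
illustrative sketch only: `LeaseSourceSketch`, `InvokeResult` and the
`Map<String, Long>` lease representation are hypothetical and do not reflect
the real Ignite types.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: picks the lease map to use for the next invoke. */
final class LeaseSourceSketch {
    /** Outcome of an invoke; carries the written leases only on success. */
    static final class InvokeResult {
        final boolean success;
        final Map<String, Long> writtenLeases;

        private InvokeResult(boolean success, Map<String, Long> writtenLeases) {
            this.success = success;
            this.writtenLeases = writtenLeases;
        }

        static InvokeResult success(Map<String, Long> writtenLeases) {
            return new InvokeResult(true, writtenLeases);
        }

        static InvokeResult failure() {
            return new InvokeResult(false, Map.of());
        }
    }

    /**
     * After a successful invoke, uses the leases it wrote, because the tracker may
     * lag behind. After a failed invoke, falls back to the tracker's map.
     */
    static Map<String, Long> leasesForNextInvoke(
            InvokeResult previous, Supplier<Map<String, Long>> trackerLeases) {
        if (previous.success) {
            return previous.writtenLeases;
        }
        return trackerLeases.get();
    }

    public static void main(String[] args) {
        Map<String, Long> tracker = Map.of("partition-0", 1_000L);
        Map<String, Long> written = Map.of("partition-0", 2_000L);
        System.out.println(leasesForNextInvoke(InvokeResult.success(written), () -> tracker));
        System.out.println(leasesForNextInvoke(InvokeResult.failure(), () -> tracker));
    }
}
```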
was:
The Lease Updater fires an invoke to Meta storage asynchronously every 500 ms
without waiting for the previous one to complete. This produces multiple
concurrent invocations with the same expected lease state: only one wins the
CAS, and the rest fail with
{code:java}
Lease update invocation failed because of outdated lease data on this node{code}
As a result, roughly once per minute the lease expires before renewal.
Simply reading fresh data from storage before each invoke does not help:
previous invocations are already in-flight and will complete after the read,
making the freshly-read state outdated by the time the new invoke reaches
storage.
*Fix*
Track the in-flight invoke as a future. On each tick, if the previous future is
not yet complete, block with `future.get(timeout)` before reading from the
lease tracker and firing the next invoke. This guarantees that at most one
invoke is in flight at any time and that the lease state is read only after the
previous update has landed. The timeout should be around leaseInterval/2; after
that, the leases will most likely expire anyway.
> Lease updater accumulates concurrent in-flight invocations causing constant
> CAS failures
> ----------------------------------------------------------------------------------------
>
> Key: IGNITE-28395
> URL: https://issues.apache.org/jira/browse/IGNITE-28395
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Priority: Major
> Labels: ignite-3
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)