[jira] [Updated] (IGNITE-21881) Deal with retry send metastorage raft commands after a timeout

Alexander Lapin (Jira) Thu, 25 Apr 2024 02:09:04 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-21881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alexander Lapin updated IGNITE-21881:
-------------------------------------
    Description: 
As a result of the analysis and reproduction of IGNITE-21142, it was found that 
the metastorage raft command can be re-sent if it does not time out, which may 
not be good and lead to hidden negative consequences, such as in IGNITE-21142.

Here we need to find out the reasons for this decision (with re-try by timeout) 
and understand what to do next. I think we should use an infinite timeout.

As a result of the analysis and reproduction of IGNITE-21142, it was found that 
the metastorage raft command can be re-sent if it does not time out, which may 
not be good and lead to hidden negative consequences, such as in IGNITE-21142.

Here we need to find out the reasons for this decision (with re-try by timeout) 
and understand what to do next. I think we should use an infinite timeout.
h3. Upd#1

As discussed, it's required to detect whether InvokeCommand was already 
processed on a server and resend original response if true instead of 
reprocessing. First of all it's not only about invoke but about all 
non-idempotent commands like getAndPut, getAndPutAll, getAndRemove, etc. Worth 
mentioning though that it relates only to MS and maybe CMG but not Partitions: 
within partitions, tx protocol along with returning result from indexes instead 
of returning result from raft, protects us from non-idempotent command 
processing.

All in all following solution is expected to be implemented:
 * New interface NonIdempotentCommand is introduced with an id field.
 * All MS non-idempotent commands like InvokeCommand, GetAndRemoveCommand, etc 
implement aforementioned interface NonIdempotentCommand.
 * On the client side, an identifier is added to the command. Two options are 
possible here:
 ** It's possible to set id to the the command on command creation. Easiest 
way, but it will required extra effort on the server side to track command 
time. In that case it's possible to use LongCounter + nodeId as an id. 
 ** Or it's possible to adjust command with an id within retry loop, in that 
case we may use id as a "command time", of course it also means that clock or 
System.currentTime<> should be used as id identifier. I strongly believe that 
first option is better for now. 
 * On the server side, precisely, within MS state machine new 
nonIdempotentCommandCache is introduced commandId -> (commandResult, 
commandStartTime)
 * On each NonIdempotentCommand following logic should be implemented:
 ** As an initial step it's required to check whether there's command with 
given id in the cache, if true just return cached result, without command 
reprocessing.
 ** If there's no given command in the cache, process it and populate the cache 
with the result.

Basically that's all. Both cache persistence and recovery on group restart and 
cache cleanup will be covered within separate tickets.

  was:
As a result of the analysis and reproduction of IGNITE-21142, it was found that 
the metastorage raft command can be re-sent if it does not time out, which may 
not be good and lead to hidden negative consequences, such as in IGNITE-21142.

Here we need to find out the reasons for this decision (with re-try by timeout) 
and understand what to do next. I think we should use an infinite timeout.

As a result of the analysis and reproduction of IGNITE-21142, it was found that 
the metastorage raft command can be re-sent if it does not time out, which may 
not be good and lead to hidden negative consequences, such as in IGNITE-21142.

Here we need to find out the reasons for this decision (with re-try by timeout) 
and understand what to do next. I think we should use an infinite timeout.
h3. Upd#1

As discussed, it's required to detect whether InvokeCommand was already 
processed on a server and resend original response if true instead of 
reprocessing. First of all it's not only about invoke but about all 
non-idempotent commands like getAndPut, getAndPutAll, getAndRemove, etc. Worth 
mentioning though that it relates only to MS and maybe CMG but not Partitions: 
within partitions, tx protocol along with returning result from indexes instead 
of returning result from raft, protects us from non-idempotent command 
processing.

All in all following solution is expected to be implemented.
 * New interface NonIdempotentCommand is introduced with an id field.
 * All MS non-idempotent commands like InvokeCommand, GetAndRemoveCommand, etc 
implement aforementioned interface NonIdempotentCommand.
 * On the client side, an identifier is added to the command. Two options are 
possible here:
 ** It's possible to set id to the the command on command creation. Easiest 
way, but it will required extra effort on the server side to track command 
time. In that case it's possible to use LongCounter + nodeId as an id. 
 ** Or it's possible to adjust command with an id within retry loop, in that 
case we may use id as a "command time", of course it also means that clock or 
System.currentTime<> should be used as id identifier. I strongly believe that 
first option is better for now. 
 * On the server side, precisely, within MS state machine new 
nonIdempotentCommandCache is introduced commandId -> (commandResult, 
commandStartTime)
 * On each NonIdempotentCommand following logic should be implemented:
 ** As an initial step it's required to check whether there's command with 
given id in the cache, if true just return cached result, without command 
reprocessing.
 ** If there's no given command in the cache, process it and populate the cache 
with the result.

Basically that's all. Both cache persistence and recovery on group restart and 
cache cleanup will be covered within separate tickets.


> Deal with retry send metastorage raft commands after a timeout
> --------------------------------------------------------------
>
>                 Key: IGNITE-21881
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21881
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> As a result of the analysis and reproduction of IGNITE-21142, it was found 
> that the metastorage raft command can be re-sent if it does not time out, 
> which may not be good and lead to hidden negative consequences, such as in 
> IGNITE-21142.
> Here we need to find out the reasons for this decision (with re-try by 
> timeout) and understand what to do next. I think we should use an infinite 
> timeout.
> As a result of the analysis and reproduction of IGNITE-21142, it was found 
> that the metastorage raft command can be re-sent if it does not time out, 
> which may not be good and lead to hidden negative consequences, such as in 
> IGNITE-21142.
> Here we need to find out the reasons for this decision (with re-try by 
> timeout) and understand what to do next. I think we should use an infinite 
> timeout.
> h3. Upd#1
> As discussed, it's required to detect whether InvokeCommand was already 
> processed on a server and resend original response if true instead of 
> reprocessing. First of all it's not only about invoke but about all 
> non-idempotent commands like getAndPut, getAndPutAll, getAndRemove, etc. 
> Worth mentioning though that it relates only to MS and maybe CMG but not 
> Partitions: within partitions, tx protocol along with returning result from 
> indexes instead of returning result from raft, protects us from 
> non-idempotent command processing.
> All in all following solution is expected to be implemented:
>  * New interface NonIdempotentCommand is introduced with an id field.
>  * All MS non-idempotent commands like InvokeCommand, GetAndRemoveCommand, 
> etc implement aforementioned interface NonIdempotentCommand.
>  * On the client side, an identifier is added to the command. Two options are 
> possible here:
>  ** It's possible to set id to the the command on command creation. Easiest 
> way, but it will required extra effort on the server side to track command 
> time. In that case it's possible to use LongCounter + nodeId as an id. 
>  ** Or it's possible to adjust command with an id within retry loop, in that 
> case we may use id as a "command time", of course it also means that clock or 
> System.currentTime<> should be used as id identifier. I strongly believe that 
> first option is better for now. 
>  * On the server side, precisely, within MS state machine new 
> nonIdempotentCommandCache is introduced commandId -> (commandResult, 
> commandStartTime)
>  * On each NonIdempotentCommand following logic should be implemented:
>  ** As an initial step it's required to check whether there's command with 
> given id in the cache, if true just return cached result, without command 
> reprocessing.
>  ** If there's no given command in the cache, process it and populate the 
> cache with the result.
> Basically that's all. Both cache persistence and recovery on group restart 
> and cache cleanup will be covered within separate tickets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-21881) Deal with retry send metastorage raft commands after a timeout

Reply via email to