[ https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153107#comment-17153107 ]

ASF GitHub Bot commented on GEODE-8338:
---------------------------------------

sabbeyPivotal opened a new pull request #5351:
URL: https://github.com/apache/geode/pull/5351


   Since we have one redundant copy of the data, and since we modify the data 
using a function, I think we may have a data corruption issue with 
non-idempotent operations. What can happen is that an operation like APPEND can:
   0) the executor is called on a non-primary redis server,
   1) it modifies the primary (by sending a function execution to it),
   2) the primary modifies the secondary (by sending a geode delta to it),
   3) the primary server now fails (before the function executing on it 
completes),
   4) the non-primary redis server sees the function fail and, because it is 
marked as HA, retries it. This time it sends it to the secondary, which is the 
new primary. But the operation was already applied on the secondary, so this 
retry ends up doing the operation twice.
   
   This may be okay for certain ops (like SADD) that are idempotent (though 
even they could cause extra key events in the future), but for ops like APPEND 
we end up appending twice.
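The failure sequence above can be simulated in a few lines (plain Python standing in for the two servers; the names here are illustrative, not Geode or Redis APIs). The retried function runs on a replica that already received the delta, so APPEND doubles its suffix while the idempotent SADD is unaffected:

```python
# Model a primary and a secondary replica, each a plain dict.
def apply_append(store, key, suffix):
    store[key] = store.get(key, "") + suffix

def apply_sadd(store, key, member):
    store.setdefault(key, set()).add(member)

primary, secondary = {}, {}

# Steps 1-2: function executes on the primary; the delta reaches the secondary.
apply_append(primary, "k", "abc")
apply_append(secondary, "k", "abc")

# Step 3: the primary dies; the secondary is promoted to primary.
promoted = secondary

# Step 4: the HA retry re-executes the function on the new primary,
# which already holds the delta -- APPEND is applied a second time.
apply_append(promoted, "k", "abc")
print(promoted["k"])  # "abcabc" instead of "abc"

# The same retry is harmless for an idempotent op like SADD.
apply_sadd(promoted, "s", "m")
apply_sadd(promoted, "s", "m")  # retry
print(promoted["s"])  # {'m'}
```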
   
   This will only happen when a server executing a function dies and our 
function service retries the function on another server because the function 
is marked HA. The easy way to fix this is to make our function non-HA; it is 
a one-line change.
   Note that our clients can already see exceptions/errors if the server they 
are connected to dies. When that happens, the operation they requested may 
have succeeded, and if they have multiple geode redis servers running, the 
result may have been stored and still be in memory. So clients will need some 
logic to decide whether they should redo such an operation (because it may 
already be done).
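One way a client could implement that "redo or not" decision is to verify observable state before retrying a non-idempotent command. Below is a hedged sketch in plain Python; `FlakyClient`, `strlen`, and `append` are hypothetical stand-ins for a redis client and the STRLEN/APPEND commands, not a real client API:

```python
class FlakyClient:
    """Hypothetical in-memory stand-in for a redis client whose first
    APPEND succeeds server-side but whose connection drops before the
    reply arrives (the scenario described above)."""

    def __init__(self):
        self.data = {}
        self.fail_once = True

    def strlen(self, key):
        return len(self.data.get(key, ""))

    def append(self, key, suffix):
        self.data[key] = self.data.get(key, "") + suffix
        if self.fail_once:
            self.fail_once = False
            raise ConnectionError("server died mid-command")
        return len(self.data[key])

def safe_append(client, key, suffix):
    """Retry APPEND only if the first attempt observably did not land."""
    before = client.strlen(key)
    try:
        return client.append(key, suffix)
    except ConnectionError:
        # The server died mid-command. If the length already grew by
        # len(suffix), the append reached a surviving replica: don't redo.
        after = client.strlen(key)
        if after == before + len(suffix):
            return after
        return client.append(key, suffix)

c = FlakyClient()
print(safe_append(c, "k", "abc"))  # 3, appended exactly once
```

The length check is only a heuristic: a concurrent writer to the same key would defeat it, so a real client would need application-level idempotency (e.g. a request id) for a robust decision.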
   
   Note: Making the function non-HA just gives the client one more case in 
which it must handle a server crash: the crash can now be on a server it was 
not connected to but that was involved in performing the operation it 
requested.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Redis commands may be repeated when server dies
> -----------------------------------------------
>
>                 Key: GEODE-8338
>                 URL: https://issues.apache.org/jira/browse/GEODE-8338
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>            Reporter: Sarah Abbey
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
