[ https://issues.apache.org/jira/browse/GEODE-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Darrel Schneider resolved GEODE-8338.
-------------------------------------
    Fix Version/s: 1.14.0
       Resolution: Fixed

> Redis commands may be repeated when server dies
> -----------------------------------------------
>
>                 Key: GEODE-8338
>                 URL: https://issues.apache.org/jira/browse/GEODE-8338
>             Project: Geode
>          Issue Type: Bug
>          Components: redis
>            Reporter: Sarah Abbey
>            Assignee: Darrel Schneider
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Since we have one redundant copy of the data, and since we modify the data
> using a function, I think we may have a data corruption issue with
> non-idempotent operations. What can happen is that an operation like APPEND
> can:
> 0) have its executor called on a non-primary redis server,
> 1) modify the primary (by sending a function exec to it),
> 2) modify the secondary (by sending a geode delta to it),
> 3) have the primary server fail at this point (before the function executing
> on it completes),
> 4) have the non-primary redis server see the function fail and, because the
> function is marked as HA, retry it. This time it sends it to the secondary,
> which is the new primary, but the operation was already applied on the
> secondary, so this retry will end up doing the operation twice.
>
> This may be okay for certain ops (like SADD) that are idempotent (but even
> they could cause extra key events in the future), but for ops like APPEND we
> end up appending twice.
>
> This will only happen when a server executing a function dies and our
> function service retries the function on another server because it is marked
> HA. The easy way to fix this is to change our function to not be HA. This is
> a one-line change.
>
> Note that our clients can already see exceptions/errors if the server they
> are connected to dies. When that happens the operation they requested may
> have happened, and if they have multiple geode redis servers running it may
> have been stored and still be in memory. So clients will need some logic to
> decide if they should redo such an operation or not (because it is already
> done).
>
> *Note:* By making the function non-HA, it should just give the client another
> case in which they need to handle a server crash. It can now be for servers
> they were not connected to but that were involved in performing the operation
> they requested.
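
The failure sequence described in the issue can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use any Geode API, and the names AppendRetrySketch, Server, ServerDiedException and runAppend are hypothetical stand-ins invented for illustration. The retryOnFailure flag models whether the function is marked HA.

```java
import java.util.HashMap;
import java.util.Map;

public class AppendRetrySketch {

    /** Hypothetical stand-in for one server's copy of the data. */
    public static final class Server {
        final Map<String, String> data = new HashMap<>();

        /** Non-idempotent APPEND: concatenates value onto the existing string. */
        void append(String key, String value) {
            data.merge(key, value, String::concat);
        }
    }

    /** Models the primary dying before the function executing on it completes. */
    public static final class ServerDiedException extends RuntimeException {}

    /**
     * Runs APPEND k "b" against a key that starts as "a" on both copies and
     * returns the value held by the surviving server afterwards.
     * retryOnFailure == true models an HA function that is retried.
     */
    public static String runAppend(boolean retryOnFailure) {
        Server primary = new Server();
        Server secondary = new Server();
        primary.data.put("k", "a");
        secondary.data.put("k", "a");

        try {
            primary.append("k", "b");        // step 1: function modifies the primary
            secondary.append("k", "b");      // step 2: delta reaches the secondary
            throw new ServerDiedException(); // step 3: primary dies before completing
        } catch (ServerDiedException e) {
            if (retryOnFailure) {
                // step 4: the retry lands on the secondary (the new primary),
                // which has already applied the operation once.
                secondary.append("k", "b");
            }
            // Without a retry, the caller would instead report an error to the
            // client, and the surviving copy holds the single append.
        }
        return secondary.data.get("k");
    }

    public static void main(String[] args) {
        System.out.println("HA (retried):   " + runAppend(true));
        System.out.println("non-HA (error): " + runAppend(false));
    }
}
```

In this sketch the retried run leaves "abb" on the surviving server while the non-retried run leaves "ab", which matches the issue's point that disabling HA trades a silent double append for an error the client has to handle.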