Dan Smith created GEODE-2654:
--------------------------------

             Summary: Backups can capture different members from different 
points in time
                 Key: GEODE-2654
                 URL: https://issues.apache.org/jira/browse/GEODE-2654
             Project: Geode
          Issue Type: Bug
          Components: persistence
            Reporter: Dan Smith


Geode backups should behave the same as recovering from disk after killing all 
of the members.

Unfortunately, backups can instead capture data on different members at 
different points in time, resulting in application-level inconsistency. Here's 
an example of what goes wrong:

# Do a put in region A
# Do a put in region B
# Backup the system
# Recover from the backup
# You may see the put to region B, but not A, even if the data is colocated.
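The failure mode in the steps above can be modeled without Geode at all. In this sketch (plain HashMaps standing in for regions; the class and method names are made up for illustration), member 1 happens to copy its data before the puts and member 2 after, so the restored state contains the second put without the first:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Geode API: two "members" each copy their
// region data at a different moment, reproducing the ordering bug above.
public class StaggeredBackupDemo {

    static String simulate() {
        Map<String, String> regionA = new HashMap<>(); // hosted on member 1
        Map<String, String> regionB = new HashMap<>(); // hosted on member 2

        // Member 1 happens to back up region A before the puts...
        Map<String, String> backupA = new HashMap<>(regionA);

        // ...then the application does its two ordered puts.
        regionA.put("k", "putA"); // step 1
        regionB.put("k", "putB"); // step 2

        // Member 2 backs up region B after the puts.
        Map<String, String> backupB = new HashMap<>(regionB);

        // The restored state has the later put but not the earlier one:
        // a state the application never passed through.
        return "A=" + backupA + " B=" + backupB;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```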

We ran into this with Lucene indexes - see GEODE-2643. We've worked around 
GEODE-2643 by putting all data into the same region, but we're worried that we 
still have a problem with the async event queue: with an async event listener 
that writes to another Geode region, it's possible to recover different points 
in time for the async event queue and the region, resulting in missed events.

The issue is that there is no locking or other mechanism to prevent different 
members from backing up their data at different points in time. Colocating data 
does not avoid this problem, because when we recover from disk we may recover 
region A's bucket from one member and region B's bucket from another member.

The backup operation does have a mechanism for making sure that it gets a 
point-in-time snapshot of *metadata*. It sends a PrepareBackupRequest to all 
members, which causes them to lock their init file. Then it sends a 
FinishBackupRequest, which tells all members to back up their data and release 
the lock. This ensures that a backup doesn't completely miss a bucket or get 
corrupt metadata about which members host a bucket. See the comments in 
DiskStoreImpl.lockStoreBeforeBackup.

We should extend this Prepare/Finish mechanism to make sure we get a 
point-in-time snapshot of region data as well. One way to do this would be to 
take a lock on the *oplog* in lockStoreBeforeBackup to prevent writes, and hold 
it until releaseBackupLock is called.
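A minimal sketch of that extension, assuming a read-write lock per member (the names Member, prepareBackup, and finishBackup are illustrative stand-ins for the real DiskStoreImpl / PrepareBackupRequest / FinishBackupRequest machinery): application writes take the lock in shared mode, the prepare phase takes it exclusively on every member before any member copies data, and the finish phase copies and then releases. Because all members are prepared before any finishes, no member's copy can include a write that another member's copy missed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch, not Geode's actual classes: each member guards its
// write path with a lock that the backup holds exclusively from
// prepare until finish, giving a consistent cut across members.
class Member {
    final Map<String, String> data = new HashMap<>();
    final ReadWriteLock backupLock = new ReentrantReadWriteLock();
    Map<String, String> snapshot;

    void put(String k, String v) {        // application write path
        backupLock.readLock().lock();     // shared: writes run concurrently
        try {
            data.put(k, v);
        } finally {
            backupLock.readLock().unlock();
        }
    }

    void prepareBackup() {                // like lockStoreBeforeBackup
        backupLock.writeLock().lock();    // blocks until in-flight writes drain
    }

    void finishBackup() {                 // like releaseBackupLock
        snapshot = new HashMap<>(data);   // copy while no write can slip in
        backupLock.writeLock().unlock();
    }
}

public class PointInTimeBackup {
    // Coordinator: prepare ALL members before finishing any of them.
    static void backup(List<Member> members) {
        members.forEach(Member::prepareBackup);
        members.forEach(Member::finishBackup);
    }
}
```

The trade-off is that writes block for the duration of the prepare/finish window, so the copy taken under the lock should be cheap (e.g. hard-linking oplog files rather than physically copying them).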



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
