Re: [DISCUSS] Improvements to backups

Nick Reich Thu, 10 Aug 2017 11:06:02 -0700

Dan, you are correct on #3: there is one place in the code where this
appears not to be the case, but that code is unused. Timestamped directories
are therefore already implemented and overwrites should not be possible.
This also covers incremental backups and removes the need for change #3.
However, it does mean that an incremental backup needs to know the
timestamped directory of the previous backup. That suggests a different
potential optimization: keep the (timestamped) incremental backup dirs in a
base directory and use either a metadata file or the timestamps in the
directory names to determine the last incremental backup, then automatically
use it as the baseline for the current backup. The user would then no longer
have to (manually) know which directory the previous backup went to in order
to pass it to the current backup command.
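
To illustrate the directory-name variant, here is a minimal sketch (Python
for brevity, not existing Geode code). The timestamp format and the function
names are assumptions made for the example; the thread does not specify
them.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed directory-name format; the thread does not specify one.
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_backup_timestamp(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a directory name, or None if it does not match."""
    try:
        return datetime.strptime(name, TIMESTAMP_FORMAT)
    except ValueError:
        return None


def find_latest_backup(base_dir: Path) -> Optional[Path]:
    """Pick the most recent timestamped subdirectory of base_dir as the baseline."""
    candidates = []
    for entry in base_dir.iterdir():
        if not entry.is_dir():
            continue  # ignore stray files in the base directory
        stamp = parse_backup_timestamp(entry.name)
        if stamp is not None:
            candidates.append((stamp, entry))
    if not candidates:
        return None  # no previous backup, so the caller would run a full backup
    return max(candidates, key=lambda pair: pair[0])[1]
```

Comparing parsed timestamps rather than raw names keeps the choice correct
even if the name format is later changed to one that does not sort
lexically. A metadata file in the base directory would avoid the parsing
altogether, at the cost of one more file to keep consistent.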


On Thu, Aug 10, 2017 at 10:37 AM, Dan Smith <[email protected]> wrote:

> +1 this all looks good to me. I think #2 in particular would probably
> simplify the incremental backup code.
>
> For #3, I could have sworn the backups were already going into timestamped
> directories and nothing got overwritten in an existing backup. If that is
> not already happening that definitely should change!
>
> -Dan
>
> On Thu, Aug 10, 2017 at 10:31 AM, Nick Reich <[email protected]> wrote:
>
> > There is a desire to improve backup creation and restore. The suggested
> > improvements are listed below and I am seeking feedback from the
> > community:
> >
> > 1) Allow saving of backups to different locations/systems: currently,
> > backups are saved to a directory on each member. Users can manually or
> > through scripting move those backups elsewhere, but it would be
> > advantageous to allow direct backups to cloud storage providers (Amazon,
> > Google, Azure, etc.) and possibly other systems. To make this possible,
> > it is proposed to refactor backups into a service-style architecture with
> > backup location plugins that can be used to specify the target location.
> > This would allow creation of additional backup strategies as demand is
> > determined and allow users to create their own plugins for their own
> > special use cases.
> >
> > 2) Changing backup restore procedure: backups create a restore script per
> > member that must be run on each member to restore the backup there. The
> > script created is based on the OS of the machine the backup is created on
> > (it mainly moves files to the correct directories). A more flexible system
> > would be to instead create a metadata file (XML, YAML, etc.) which
> > contains information on the files in the backup. This would allow the
> > logic for moving files and other activities in the backup restore process
> > to be maintained in our codebase in an operating-system-agnostic way.
> > Because the existing script is not dependent on Geode code, old backups
> > would not be affected by this change, though the process for restoring new
> > backups would (likely using gfsh instead of sh or bat scripts).
> >
> > 3) Improved incremental backups: incremental backup allows for significant
> > space savings and is much quicker to run. However, it suffers from the
> > problem that you can only restore to the latest time the incremental
> > backup was run, as we overwrite user files, cache xml and properties,
> > among other files in the backup directory. By saving this information to
> > timestamped directories, restoring to a specific time point would be as
> > simple as choosing the newest point in the backup to include in the
> > restore. Using timestamped directories for normal backups as well would
> > prevent successive backups from overwriting each other.
> >
>
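
For #2 in the quoted proposal, a metadata-driven restore could look roughly
like the sketch below (Python for brevity, not Geode code). The JSON layout,
the file name backup-metadata.json and the function name are all
hypothetical; the proposal only says "xml, yaml, etc.".

```python
import json
import shutil
from pathlib import Path

# Hypothetical metadata file name and layout:
# {"files": [{"source": "<path inside backup>", "destination": "<restore path>"}]}
METADATA_FILE = "backup-metadata.json"


def restore_backup(backup_dir: Path) -> list:
    """Copy each file listed in the metadata back to its recorded destination."""
    metadata = json.loads((backup_dir / METADATA_FILE).read_text())
    restored = []
    for item in metadata["files"]:
        source = backup_dir / item["source"]
        destination = Path(item["destination"])
        # Create missing parent directories, as the generated scripts would.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)  # copies content and file metadata
        restored.append(destination)
    return restored
```

Because the file moves are driven by data rather than by a generated sh or
bat script, the same restore logic runs unchanged on any OS, which is the
point of the proposal.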
