Hi Kudu dev community,

A few of us met offline to discuss what it would take to add support for
backups to Kudu. The result of those discussions is a document describing a
proposal for an "MVP" implementation of backups, which would be the
simplest possible thing that meets what we think are the most important
requirements for a backup and restore subsystem.

Those of us who contributed to the document would like to solicit feedback
from the community on the approach and on whether it would meet the needs
of others in the community.

High-level summary:

   - Support both full and incremental backup of individual tables using
   the equivalent of snapshot scans (with enhancements for incremental backup
   support).
   - Write backups to HDFS, S3, or a local directory (presumably
   NFS-mounted).
   - Support restoring from a full backup plus incrementals using the
   equivalent of writes to a new table.
   - Use Spark to parallelize the backup and restore processes.
   - Not in scope at this time: Snapshots.
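
To make the restore bullet above concrete, here is a minimal sketch of how
a full backup plus a chain of incrementals could be replayed as writes to a
new table. It is an illustration only, not the design in the document: the
Backup class, its from_ts/to_ts fields, the convention that from_ts == 0
marks a full backup, and the use of a None value to represent a deleted row
are all assumptions made for this example, and a plain dict stands in for
the new Kudu table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row change: (primary key, row value). A value of None is
# assumed here to mean "row was deleted in this time range".
Change = Tuple[str, Optional[dict]]


@dataclass
class Backup:
    from_ts: int           # assumed: 0 means this is a full backup
    to_ts: int             # snapshot timestamp the backup was taken at
    changes: List[Change]  # rows (full) or row changes (incremental)


def restore(backups: Iterable[Backup]) -> Dict[str, dict]:
    """Replay a full backup and its incrementals into a new 'table'."""
    ordered = sorted(backups, key=lambda b: b.to_ts)
    if not ordered or ordered[0].from_ts != 0:
        raise ValueError("restore chain must start with a full backup")

    table: Dict[str, dict] = {}  # stand-in for the newly created table
    prev_to = 0
    for backup in ordered:
        # Each incremental must pick up exactly where the previous one ended.
        if backup.from_ts != prev_to:
            raise ValueError(
                f"gap in backup chain: expected from_ts={prev_to}, "
                f"got {backup.from_ts}"
            )
        for key, value in backup.changes:
            if value is None:
                table.pop(key, None)  # apply a delete
            else:
                table[key] = value    # apply an insert or update (upsert)
        prev_to = backup.to_ts
    return table


if __name__ == "__main__":
    full = Backup(0, 100, [("a", {"v": 1}), ("b", {"v": 2})])
    inc1 = Backup(100, 200, [("b", None), ("c", {"v": 3})])
    inc2 = Backup(200, 300, [("a", {"v": 10})])
    print(restore([inc2, full, inc1]))
```

The sketch sorts by to_ts and rejects a chain with a missing incremental,
since replaying across a gap would silently drop changes. In the proposal
this work would be parallelized with Spark rather than run in one process.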

For more details, please see the document below:

https://docs.google.com/document/d/1j8eAaqQskCQKza6ejYI3WG3p4cn40TN7ceC6Lyu27Rg/

Please feel free to leave comments on the Google Doc or respond to this
email thread to discuss.

My hope is to start implementing this design very soon. See the milestones
in the document for a rough outline of how we imagine this coming together
over time.

Thanks in advance for any feedback,

Mike
