Hi Kudu dev community,

A few of us met offline to discuss what it would take to add support for backups to Kudu. The result of those discussions is a document describing a proposal for an "MVP" implementation of backups, which would be the simplest possible thing that meets what we think are the most important requirements for a backup and restore subsystem.

Those of us that contributed to the document would like to solicit feedback from the community on the approach and whether it would meet the needs of others in the community.

High level summary:

- Support both full and incremental backup of individual tables using the equivalent of snapshot scans (with enhancements for incremental backup support).
- Write backups to HDFS, S3, or a local directory (presumably NFS-mounted).
- Support restoring from a full backup plus incrementals using the equivalent of writes to a new table.
- Use Spark to parallelize the backup and restore processes.
- Not in scope at this time: Snapshots.

For more details, please see the document below:

https://docs.google.com/document/d/1j8eAaqQskCQKza6ejYI3WG3p4cn40TN7ceC6Lyu27Rg/

Please feel free to leave comments on the Google Doc or respond to this email thread to discuss.

My hope is to get started on implementing this design very soon. See milestones for a rough outline of how we imagine this coming together over time.

Thanks in advance for any feedback,
Mike
