Hi!

Thanks for the write-up. Unfortunately, your image for the existing
design didn't come through. Could you post it to some host and link it
here?

On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <mallik.v.ar...@gmail.com> wrote:
>
> Existing Design:
>
>
>
> Problem 1:
>
> With this design, incremental and full backups can't run in parallel, which 
> degrades the RPO when a full backup takes a long time, especially for 
> large tables.
>
> Example:
> Expectation: Say you have a big 10 TB table, your RPO is 60 minutes, and 
> you are allowed to ship the remote backup at 800 Mbps. You are allowed to 
> take a full backup once a week, and the rest should be incremental 
> backups.
>
> Shortcoming: With the above design, backups can't run in parallel, so 
> whenever a full backup is running (which takes roughly 25 hours) no 
> incremental backup can be taken, and that breaches your RPO.
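
A quick back-of-the-envelope check of the numbers above, as a rough sketch only. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and that the link is fully used for the whole transfer; neither assumption is stated in the proposal. Under those assumptions the transfer comes out closer to 28 hours, in the same ballpark as the roughly 25 hours quoted.

```python
# Rough estimate only: decimal units and a fully utilized link are assumptions.
TABLE_BYTES = 10 * 10**12          # 10 TB table
LINK_BITS_PER_SEC = 800 * 10**6    # 800 Mbps shipping bandwidth
RPO_MINUTES = 60                   # required recovery point objective


def full_backup_hours(table_bytes, link_bits_per_sec):
    """Hours needed to ship the whole table over the link."""
    return table_bytes * 8 / link_bits_per_sec / 3600


hours = full_backup_hours(TABLE_BYTES, LINK_BITS_PER_SEC)
# Whole RPO windows that elapse while the full backup blocks incrementals.
missed = int(hours * 60 // RPO_MINUTES)

print(f"full backup: {hours:.1f} h, incremental windows blocked: {missed}")
```
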
>
> Proposed Solution: Barring some critical sections, such as modifying the 
> state of the backup in the meta tables, everything else can happen in 
> parallel. Incremental backups would then be able to run based on older 
> successful full / incremental backups, and the completion time of a backup 
> should be used instead of its start time for ordering. I have not worked on 
> the full redesign, and will do so if this proposal seems acceptable to the 
> community.
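
To make sure I read the ordering idea correctly, here is a minimal sketch of picking the base for an incremental backup by completion time. The `Backup` record and `latest_completed` helper are hypothetical names made up for illustration; they are not HBase classes or the proposed design itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backup:
    # Hypothetical record, not an HBase type.
    backup_id: str
    kind: str                    # "FULL" or "INCREMENTAL"
    start_ts: int
    complete_ts: Optional[int]   # None while still running or if it failed


def latest_completed(history: List[Backup], now: int) -> Optional[Backup]:
    """Return the most recently completed backup, ordered by completion time."""
    done = [b for b in history
            if b.complete_ts is not None and b.complete_ts <= now]
    return max(done, key=lambda b: b.complete_ts, default=None)


history = [
    Backup("full-1", "FULL", start_ts=0, complete_ts=100),
    Backup("incr-1", "INCREMENTAL", start_ts=150, complete_ts=160),
    Backup("full-2", "FULL", start_ts=170, complete_ts=None),  # still running
]

# The running full backup is ignored; the next incremental builds on incr-1.
base = latest_completed(history, now=200)
print(base.backup_id)
```

With this ordering, a long-running full backup that started earlier does not become the base for anything until it actually completes.
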
>
> Problem 2:
>
> With one backup at a time, it fails easily for a multi-tenant system. This 
> poses the following problems:
>
> - Admins will not be able to achieve the required RPOs for their tables 
>   because they depend on the other tenants in the system: one tenant has 
>   no control over other tenants' table sizes, and hence over the duration 
>   of their backups.
> - The management overhead of setting up the right sequence to achieve the 
>   required RPOs for different tenants could be very high.
>
> Proposed Solution: Same as previous proposal
>
> Problem 3:
>
> Incremental backup works on WALs, and 
> org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are 
> never cleaned up until the next backup (Full / Incremental) is taken. This 
> poses the following problem:
>
> WALs can grow unbounded if there are transient problems, such as the backup 
> site facing issues, until the next scheduled backup succeeds.
>
> Proposed Solution: I can't think of anything better, but I see this as a 
> potential problem. Also, one can force a full backup if the required WAL 
> files are missing for any other reason, not necessarily the ones mentioned 
> above.
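
The fallback described above could look roughly like the following. This is only a sketch under my own assumptions: `choose_backup_type` and the WAL file names are hypothetical, and nothing here reflects how BackupLogCleaner or the backup code actually tracks WALs.

```python
def choose_backup_type(required_wals, available_wals):
    """Fall back to a full backup when any WAL needed for an incremental is gone.

    Both arguments are iterables of WAL file names (hypothetical inputs).
    Returns the chosen backup type and the sorted list of missing WALs.
    """
    missing = sorted(set(required_wals) - set(available_wals))
    if missing:
        return "FULL", missing
    return "INCREMENTAL", []


# All required WALs are still present, so an incremental backup is possible.
print(choose_backup_type(["wal.1", "wal.2"], ["wal.1", "wal.2", "wal.3"]))

# wal.2 was lost, so the only safe option is a full backup.
print(choose_backup_type(["wal.1", "wal.2"], ["wal.1", "wal.3"]))
```
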
>
> ---
> Mallikarjun
