[jira] [Updated] (HBASE-25784) Support for Parallel Backups enabling multi tenancy with rsgroups

Mallikarjun (Jira) Sun, 29 Aug 2021 06:00:05 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mallikarjun updated HBASE-25784:
--------------------------------
    Attachment: proposed_design.png

> Support for Parallel Backups enabling multi tenancy with rsgroups
> -----------------------------------------------------------------
>
>                 Key: HBASE-25784
>                 URL: https://issues.apache.org/jira/browse/HBASE-25784
>             Project: HBase
>          Issue Type: Umbrella
>          Components: backup&restore
>            Reporter: Mallikarjun
>            Assignee: Mallikarjun
>            Priority: Major
>              Labels: backup
>         Attachments: existing_design.png, proposed_design.png
>
>
> *Existing Design*
>  
> *Problem 1:* 
>  With this design, incremental and full backups cannot run in parallel, 
> which degrades RPOs when a full backup runs for a long time, especially for 
> large tables.
>   
>  Example: 
>  Expectation: Say you have a large 10 TB table, your RPO is 60 minutes, and 
> you can ship the remote backup at 800 Mbps. You are allowed to take a full 
> backup once a week, and all other backups should be incremental.
>   
>  Shortcoming: With the above design, backups cannot run in parallel. While 
> a full backup is running (which takes roughly 25 hours), no incremental 
> backup can be taken, which breaches your RPO.
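As a rough check of the figures in this example, the transfer time can be worked out directly. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes), that the full 800 Mbps is sustained for the whole transfer, and no compression. The function name is made up for illustration.

```python
def full_backup_hours(table_bytes: float, link_bits_per_sec: float) -> float:
    """Hours needed to ship table_bytes over a link at full line rate."""
    seconds = table_bytes * 8 / link_bits_per_sec
    return seconds / 3600


TABLE_BYTES = 10 * 10**12   # 10 TB, decimal units (assumption)
LINK_BPS = 800 * 10**6      # 800 Mbps, fully utilised (assumption)
RPO_MINUTES = 60

hours = full_backup_hours(TABLE_BYTES, LINK_BPS)
# Whole RPO windows that elapse while the full backup blocks incrementals.
missed_windows = int(hours * 60 // RPO_MINUTES)

print(f"full backup: {hours:.1f} h, RPO windows elapsed: {missed_windows}")
```

Under these assumptions the transfer takes about 27.8 hours, the same order as the roughly 25 hours quoted above. Either way, more than a day of 60-minute RPO windows passes with no incremental backup.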
>   
>  *Proposed Solution:* Barring some critical sections, such as modifying the 
> state of the backup in meta tables, the other steps can run in parallel. 
> Incremental backups should be able to build on an older successful full or 
> incremental backup, and backups should be ordered by completion time 
> rather than start time. I have not worked on the full redesign yet, and 
> will do so if this proposal seems acceptable to the community.
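One way to picture the "completion time instead of start time" idea is a helper that picks the base for the next incremental backup. This is a minimal sketch, not HBase code: `BackupRecord`, its fields, and `pick_base` are hypothetical names, and timestamps are plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupRecord:              # hypothetical record, not an HBase class
    backup_id: str
    kind: str                    # "FULL" or "INCREMENTAL"
    start_ts: int
    complete_ts: Optional[int]   # None while running or if the backup failed


def pick_base(history: List[BackupRecord]) -> Optional[BackupRecord]:
    """Return the most recently completed backup to build an incremental on."""
    done = [b for b in history if b.complete_ts is not None]
    return max(done, key=lambda b: b.complete_ts, default=None)


history = [
    BackupRecord("full-1", "FULL", start_ts=0, complete_ts=100),
    BackupRecord("incr-1", "INCREMENTAL", start_ts=110, complete_ts=120),
    # A long full backup that started later and is still running.
    BackupRecord("full-2", "FULL", start_ts=130, complete_ts=None),
]

base = pick_base(history)
print(base.backup_id)
```

Here the running full backup is ignored, so a new incremental can still be based on `incr-1` instead of waiting for `full-2` to finish.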
>   
>  *Problem 2:*
>  Allowing only one backup at a time breaks down easily in a multi-tenant 
> system. This poses the following problems:
>  * Admins cannot achieve the required RPOs for their tables because they 
> depend on the other tenants in the system: one tenant has no control over 
> other tenants' table sizes, and hence over the duration of their backups.
>  * The management overhead of setting up the right backup sequence to 
> achieve the required RPOs for different tenants could be very high.
> *Proposed Solution:* Same as the previous proposal.
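The multi-tenant case could look roughly like the sketch below: each tenant's copy runs concurrently, and only the update of the shared backup state is serialized. Everything here is an assumption for illustration: the tenant names, the in-memory `backup_state` dict standing in for the backup meta table, and the placeholder `copy_data` function.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

state_lock = threading.Lock()   # guards the shared backup state
backup_state = {}               # stand-in for the backup meta table


def copy_data(group: str) -> int:
    """Placeholder for the long-running copy; returns a fake byte count."""
    return len(group) * 1000


def backup_group(group: str) -> str:
    copied = copy_data(group)   # runs in parallel across tenants
    with state_lock:            # short critical section only
        backup_state[group] = {"status": "COMPLETE", "bytes": copied}
    return group


groups = ["tenant_a", "tenant_b", "tenant_c"]   # hypothetical rsgroup names
with ThreadPoolExecutor(max_workers=len(groups)) as pool:
    finished = list(pool.map(backup_group, groups))

print(sorted(backup_state))
```

With this shape, one tenant's large table lengthens only that tenant's own backup; the others wait only for the brief state update.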
>   
>  *Problem 3:* 
>  Incremental backup works on WALs, and 
> org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs are 
> never cleaned up until the next backup (full or incremental) is taken. This 
> poses the following problem:
>  * WALs can grow unbounded if there are transient problems, such as the 
> backup site facing issues, until the next scheduled backup succeeds.
>  *Proposed Solution:* I can't think of anything better, but I see this as a 
> potential problem. Also, one can force a full backup if the required WAL 
> files are missing for any other reason, not necessarily those mentioned 
> above.
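The retention concern can be illustrated with a deliberately simplified model. This is not the actual BackupLogCleaner logic: it assumes each WAL has a single integer timestamp and that a WAL becomes deletable once a successful backup has covered that timestamp. All names are hypothetical.

```python
from typing import Dict, List, Optional


def deletable_wals(wal_ts: Dict[str, int],
                   last_backup_ts: Optional[int]) -> List[str]:
    """WALs written at or before the last successful backup may be cleaned."""
    if last_backup_ts is None:
        return []   # no successful backup yet: keep everything
    return sorted(name for name, ts in wal_ts.items() if ts <= last_backup_ts)


wals = {"wal.1": 10, "wal.2": 20, "wal.3": 30, "wal.4": 40}

# The last successful backup covered everything up to ts=20.
cleanable = deletable_wals(wals, last_backup_ts=20)
print(cleanable)

# If later backups keep failing, last_backup_ts stays at 20, so every newer
# WAL is retained and the retained set grows with each new WAL.
wals["wal.5"] = 50
retained = [w for w in wals if w not in deletable_wals(wals, 20)]
print(retained)
```

In this model the retained set only shrinks when a backup succeeds, which is the unbounded-growth risk described above.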
>   
> *Proposed Design.*
> !image-2021-06-03-16-34-34-957.png|width=324,height=416!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
