Hi Mallikarjun,

Those goals sound worthwhile.

Do you have a flow chart similar to the one you posted for the current system but for the proposed solution?

How much will we need to change our existing test coverage to accommodate the proposed solution?

How much will we need to update the existing reference guide section?

On Sun, Jan 31, 2021, 04:59 Mallikarjun <[email protected]> wrote:

> Bringing up this thread.
>
> On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani <[email protected]> wrote:
>
> > Thanks, the image is visible now.
> >
> > > Since I wanted to open this for discussion, did not consider placing it in *hbase/dev_support/design-docs*.
> >
> > Definitely, only after we come to concrete conclusion with the reviewer, we should open up a PR. Until then this thread is anyways up for discussion.
> >
> > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <[email protected]> wrote:
> >
> > > Hope this link works --> https://ibb.co/hYjRpgP
> > >
> > > Inline reply
> > >
> > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Still not available :)
> > > > The attachments don’t work on mailing lists. You can try uploading the attachment on some public hosting site and provide the url to the same here.
> > > >
> > > > Since I am not aware of the contents, I cannot confirm right away but if the reviewer feels we should have the attachment on our github repo: hbase/dev-support/design-docs, good to upload the content there later. For instance, pdf file can contain existing design and new design diagrams and talk about pros and cons etc once we have things finalized.
> > > >
> > > Since I wanted to open this for discussion, did not consider placing it in *hbase/dev_support/design-docs*.
> > >
> > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <[email protected]> wrote:
> > > >
> > > > > Attached as image.
> > > > > Please let me know if it is available now.
> > > > >
> > > > > ---
> > > > > Mallikarjun
> > > > >
> > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <[email protected]> wrote:
> > > > >
> > > > > > Hi!
> > > > > >
> > > > > > Thanks for the write up. Unfortunately, your image for the existing design didn't come through. Could you post it to some host and link it here?
> > > > > >
> > > > > > On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <[email protected]> wrote:
> > > > > >
> > > > > > > Existing Design:
> > > > > > >
> > > > > > > Problem 1:
> > > > > > >
> > > > > > > With this design, Incremental and Full backup can't be run in parallel and leading to degraded RPO's in case Full backup is of longer duration esp for large tables.
> > > > > > >
> > > > > > > Example:
> > > > > > > Expectation: Say you have a big table with 10 TB and your RPO is 60 minutes and you are allowed to ship the remote backup with 800 Mbps. And you are allowed to take Full Backups once in a week and rest of them should be incremental backups
> > > > > > >
> > > > > > > Shortcoming: With the above design, one can't run parallel backups and whenever there is a full backup running (which takes roughly 25 hours) you are not allowed to take incremental backups and that would be a breach in your RPO.
> > > > > > >
> > > > > > > Proposed Solution: Barring some critical sections such as modifying state of the backup on meta tables, others can happen parallelly. Leaving incremental backups to be able to run based on older successful full / incremental backups and completion time of backup should be used instead of start time of backup for ordering.
> > > > > > > I have not worked on the full redesign, and will be doing so if this proposal seems acceptable for the community.
> > > > > > >
> > > > > > > Problem 2:
> > > > > > >
> > > > > > > With one backup at a time, it fails easily for a multi-tenant system. This poses following problems
> > > > > > >
> > > > > > > Admins will not be able to achieve required RPO's for their tables because of dependence on other tenants present in the system. As one tenant doesn't have control over other tenants' table sizes and hence the duration of the backup
> > > > > > > Management overhead of setting up a right sequence to achieve required RPO's for different tenants could be very hard.
> > > > > > >
> > > > > > > Proposed Solution: Same as previous proposal
> > > > > > >
> > > > > > > Problem 3:
> > > > > > >
> > > > > > > Incremental backup works on WAL's and org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WAL's are never cleaned up until the next backup (Full / Incremental) is taken. This poses following problem
> > > > > > >
> > > > > > > WAL's can grow unbounded in case there are transient problems like backup site facing issues or anything else until next backup scheduled goes successful
> > > > > > >
> > > > > > > Proposed Solution: I can't think of anything better, but I see this can be a potential problem. Also, one can force full backup if required WAL files are missing for whatever other reasons not necessarily mentioned above.
> > > > > > >
> > > > > > > ---
> > > > > > > Mallikarjun
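
For reference, a rough sanity check of the numbers in the Problem 1 example quoted above (10 TB table, 800 Mbps link, 60 minute RPO). This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes decimal units and a fully utilized link with no overhead, neither of which the thread states. Under those assumptions the full backup comes out at about 27.8 hours, the same ballpark as the "roughly 25 hours" quoted.

```python
# Back-of-the-envelope check of the full-backup duration in the quoted example.
# Assumptions (not from the thread): decimal units (1 TB = 10**12 bytes,
# 1 Mbps = 10**6 bits/s) and a fully utilized link with no protocol overhead.


def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to ship size_tb terabytes over a link_mbps link."""
    bits = size_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 3600


def missed_rpo_windows(backup_hours: float, rpo_minutes: float) -> int:
    """Whole RPO intervals that elapse while the full backup blocks incrementals."""
    return int(backup_hours * 60 // rpo_minutes)


full_backup_hours = transfer_hours(size_tb=10, link_mbps=800)
print(f"full backup: ~{full_backup_hours:.1f} hours")  # ~27.8 hours
print(f"missed 60-minute RPO windows: {missed_rpo_windows(full_backup_hours, 60)}")  # 27
```

So with only one backup allowed at a time, a single weekly full backup of that table would block about 27 consecutive hourly incremental backups.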
