Inline Reply

On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey <bus...@apache.org> wrote:
> Hi Mallikarjun,
>
> Those goals sound worthwhile.
>
> Do you have a flow chart similar to the one you posted for the current
> system but for the proposed solution?

This is what I am thinking --> https://ibb.co/KmH6Cwv

> How much will we need to change our existing test coverage to accommodate
> the proposed solution?

Of the 38 tests, it looks like we might have to change a couple only. Will
have to add more tests to cover parallel backup scenarios.

> How much will we need to update the existing reference guide section?

Probably nothing. Interface as such will not change.

> On Sun, Jan 31, 2021, 04:59 Mallikarjun <mallik.v.ar...@gmail.com> wrote:
>
> > Bringing up this thread.
> >
> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani <vjas...@apache.org> wrote:
> >
> > > Thanks, the image is visible now.
> > >
> > > > Since I wanted to open this for discussion, did not consider placing
> > > > it in *hbase/dev_support/design-docs*.
> > >
> > > Definitely, only after we come to concrete conclusion with the
> > > reviewer, we should open up a PR. Until then this thread is anyways up
> > > for discussion.
> > >
> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <mallik.v.ar...@gmail.com>
> > > wrote:
> > >
> > > > Hope this link works --> https://ibb.co/hYjRpgP
> > > >
> > > > Inline reply
> > > >
> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani <vjas...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Still not available :)
> > > > > The attachments don’t work on mailing lists. You can try uploading
> > > > > the attachment on some public hosting site and provide the url to
> > > > > the same here.
> > > > >
> > > > > Since I am not aware of the contents, I cannot confirm right away
> > > > > but if the reviewer feels we should have the attachment on our
> > > > > github repo: hbase/dev-support/design-docs , good to upload the
> > > > > content there later.
> > > > > For instance, pdf file can contain existing design and new design
> > > > > diagrams and talk about pros and cons etc once we have things
> > > > > finalized.
> > > >
> > > > Since I wanted to open this for discussion, did not consider placing
> > > > it in *hbase/dev_support/design-docs*.
> > > >
> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun
> > > > > <mallik.v.ar...@gmail.com> wrote:
> > > > >
> > > > > > Attached as image. Please let me know if it is available now.
> > > > > >
> > > > > > ---
> > > > > > Mallikarjun
> > > > > >
> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <bus...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi!
> > > > > > >
> > > > > > > Thanks for the write up. Unfortunately, your image for the
> > > > > > > existing design didn't come through. Could you post it to some
> > > > > > > host and link it here?
> > > > > > >
> > > > > > > On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun
> > > > > > > <mallik.v.ar...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Existing Design:
> > > > > > > >
> > > > > > > > Problem 1:
> > > > > > > >
> > > > > > > > With this design, incremental and full backups can't be run
> > > > > > > > in parallel, leading to degraded RPOs when a full backup
> > > > > > > > takes a long time, especially for large tables.
> > > > > > > >
> > > > > > > > Example:
> > > > > > > > Expectation: Say you have a big table of 10 TB, your RPO is
> > > > > > > > 60 minutes, and you are allowed to ship the remote backup at
> > > > > > > > 800 Mbps.
> > > > > > > > And you are allowed to take full backups once a week; the
> > > > > > > > rest should be incremental backups.
> > > > > > > >
> > > > > > > > Shortcoming: With the above design, one can't run parallel
> > > > > > > > backups, and whenever a full backup is running (which takes
> > > > > > > > roughly 25 hours) you are not allowed to take incremental
> > > > > > > > backups, which would breach your RPO.
> > > > > > > >
> > > > > > > > Proposed Solution: Barring some critical sections, such as
> > > > > > > > modifying the state of the backup on meta tables, everything
> > > > > > > > else can happen in parallel. Incremental backups should be
> > > > > > > > able to run based on older successful full / incremental
> > > > > > > > backups, and the completion time of a backup should be used
> > > > > > > > instead of its start time for ordering. I have not worked on
> > > > > > > > the full redesign, and will be doing so if this proposal
> > > > > > > > seems acceptable to the community.
> > > > > > > >
> > > > > > > > Problem 2:
> > > > > > > >
> > > > > > > > With one backup at a time, it fails easily for a
> > > > > > > > multi-tenant system. This poses the following problems:
> > > > > > > >
> > > > > > > > Admins will not be able to achieve the required RPOs for
> > > > > > > > their tables because of dependence on other tenants present
> > > > > > > > in the system, as one tenant has no control over other
> > > > > > > > tenants' table sizes and hence the duration of the backup.
> > > > > > > >
> > > > > > > > The management overhead of setting up the right sequence to
> > > > > > > > achieve the required RPOs for different tenants could be
> > > > > > > > very high.
> > > > > > > > Proposed Solution: Same as previous proposal
> > > > > > > >
> > > > > > > > Problem 3:
> > > > > > > >
> > > > > > > > Incremental backup works on WALs, and
> > > > > > > > org.apache.hadoop.hbase.backup.master.BackupLogCleaner
> > > > > > > > ensures that WALs are never cleaned up until the next
> > > > > > > > backup (Full / Incremental) is taken. This poses the
> > > > > > > > following problem:
> > > > > > > >
> > > > > > > > WALs can grow unbounded in case of transient problems, such
> > > > > > > > as the backup site facing issues or anything else, until the
> > > > > > > > next scheduled backup succeeds.
> > > > > > > >
> > > > > > > > Proposed Solution: I can't think of anything better, but I
> > > > > > > > see this can be a potential problem. Also, one can force a
> > > > > > > > full backup if required WAL files are missing for whatever
> > > > > > > > other reasons, not necessarily those mentioned above.
> > > > > > > >
> > > > > > > > ---
> > > > > > > > Mallikarjun
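
A quick back-of-the-envelope check of the full-backup duration in the Problem 1 example (10 TB shipped at 800 Mbps, RPO of 60 minutes). This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s) and a fully utilised link with no protocol overhead or compression, and the function name is made up for illustration.

```python
def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to ship size_tb terabytes over a link_mbps link.

    Assumes decimal units and an ideal, fully utilised link.
    """
    size_bits = size_tb * 1e12 * 8          # terabytes -> bits
    seconds = size_bits / (link_mbps * 1e6)  # bits / (bits per second)
    return seconds / 3600


# Numbers taken from the Problem 1 example in the thread.
full_backup_hours = transfer_hours(10, 800)
rpo_hours = 60 / 60

# Whole RPO windows that elapse while the full backup blocks incrementals.
missed_windows = int(full_backup_hours // rpo_hours)

print(f"full backup: {full_backup_hours:.1f} h")
print(f"RPO windows elapsed meanwhile: {missed_windows}")
```

Under these assumptions the transfer takes about 27.8 hours, in the same ballpark as the rough 25-hour figure quoted in the thread, so around 27 hourly incremental windows would pass while the full backup is running.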
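
The ordering idea in the Problem 1 proposal can be illustrated with a toy model. Everything here (the `Backup` record, its fields, `completed`, `latest_completed`) is hypothetical and is not part of the HBase backup API. It only shows that, while a long full backup is still running, an incremental could pick the most recently completed successful backup as its base, and that ordering by completion time gives a different sequence than ordering by start time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup record; not an HBase class."""
    backup_id: str
    kind: str             # "FULL" or "INCREMENTAL"
    start: float          # hours since an arbitrary epoch
    end: Optional[float]  # None while the backup is still running
    ok: bool = True       # False if the backup failed


def completed(history: List[Backup]) -> List[Backup]:
    """Backups that have finished successfully."""
    return [b for b in history if b.end is not None and b.ok]


def latest_completed(history: List[Backup]) -> Optional[Backup]:
    """Most recently completed successful backup, by completion time."""
    done = completed(history)
    return max(done, key=lambda b: b.end) if done else None


# A weekly full backup is in flight; earlier backups have completed.
history = [
    Backup("full-1", "FULL", start=0.0, end=25.0),
    Backup("incr-1", "INCREMENTAL", start=167.0, end=167.2),
    Backup("full-2", "FULL", start=168.0, end=None),  # still running
]

# An incremental started now would build on incr-1 instead of waiting.
base = latest_completed(history)
print(base.backup_id)

# Later: full-2 has finished, and incr-2 ran while it was in progress.
finished = history[:2] + [
    Backup("full-2", "FULL", start=168.0, end=193.0),
    Backup("incr-2", "INCREMENTAL", start=169.0, end=169.2),
]
by_start = [b.backup_id for b in sorted(finished, key=lambda b: b.start)]
by_end = [b.backup_id for b in sorted(completed(finished), key=lambda b: b.end)]
print(by_start)
print(by_end)
```

In this toy history the two orderings disagree about where `full-2` sits relative to `incr-2`, which is the situation the proposal addresses by ordering on completion time. How a real implementation would track WAL ranges and guard the critical sections on the backup meta table is left open, as in the thread.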