Slight modification to previous version --> https://ibb.co/Nttx3J1
---
Mallikarjun

On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun <[email protected]> wrote:

> Inline Reply
>
> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey <[email protected]> wrote:
>
>> Hi Mallikarjun,
>>
>> Those goals sound worthwhile.
>>
>> Do you have a flow chart similar to the one you posted for the current
>> system but for the proposed solution?
>>
>
> This is what I am thinking --> https://ibb.co/KmH6Cwv
>
>>
>> How much will we need to change our existing test coverage to accommodate
>> the proposed solution?
>>
>
> Of the 38 tests, it looks like we might have to change a couple only.
> Will have to add more tests to cover parallel backup scenarios.
>
>>
>> How much will we need to update the existing reference guide section?
>>
>
> Probably nothing. Interface as such will not change.
>
>>
>> On Sun, Jan 31, 2021, 04:59 Mallikarjun <[email protected]> wrote:
>>
>> > Bringing up this thread.
>> >
>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani <[email protected]> wrote:
>> >
>> > > Thanks, the image is visible now.
>> > >
>> > > > Since I wanted to open this for discussion, did not consider placing it in
>> > > > *hbase/dev_support/design-docs*.
>> > >
>> > > Definitely, only after we come to concrete conclusion with the reviewer, we
>> > > should open up a PR. Until then this thread is anyways up for discussion.
>> > >
>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <[email protected]> wrote:
>> > >
>> > > > Hope this link works --> https://ibb.co/hYjRpgP
>> > > >
>> > > > Inline reply
>> > > >
>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani <[email protected]> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Still not available :)
>> > > > > The attachments don’t work on mailing lists. You can try uploading the
>> > > > > attachment on some public hosting site and provide the url to the same
>> > > > > here.
>> > > > >
>> > > > > Since I am not aware of the contents, I cannot confirm right away but if
>> > > > > the reviewer feels we should have the attachment on our github repo:
>> > > > > hbase/dev-support/design-docs , good to upload the content there later.
>> > > > > For instance, pdf file can contain existing design and new design
>> > > > > diagrams and talk about pros and cons etc once we have things finalized.
>> > > > >
>> > > >
>> > > > Since I wanted to open this for discussion, did not consider placing it in
>> > > > *hbase/dev_support/design-docs*.
>> > > >
>> > > > >
>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <[email protected]> wrote:
>> > > > >
>> > > > > > Attached as image. Please let me know if it is available now.
>> > > > > >
>> > > > > > ---
>> > > > > > Mallikarjun
>> > > > > >
>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <[email protected]> wrote:
>> > > > > >
>> > > > > >> Hi!
>> > > > > >>
>> > > > > >> Thanks for the write up. Unfortunately, your image for the existing
>> > > > > >> design didn't come through. Could you post it to some host and link it
>> > > > > >> here?
>> > > > > >>
>> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <[email protected]> wrote:
>> > > > > >> >
>> > > > > >> > Existing Design:
>> > > > > >> >
>> > > > > >> > Problem 1:
>> > > > > >> >
>> > > > > >> > With this design, Incremental and Full backup can't be run in parallel,
>> > > > > >> > which leads to degraded RPO's in case Full backup is of longer duration,
>> > > > > >> > esp for large tables.
>> > > > > >> >
>> > > > > >> > Example:
>> > > > > >> > Expectation: Say you have a big table with 10 TB and your RPO is 60
>> > > > > >> > minutes and you are allowed to ship the remote backup with 800 Mbps.
>> > > > > >> > And you are allowed to take Full Backups once in a week and rest of
>> > > > > >> > them should be incremental backups
>> > > > > >> >
>> > > > > >> > Shortcoming: With the above design, one can't run parallel backups and
>> > > > > >> > whenever there is a full backup running (which takes roughly 25 hours)
>> > > > > >> > you are not allowed to take incremental backups and that would be a
>> > > > > >> > breach in your RPO.
>> > > > > >> >
>> > > > > >> > Proposed Solution: Barring some critical sections such as modifying
>> > > > > >> > state of the backup on meta tables, others can happen in parallel.
>> > > > > >> > Leaving incremental backups to be able to run based on older successful
>> > > > > >> > full / incremental backups and completion time of backup should be used
>> > > > > >> > instead of start time of backup for ordering. I have not worked on the
>> > > > > >> > full redesign, and will be doing so if this proposal seems acceptable
>> > > > > >> > for the community.
>> > > > > >> >
>> > > > > >> > Problem 2:
>> > > > > >> >
>> > > > > >> > With one backup at a time, it fails easily for a multi-tenant system.
>> > > > > >> > This poses following problems
>> > > > > >> >
>> > > > > >> > Admins will not be able to achieve required RPO's for their tables
>> > > > > >> > because of dependence on other tenants present in the system. As one
>> > > > > >> > tenant doesn't have control over other tenants' table sizes and hence
>> > > > > >> > the duration of the backup
>> > > > > >> >
>> > > > > >> > Management overhead of setting up a right sequence to achieve required
>> > > > > >> > RPO's for different tenants could be very hard.
>> > > > > >> >
>> > > > > >> > Proposed Solution: Same as previous proposal
>> > > > > >> >
>> > > > > >> > Problem 3:
>> > > > > >> >
>> > > > > >> > Incremental backup works on WAL's and
>> > > > > >> > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that
>> > > > > >> > WAL's are never cleaned up until the next backup (Full / Incremental)
>> > > > > >> > is taken. This poses following problem
>> > > > > >> >
>> > > > > >> > WAL's can grow unbounded in case there are transient problems like
>> > > > > >> > backup site facing issues or anything else until next backup scheduled
>> > > > > >> > goes successful
>> > > > > >> >
>> > > > > >> > Proposed Solution: I can't think of anything better, but I see this can
>> > > > > >> > be a potential problem. Also, one can force full backup if required WAL
>> > > > > >> > files are missing for whatever other reasons not necessarily mentioned
>> > > > > >> > above.
>> > > > > >> >
>> > > > > >> > ---
>> > > > > >> > Mallikarjun
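For reference, the transfer-time figure in the Problem 1 example can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s), a fully saturated link, and no protocol or compression effects. The function names `transfer_hours` and `blocked_windows` are made up for illustration and are not part of HBase.

```python
def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to ship size_tb terabytes over a link_mbps link.

    Assumes decimal units and a fully utilized link (an idealization).
    """
    bits = size_tb * 1e12 * 8              # terabytes -> bits
    seconds = bits / (link_mbps * 1e6)     # bits / (bits per second)
    return seconds / 3600


def blocked_windows(backup_hours: float, rpo_minutes: float) -> int:
    """Whole RPO windows that elapse while one backup holds the exclusive slot."""
    return int(backup_hours * 60 // rpo_minutes)


hours = transfer_hours(10, 800)
print(f"{hours:.1f} hours")            # about 27.8 hours under these assumptions
print(blocked_windows(hours, 60))      # 60-minute RPO windows missed meanwhile
```

Under these idealized assumptions the full backup occupies the link for a bit more than a day, the same order of magnitude as the estimate in the thread, so every hourly incremental due in that period would be skipped when backups cannot run in parallel.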

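The proposal to order backups by completion time rather than start time can be illustrated with a small sketch. Everything here is hypothetical: `BackupRecord`, its fields, and the helper functions are invented for this example and do not correspond to HBase's backup classes or its meta-table schema. It only shows how the two orderings differ once a long full backup overlaps a short incremental one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupRecord:
    """Hypothetical backup history entry (not an HBase class)."""
    backup_id: str
    kind: str                    # "FULL" or "INCREMENTAL"
    start_ts: int                # when the backup started
    complete_ts: Optional[int]   # None while running or if it failed


def completed_in_order(history: List[BackupRecord]) -> List[BackupRecord]:
    """Successful backups, ordered by completion time (the proposed ordering)."""
    done = [b for b in history if b.complete_ts is not None]
    return sorted(done, key=lambda b: b.complete_ts)


def latest_completed(history: List[BackupRecord]) -> Optional[BackupRecord]:
    """Most recently completed backup, or None if nothing has completed."""
    ordered = completed_in_order(history)
    return ordered[-1] if ordered else None


# A long full backup (full_2) overlaps a short incremental (incr_1).
history = [
    BackupRecord("full_1", "FULL", start_ts=0, complete_ts=100),
    BackupRecord("full_2", "FULL", start_ts=1000, complete_ts=2500),
    BackupRecord("incr_1", "INCREMENTAL", start_ts=1060, complete_ts=1070),
]

by_start = [b.backup_id for b in sorted(history, key=lambda b: b.start_ts)]
by_completion = [b.backup_id for b in completed_in_order(history)]
print(by_start)        # ['full_1', 'full_2', 'incr_1']
print(by_completion)   # ['full_1', 'incr_1', 'full_2']
```

With start-time ordering the incremental appears to follow a full backup that had not finished when it ran. Completion-time ordering places it after `full_1`, the last backup that had actually succeeded at that point.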