Can someone review this pull request? https://github.com/apache/hbase/pull/3359
This change alters the backup meta information. If it is not part of HBase
3.0.0, executing the above-mentioned plan might take a lot of additional work.

---
Mallikarjun


On Thu, Feb 11, 2021 at 5:36 PM Mallikarjun <[email protected]> wrote:

> Slight modification to previous version --> https://ibb.co/Nttx3J1
>
> ---
> Mallikarjun
>
>
> On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun <[email protected]> wrote:
>
>> Inline Reply
>>
>> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey <[email protected]> wrote:
>>
>>> Hi Mallikarjun,
>>>
>>> Those goals sound worthwhile.
>>>
>>> Do you have a flow chart similar to the one you posted for the current
>>> system but for the proposed solution?
>>>
>>
>> This is what I am thinking --> https://ibb.co/KmH6Cwv
>>
>>>
>>> How much will we need to change our existing test coverage to accommodate
>>> the proposed solution?
>>>
>>
>> Of the 38 tests, it looks like we might have to change only a couple.
>> Will have to add more tests to cover parallel backup scenarios.
>>
>>>
>>> How much will we need to update the existing reference guide section?
>>>
>>
>> Probably nothing. Interface as such will not change.
>>
>>>
>>> On Sun, Jan 31, 2021, 04:59 Mallikarjun <[email protected]> wrote:
>>>
>>> > Bringing up this thread.
>>> >
>>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani <[email protected]> wrote:
>>> >
>>> > > Thanks, the image is visible now.
>>> > >
>>> > > > Since I wanted to open this for discussion, did not consider
>>> > > > placing it in *hbase/dev_support/design-docs*.
>>> > >
>>> > > Definitely, only after we come to a concrete conclusion with the
>>> > > reviewer should we open up a PR. Until then this thread is anyways up
>>> > > for discussion.
>>> > >
>>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <[email protected]> wrote:
>>> > >
>>> > > > Hope this link works --> https://ibb.co/hYjRpgP
>>> > > >
>>> > > > Inline reply
>>> > > >
>>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani <[email protected]> wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > Still not available :)
>>> > > > > The attachments don’t work on mailing lists. You can try uploading
>>> > > > > the attachment on some public hosting site and provide the url to
>>> > > > > the same here.
>>> > > > >
>>> > > > > Since I am not aware of the contents, I cannot confirm right away
>>> > > > > but if the reviewer feels we should have the attachment on our
>>> > > > > github repo: hbase/dev-support/design-docs , good to upload the
>>> > > > > content there later. For instance, pdf file can contain existing
>>> > > > > design and new design diagrams and talk about pros and cons etc
>>> > > > > once we have things finalized.
>>> > > > >
>>> > > > Since I wanted to open this for discussion, did not consider
>>> > > > placing it in *hbase/dev_support/design-docs*.
>>> > > >
>>> > > > >
>>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <[email protected]> wrote:
>>> > > > >
>>> > > > > > Attached as image. Please let me know if it is available now.
>>> > > > > >
>>> > > > > > ---
>>> > > > > > Mallikarjun
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <[email protected]> wrote:
>>> > > > > >
>>> > > > > >> Hi!
>>> > > > > >>
>>> > > > > >> Thanks for the write up. Unfortunately, your image for the
>>> > > > > >> existing design didn't come through. Could you post it to some
>>> > > > > >> host and link it here?
>>> > > > > >>
>>> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <[email protected]> wrote:
>>> > > > > >> >
>>> > > > > >> > Existing Design:
>>> > > > > >> >
>>> > > > > >> >
>>> > > > > >> >
>>> > > > > >> > Problem 1:
>>> > > > > >> >
>>> > > > > >> > With this design, incremental and full backups can't be run in
>>> > > > > >> > parallel, leading to degraded RPOs when a full backup takes a
>>> > > > > >> > long time, especially for large tables.
>>> > > > > >> >
>>> > > > > >> > Example:
>>> > > > > >> > Expectation: Say you have a big table of 10 TB, your RPO is 60
>>> > > > > >> > minutes, and you are allowed to ship the remote backup at 800
>>> > > > > >> > Mbps. You are allowed to take full backups once a week, and
>>> > > > > >> > the rest of them should be incremental backups.
>>> > > > > >> >
>>> > > > > >> > Shortcoming: With the above design, one can't run parallel
>>> > > > > >> > backups. Whenever a full backup is running (which takes
>>> > > > > >> > roughly 25 hours), you are not allowed to take incremental
>>> > > > > >> > backups, and that would be a breach of your RPO.
>>> > > > > >> >
>>> > > > > >> > Proposed Solution: Barring some critical sections, such as
>>> > > > > >> > modifying the state of the backup on meta tables, everything
>>> > > > > >> > else can happen in parallel. Incremental backups should be
>>> > > > > >> > able to run based on older successful full / incremental
>>> > > > > >> > backups, and the completion time of a backup should be used
>>> > > > > >> > instead of its start time for ordering. I have not worked on
>>> > > > > >> > the full redesign, and will be doing so if this proposal seems
>>> > > > > >> > acceptable to the community.
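As a rough sanity check of the Problem 1 example, the transfer time can be worked out directly. This is only a back-of-the-envelope sketch: it assumes decimal units (10 TB = 10 * 10^12 bytes, 800 Mbps = 800 * 10^6 bits/s), full use of the link, and no compression, none of which the thread states. Under those assumptions the result lands a little under 28 hours, the same ballpark as the "roughly 25 hours" quoted above.

```python
# Back-of-the-envelope estimate for the Problem 1 example.
# Assumptions (not from the thread): decimal units, the link is fully
# utilised for the whole transfer, and the data is not compressed.

TABLE_SIZE_BYTES = 10 * 10**12    # 10 TB table
LINK_BITS_PER_SEC = 800 * 10**6   # 800 Mbps to the remote backup site
RPO_MINUTES = 60                  # one incremental backup expected per hour


def full_backup_hours(size_bytes: int, link_bps: int) -> float:
    """Hours needed to ship size_bytes over a link of link_bps bits/second."""
    return size_bytes * 8 / link_bps / 3600


hours = full_backup_hours(TABLE_SIZE_BYTES, LINK_BITS_PER_SEC)

# Every whole RPO window that elapses while the full backup holds the
# single backup slot is a window in which no incremental backup can run.
blocked_windows = int(hours * 60 // RPO_MINUTES)

print(f"full backup: ~{hours:.1f} h, RPO windows blocked: {blocked_windows}")
```

With a 60 minute RPO, each blocked window is one incremental backup that cannot be taken while the full backup is in progress.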
>>> > > > > >> >
>>> > > > > >> > Problem 2:
>>> > > > > >> >
>>> > > > > >> > With one backup at a time, it fails easily for a multi-tenant
>>> > > > > >> > system. This poses the following problems:
>>> > > > > >> >
>>> > > > > >> > Admins will not be able to achieve the required RPOs for their
>>> > > > > >> > tables because of dependence on other tenants present in the
>>> > > > > >> > system, as one tenant doesn't have control over other tenants'
>>> > > > > >> > table sizes and hence the duration of the backup.
>>> > > > > >> >
>>> > > > > >> > The management overhead of setting up the right sequence to
>>> > > > > >> > achieve the required RPOs for different tenants could be very
>>> > > > > >> > high.
>>> > > > > >> >
>>> > > > > >> > Proposed Solution: Same as previous proposal
>>> > > > > >> >
>>> > > > > >> > Problem 3:
>>> > > > > >> >
>>> > > > > >> > Incremental backup works on WALs, and
>>> > > > > >> > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures
>>> > > > > >> > that WALs are never cleaned up until the next backup (Full /
>>> > > > > >> > Incremental) is taken. This poses the following problem:
>>> > > > > >> >
>>> > > > > >> > WALs can grow unbounded in case of transient problems, such as
>>> > > > > >> > the backup site facing issues or anything else, until the next
>>> > > > > >> > scheduled backup succeeds.
>>> > > > > >> >
>>> > > > > >> > Proposed Solution: I can't think of anything better, but I see
>>> > > > > >> > this can be a potential problem. Also, one can force a full
>>> > > > > >> > backup if required WAL files are missing for whatever other
>>> > > > > >> > reasons, not necessarily those mentioned above.
>>> > > > > >> >
>>> > > > > >> > ---
>>> > > > > >> > Mallikarjun
>>> > > > > >>
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
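The ordering rule proposed for Problems 1 and 2 (base an incremental backup on the most recent successfully completed backup, ordered by completion time rather than start time) could be sketched roughly as below. This is an illustration of the idea only, not HBase's implementation: `BackupRecord`, `pick_base`, and the timestamps are hypothetical names and values made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupRecord:
    """Hypothetical stand-in for one backup's entry in the backup meta table."""
    backup_id: str
    kind: str                    # "FULL" or "INCREMENTAL"
    start_ts: int                # when the backup started
    complete_ts: Optional[int]   # None while still running (or if it failed)


def pick_base(history: List[BackupRecord]) -> Optional[BackupRecord]:
    """Return the latest completed backup, ordered by completion time.

    Backups that have not completed are ignored, so a long-running full
    backup does not stop an incremental backup from finding a base.
    """
    done = [b for b in history if b.complete_ts is not None]
    return max(done, key=lambda b: b.complete_ts, default=None)


history = [
    BackupRecord("full-1", "FULL", start_ts=100, complete_ts=200),
    BackupRecord("incr-1", "INCREMENTAL", start_ts=300, complete_ts=320),
    BackupRecord("full-2", "FULL", start_ts=310, complete_ts=None),  # running
]

base = pick_base(history)
print(base.backup_id if base else "no completed backup yet")
```

In this toy history the running full backup `full-2` is skipped and the next incremental backup would build on `incr-1`. Whether the WALs needed for that are still retained is a separate question, tied to Problem 3.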
