Slight modification to previous version --> https://ibb.co/Nttx3J1

---
Mallikarjun


On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun <[email protected]>
wrote:

> Inline Reply
>
> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey <[email protected]> wrote:
>
>> Hi Mallikarjun,
>>
>> Those goals sound worthwhile.
>>
>> Do you have a flow chart similar to the one you posted for the current
>> system but for the proposed solution?
>>
>
> This is what I am thinking --> https://ibb.co/KmH6Cwv
>
>
>>
>> How much will we need to change our existing test coverage to accommodate
>> the proposed solution?
>>
>
> Of the 38 tests, it looks like we might have to change a couple only.
> Will have to add more tests to cover parallel backup scenarios.
>
>
>>
>> How much will we need to update the existing reference guide section?
>>
>>
> Probably nothing. Interface as such will not change.
>
>
>>
>> On Sun, Jan 31, 2021, 04:59 Mallikarjun <[email protected]> wrote:
>>
>> > Bringing up this thread.
>> >
>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani <[email protected]> wrote:
>> >
>> > > Thanks, the image is visible now.
>> > >
>> > > > Since I wanted to open this for discussion, I did not consider
>> > > > placing it in *hbase/dev-support/design-docs*.
>> > >
>> > > Definitely. We should open a PR only after we reach a concrete
>> > > conclusion with the reviewer. Until then, this thread is open for
>> > > discussion anyway.
>> > >
>> > >
>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <[email protected]>
>> > > wrote:
>> > >
>> > > > Hope this link works --> https://ibb.co/hYjRpgP
>> > > >
>> > > > Inline reply
>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Still not available :)
>> > > > > Attachments don’t work on mailing lists. You can try uploading
>> > > > > the attachment to a public hosting site and sharing the URL
>> > > > > here.
>> > > > >
>> > > > > Since I am not aware of the contents, I cannot confirm right away,
>> > > > > but if the reviewer feels we should have the attachment in our
>> > > > > GitHub repo under hbase/dev-support/design-docs, it would be good
>> > > > > to upload the content there later. For instance, a PDF file could
>> > > > > contain the existing and new design diagrams and discuss pros and
>> > > > > cons once things are finalized.
>> > > > >
>> > > > >
>> > > > Since I wanted to open this for discussion, I did not consider
>> > > > placing it in *hbase/dev-support/design-docs*.
>> > > >
>> > > >
>> > > > >
>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Attached as image. Please let me know if it is available now.
>> > > > > >
>> > > > > > ---
>> > > > > > Mallikarjun
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <[email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > >> Hi!
>> > > > > >>
>> > > > > >> Thanks for the write up. Unfortunately, your image for the
>> > > > > >> existing design didn't come through. Could you post it to some
>> > > > > >> host and link it here?
>> > > > > >>
>> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <[email protected]>
>> > > > > >> wrote:
>> > > > > >> >
>> > > > > >> > Existing Design:
>> > > > > >> >
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > Problem 1:
>> > > > > >> >
>> > > > > >> > With this design, incremental and full backups can't run in
>> > > > > >> > parallel. This degrades RPOs when a full backup runs for a long
>> > > > > >> > time, especially for large tables.
>> > > > > >> >
>> > > > > >> > Example:
>> > > > > >> > Expectation: Say you have a big table of 10 TB, your RPO is 60
>> > > > > >> > minutes, and you are allowed to ship the remote backup at 800
>> > > > > >> > Mbps. You are allowed to take a full backup once a week, and
>> > > > > >> > the rest should be incremental backups.
>> > > > > >> >
>> > > > > >> > Shortcoming: With the above design, one can't run parallel
>> > > > > >> > backups. Whenever a full backup is running (which takes roughly
>> > > > > >> > 25 hours), you are not allowed to take incremental backups, and
>> > > > > >> > that breaches your RPO.
>> > > > > >> >
>> > > > > >> > Proposed Solution: Barring some critical sections, such as
>> > > > > >> > modifying the state of the backup on meta tables, the rest can
>> > > > > >> > happen in parallel. Incremental backups should be able to run
>> > > > > >> > based on older successful full / incremental backups, and the
>> > > > > >> > completion time of a backup should be used instead of its start
>> > > > > >> > time for ordering. I have not worked on the full redesign, and
>> > > > > >> > will do so if this proposal seems acceptable to the community.
>> > > > > >> >
>> > > > > >> > Problem 2:
>> > > > > >> >
>> > > > > >> > Allowing only one backup at a time breaks down easily in a
>> > > > > >> > multi-tenant system. This poses the following problems:
>> > > > > >> >
>> > > > > >> > Admins will not be able to achieve the required RPOs for their
>> > > > > >> > tables because they depend on the other tenants in the system:
>> > > > > >> > one tenant has no control over other tenants' table sizes, and
>> > > > > >> > hence over the duration of their backups.
>> > > > > >> >
>> > > > > >> > The management overhead of setting up the right sequence of
>> > > > > >> > backups to achieve the required RPOs for different tenants
>> > > > > >> > could be very high.
>> > > > > >> >
>> > > > > >> > Proposed Solution: Same as previous proposal
>> > > > > >> >
>> > > > > >> > Problem 3:
>> > > > > >> >
>> > > > > >> > Incremental backup works on WALs, and
>> > > > > >> > org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures
>> > > > > >> > that WALs are never cleaned up until the next backup (full /
>> > > > > >> > incremental) is taken. This poses the following problem:
>> > > > > >> >
>> > > > > >> > WALs can grow unbounded if there are transient problems, such
>> > > > > >> > as the backup site facing issues, until the next scheduled
>> > > > > >> > backup succeeds.
>> > > > > >> >
>> > > > > >> > Proposed Solution: I can't think of anything better, but I see
>> > > > > >> > this as a potential problem. Also, one can force a full backup
>> > > > > >> > if required WAL files are missing for any other reason, not
>> > > > > >> > necessarily those mentioned above.
>> > > > > >> >
>> > > > > >> > ---
>> > > > > >> > Mallikarjun
>> > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
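
As a rough check of the RPO example in the quoted proposal (10 TB table, 800 Mbps link, 60-minute RPO), here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes), a fully utilized link, and no compression. The function names are made up for illustration and are not part of HBase.

```python
def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to ship size_tb terabytes over a link_mbps megabit/s link."""
    size_bits = size_tb * 1e12 * 8           # decimal terabytes -> bits
    seconds = size_bits / (link_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 3600


def missed_rpo_windows(full_backup_hours: float, rpo_minutes: float) -> int:
    """Whole RPO windows that elapse while a full backup blocks incrementals."""
    return int(full_backup_hours * 60 // rpo_minutes)


hours = transfer_hours(10, 800)
print(f"full backup duration: ~{hours:.1f} h")
print(f"RPO windows missed: {missed_rpo_windows(hours, 60)}")
```

Under these assumptions the full backup takes about 27.8 hours, which is in the same range as the "roughly 25 hours" quoted in the proposal, so around 27 hourly incremental windows would pass while it runs.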

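The proposal to order backups by completion time rather than start time can be illustrated with a small sketch. This is only one possible reading of the idea, not the existing HBase backup code: the `BackupRecord` type, its fields, and both helper functions are hypothetical, and WAL tracking is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupRecord:
    backup_id: str               # hypothetical identifier
    kind: str                    # "FULL" or "INCREMENTAL"
    start_ts: int                # when the backup session started
    complete_ts: Optional[int]   # None while running or after a failure


def completed_in_order(records: List[BackupRecord]) -> List[BackupRecord]:
    """Successful backups ordered by completion time, not start time."""
    done = [r for r in records if r.complete_ts is not None]
    return sorted(done, key=lambda r: r.complete_ts)


def base_for_incremental(records: List[BackupRecord]) -> Optional[BackupRecord]:
    """Most recently completed backup; a still-running full backup is ignored."""
    ordered = completed_in_order(records)
    return ordered[-1] if ordered else None


history = [
    BackupRecord("full_1", "FULL", start_ts=0, complete_ts=90),
    BackupRecord("full_2", "FULL", start_ts=100, complete_ts=None),  # still running
    BackupRecord("incr_1", "INCREMENTAL", start_ts=120, complete_ts=125),
]
print(base_for_incremental(history).backup_id)
```

Here a new incremental backup would build on `incr_1` even though `full_2` started earlier and is still in progress, which is the behaviour the proposal needs for parallel backups.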