Hello,

> There is no way to validate correctness of backup in a general case.
If there is no way to validate correctness, can we trust the backups?

> I am waiting for response from feature requester on what they expect from
> verification.

A client using HBase backups should be able to assure its customers that
their data is safe and correct in case something goes wrong.
Backups are critical: if, in a critical situation, we find that we cannot
restore the backed-up data, then those backups are not useful.

The ask, probably, is twofold: there should be some way to know that the
backed-up data is restorable (and is not corrupt), and there should be some
way to know that the backed-up data is correct. Even partial verification
should be good enough, if full verification is not possible.
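
As a rough illustration of what partial verification could look like, here is
a minimal sketch. It assumes the source table retains old cell versions and can
be read as of the backup timestamp. Tables are modeled as plain in-memory
dicts, and every name here (as_of, row_digest, verify_sample) is hypothetical;
none of this is an HBase API.

```python
import hashlib
import random

def as_of(cells, max_ts):
    """Latest (ts, value) per column among versions with ts <= max_ts.

    cells is a list of (column, ts, value) tuples (a toy model of a row).
    """
    view = {}
    for column, ts, value in cells:
        if ts <= max_ts and (column not in view or ts > view[column][0]):
            view[column] = (ts, value)
    return view

def row_digest(row_key, cells, max_ts):
    """SHA-256 over the row key and the row's cells as of max_ts."""
    h = hashlib.sha256(row_key.encode())
    for column, (ts, value) in sorted(as_of(cells, max_ts).items()):
        h.update(f"{column}\x00{ts}\x00{value}\x00".encode())
    return h.hexdigest()

def verify_sample(source, restored, backup_ts, sample_size, seed=0):
    """Compare a random sample of rows between source and restored tables.

    Both tables are dicts of row_key -> list of (column, ts, value).
    Returns the sampled row keys whose digests differ as of backup_ts.
    """
    rng = random.Random(seed)
    keys = sorted(set(source) | set(restored))
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return [
        key for key in sample
        if row_digest(key, source.get(key, []), backup_ts)
        != row_digest(key, restored.get(key, []), backup_ts)
    ]
```

An empty result only says that the sampled rows matched; it is partial
verification, not proof that the whole backup is correct. Rows written after
the backup timestamp are ignored on both sides, so later modifications to the
source do not show up as mismatches.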

I have added more details to HBASE-19104
<https://issues.apache.org/jira/browse/HBASE-19104>, HBASE-19105
<https://issues.apache.org/jira/browse/HBASE-19105> and HBASE-19106
<https://issues.apache.org/jira/browse/HBASE-19106> earlier today on the
questions asked there.

Thanks,
Amit Kabra.

On Thu, Nov 2, 2017 at 11:03 PM, Vladimir Rodionov <[email protected]>
wrote:

> On doc,
>
> We have a great doc attached to HBASE-7912 (unfortunately, it is a little bit
> obsolete now)
>
> On Thu, Nov 2, 2017 at 10:31 AM, Vladimir Rodionov <[email protected]>
> wrote:
>
> > >>To be clear, I wasn't listing requirements. I was having trouble with
> > >>the absolute "There is no way to validate correctness of backup in a
> > >>general case."
> >
> > I am waiting for response from feature requester on what they expect from
> > verification.
> > Until then, I would rephrase my statement: "I do not see how we can
> > perform correct verification ..."
> >
> > On Thu, Nov 2, 2017 at 9:20 AM, Stack <[email protected]> wrote:
> >
> >> On Thu, Nov 2, 2017 at 5:51 AM, Josh Elser <[email protected]> wrote:
> >>
> >> > On 11/1/17 11:33 PM, Stack wrote:
> >> >
> >> >> On Wed, Nov 1, 2017 at 5:08 PM, Vladimir Rodionov <[email protected]>
> >> >> wrote:
> >> >>
> >> >>> There is no way to validate correctness of backup in a general case.
> >> >>>
> >> >>> You can restore backup into temp table, but then what? Read rows
> >> >>> one-by-one from temp table and look them up in a primary table? Won't
> >> >>> work, because rows can be deleted or modified since the last backup
> >> >>> was done.
> >> >>>
> >> >>
> >> >> Replication has a verify table tool.
> >> >>
> >> >> You can ask a cluster not to delete rows.
> >> >>
> >> >> You can read at a specific timestamp.
> >> >>
> >> >> Or you could create backups during an extended ITBLL. When ITBLL
> >> >> completes, verify it on src cluster. Create a table from the
> >> >> incremental backups. Verify in the restore.
> >> >>
> >> >> Etc.
> >> >>
> >> >> St.Ack
> >> >>
> >> >
> >> > I can definitely see a benefit of a tool which verifies the data
> >> > collected for a backup which:
> >> >
> >> > 1. Is batch in nature
> >> > 2. Is ad-hoc (not intrinsically run for every backup session)
> >> > 3. Relies/is-built on existing tooling (snapshots or other
> >> > verification-like code)
> >> >
> >> > Thanks Stack. I think this is some good teasing of requirements from an
> >> > otherwise very broad and untenable problem statement that we started
> >> > with (which led to the knee-jerk).
> >> >
> >>
> >> To be clear, I wasn't listing requirements. I was having trouble with the
> >> absolute "There is no way to validate correctness of backup in a general
> >> case." which is then seemingly being used to beat down any request for
> >> verification tooling/testing that shows backup/restore works properly.
> >> Good on you Josh,
> >> S
> >>
> >
> >
>
