On 11/1/17 8:22 PM, Sean Busbey wrote:
On Wed, Nov 1, 2017 at 7:08 PM, Vladimir Rodionov
<[email protected]> wrote:
There is no way to validate the correctness of a backup in the general case.
You can restore the backup into a temp table, but then what? Read rows
one-by-one from the temp table and look them up in the primary table?
That won't work, because rows can have been deleted or modified since the
last backup was taken.
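(For concreteness, the naive check being ruled out here would look something
like the sketch below, written against the standard HBase client API; the
table names are hypothetical. The comments mark exactly where it breaks down.)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public class NaiveBackupCheck {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // "backup_tmp" is the table the backup was restored into,
             // "prod" is the live table; both names are made up.
             Table restored = conn.getTable(TableName.valueOf("backup_tmp"));
             Table live = conn.getTable(TableName.valueOf("prod"));
             ResultScanner scanner = restored.getScanner(new Scan())) {
          for (Result backupRow : scanner) {
            Result liveRow = live.get(new Get(backupRow.getRow()));
            // The flaw: a missing or mismatched row here does NOT mean
            // the backup is bad -- the row may simply have been deleted
            // or rewritten in the live table after the backup ran.
            if (liveRow.isEmpty()) {
              System.out.println("row absent from live table (corruption, "
                  + "or just a legitimate delete? can't tell)");
            }
          }
        }
      }
    }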
This is why we have snapshots, no?
True, we could try to take a snapshot at exactly the moment the backup was
taken (likely still difficult to coordinate on an active system), but in
what reality would we actually want to do this? Most users I see are so
concerned about the cost of running compactions (which actually make
performance better!) that they wouldn't give up a non-negligible portion
of their computing power and available space to re-instantiate their
data (at least once) just to make sure a copy worked correctly.
We have WALs, HFiles, and some metadata we'd export in a backup, right?
Why not intrinsically perform some validation that things like headers,
trailers, etc. still exist on the files we exported (e.g., open the file,
read the header, seek to the end, verify the trailer) -- something like
the sketch below. I feel like that's a much more tenable solution, one
that doesn't impose a ridiculous burden like restoring tables of modest
size and above.
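Here's a minimal sketch of such a check, assuming the HBase 1.x HFile
reader API (the class and method names I wrap it in are just illustrative).
The nice part is that opening the reader already forces the fixed file
trailer to be read and version-checked, so a truncated or garbled export
fails fast without scanning any data:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.io.hfile.CacheConfig;
    import org.apache.hadoop.hbase.io.hfile.HFile;

    public class ExportedHFileCheck {
      // Cheap structural validation of one exported HFile. createReader()
      // seeks to the end of the file, reads the fixed file trailer, and
      // checks its magic and version before returning, so a truncated or
      // corrupt file throws here rather than at restore time.
      public static boolean isStructurallySound(Configuration conf, Path hfile) {
        try {
          FileSystem fs = hfile.getFileSystem(conf);
          try (HFile.Reader reader =
              HFile.createReader(fs, hfile, new CacheConfig(conf), conf)) {
            reader.loadFileInfo(); // also pull the file-info block
            System.out.println(hfile + ": trailer OK, "
                + reader.getTrailer().getEntryCount() + " entries");
            return true;
          }
        } catch (IOException e) {
          System.err.println(hfile + ": failed structural check: " + e);
          return false;
        }
      }
    }

(HFilePrettyPrinter -- the 'hbase hfile' tool -- does roughly this when it
prints file metadata, so the cluster already ships most of this logic.)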
This smells like it's really about verifying a distcp rather than verifying
backups. There is certainly something we can do to give a reasonable
level of confidence that doesn't involve reconstituting the whole thing.
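For example, distcp-style verification boils down to comparing lengths and
file checksums between the source and the copy. A minimal sketch using the
stock Hadoop FileSystem API (the class name is mine, and this assumes both
sides expose comparable checksums, e.g. HDFS-to-HDFS with matching block
sizes):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BackupCopyVerifier {
      // Length check first (cheap), then the composite file checksum,
      // which is the same signal distcp's CRC verification uses.
      public static boolean sameFile(FileSystem srcFs, Path src,
                                     FileSystem dstFs, Path dst)
          throws IOException {
        if (srcFs.getFileStatus(src).getLen()
            != dstFs.getFileStatus(dst).getLen()) {
          return false;
        }
        FileChecksum srcSum = srcFs.getFileChecksum(src);
        FileChecksum dstSum = dstFs.getFileChecksum(dst);
        if (srcSum == null || dstSum == null) {
          // Filesystem doesn't expose a checksum; a length match is
          // the strongest claim we can make here.
          return true;
        }
        return srcSum.equals(dstSum);
      }
    }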