Potential adopters will absolutely want to construct for themselves a verifiable live exercise. Tooling that lets you do that against a snapshot would be the way to go, I think. Once you do that exercise, probably a few times, you can trust the backup solution enough for restore into production, where verification may or may not be possible.
A user who claims they'd rather not verify their backup solution works on account of performance concerns shouldn't be taken seriously. (Not that you would (smile)) > On Nov 1, 2017, at 7:55 PM, Josh Elser <[email protected]> wrote: > > > >> On 11/1/17 8:22 PM, Sean Busbey wrote: >> On Wed, Nov 1, 2017 at 7:08 PM, Vladimir Rodionov >> <[email protected]> wrote: >>> There is no way to validate correctness of backup in a general case. >>> >>> You can restore backup into temp table, but then what? Read rows one-by-one >>> from temp table and look them up >>> in a primary table? Won't work, because rows can be deleted or modified >>> since the last backup was done. >>> >> This is why we have snapshots, no? > > True, we could try to take a snapshot exactly when the backup was taken > (likely, still difficult to coordinate on an active system), but in what > reality would we actually want to do this? Most users I see are so concerned > about the cost of running compactions (which are actually making performance > better!), they wouldn't take non-negligible portion of their computing power > and available space to re-instantiate their data (at least once) to make sure > a copy worked correctly. > > We have WALs, HFiles, and some metadata we'd export in a backup right? Why > not intrinsically perform some validation that things like headers, trailers, > etc still exist on the files we exported (e.g. open file, read header, seek > to end, verify trailer, etc). I feel like that's a much more tenable solution > that isn't going to have a ridiculous burden like restoring tables of modest > and above size. > > This smells like it's really asking to verify a distcp, than verifying > backups. There is certainly something we can do to give a reasonable level of > confidence that doesn't involve reconstituting the whole thing.
