Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Josh Elser Wed, 08 Nov 2017 10:35:47 -0800

On 11/8/17 1:26 PM, Andrew Purtell wrote:

I won't speak to the timing aspects of this, that's up to the RM, but the
testing details look reasonable to me.


Understood and agree. Thanks for your input!

With respect to chaos testing, the following goals would be good:

- Some backups and restores succeed even with masters and RSes going up and
down. The resiliency can always be improved later, but we can't rely on no
failures for the entire duration of a backup or restore operation to get a good
result, especially for restore.

Yup! The expectation (if not explicitly stated) would be that we would
work our way up to the ServerKilling monkey. The expectation is that
this would be trivial to implement - IntegrationTestBase would wire it
up for us.

- Backups are not corrupted by failures. Or, corrupted (partial?) backups
are identified and ignored and there are still good backups remaining which
can be used for restore.

- When the verification tool says a backup and restore are good, they
really are.


/me nods. Agreed.

I think we'll learn a bit about failure situations (doc intentionally
avoided defining problems/solutions), and the problems we see will help
shape the solutions we need to make.
