Re: [DISCUSS] Plan for Distributed testing of Backup and Restore

Ted Yu Tue, 12 Sep 2017 11:29:35 -0700

bq. we need a test tool similar to ITBLL

How about making the following into such a tool?


hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestBackupRestore.java

On Tue, Sep 12, 2017 at 11:25 AM, Vladimir Rodionov <[email protected]>
wrote:

> >> Vlad: I'm obviously curious to see what you think about this stuff, in
> addition to what you already had in mind :)
>
> Yes, I think that we need a test tool similar to ITBLL. Btw, making backup
> work under challenging conditions was not a goal of the FT design; correct
> failure handling was the goal.
>
> On Tue, Sep 12, 2017 at 9:53 AM, Josh Elser <[email protected]> wrote:
>
> > Thanks for the quick feedback!
> >
> > On 9/12/17 12:36 PM, Stack wrote:
> >
> >> On Tue, Sep 12, 2017 at 9:33 AM, Andrew Purtell <[email protected]>
> >> wrote:
> >>
> >> I think those are reasonable criteria Josh.
> >>>
> >>> What I would like to see is something like "we ran ITBLL (or custom
> >>> generator with similar correctness validation if you prefer) on a dev
> >>> cluster (5-10 nodes) for 24 hours with server-killing chaos agents active,
> >>> attempted 1,440 backups (one per minute), of which 1,000 succeeded and 100%
> >>> of these were successfully restored and validated." This implies your
> >>> points on automation and no manual intervention. Maybe the number of
> >>> successful backups under challenging conditions will be lower. Point is
> >>> they demonstrate we can rely on it even when a cluster is partially
> >>> unhealthy, which in production is often the normal order of affairs.
> >>>
> >>>
> >>>
> > I like it. I hadn't thought about stressing quite this aggressively, but
> > now that I think about it, sounds like a great plan. Having some ballpark
> > measure to quantify the cost of a "backup-heavy" workload would be cool in
> > addition to seeing how the system reacts in unexpected manners.
> >
> > Sounds good to me.
> >>
> >> How will you test the restore aspect? After 1k (or whatever makes sense)
> >> incremental backups over the life of the chaos, could you restore and
> >> validate that the table had all expected data in place?
> >>
> >
> > Exactly. My thinking was that, at any point, we should be able to do a
> > restore and validate. Maybe something like: every Nth ITBLL iteration, make
> > a new backup point, restore a previous backup point, verify, restore to
> > newest backup point. The previous backup point should be a full or
> > incremental point.
> >
> > Vlad: I'm obviously curious to see what you think about this stuff, in
> > addition to what you already had in mind :)
> >
>
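
For reference, the cycle Josh outlines in the quoted text (every Nth
iteration: make a new backup point, restore a previous one, verify, restore
to the newest) could be sketched roughly as below. This is only an
illustration under stated assumptions: BackupStore and InMemoryBackupStore
are hypothetical stand-ins invented for the sketch, not the HBase backup API
and not the contents of IntegrationTestBackupRestore, and a plain map stands
in for the table that ITBLL would load.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class BackupRestoreCycleSketch {

    /** Hypothetical stand-in for a backup service; NOT the HBase backup API. */
    interface BackupStore {
        String createBackup(Map<String, String> table); // returns a backup id
        Map<String, String> restore(String backupId);   // returns restored contents
    }

    /** In-memory fake that copies the table, so the sketch runs standalone. */
    static class InMemoryBackupStore implements BackupStore {
        private final Map<String, Map<String, String>> snapshots = new HashMap<>();
        private int nextId = 0;

        @Override
        public String createBackup(Map<String, String> table) {
            String id = "backup_" + (nextId++);
            snapshots.put(id, new HashMap<>(table));
            return id;
        }

        @Override
        public Map<String, String> restore(String backupId) {
            return new HashMap<>(snapshots.get(backupId));
        }
    }

    /** Runs the cycle and returns how many restores were verified. */
    public static int runCycles(int iterations, int everyNth, long seed) {
        BackupStore store = new InMemoryBackupStore();
        Map<String, String> table = new HashMap<>();
        List<String> backupIds = new ArrayList<>();
        List<Map<String, String>> expected = new ArrayList<>();
        Random random = new Random(seed);
        int verified = 0;

        for (int i = 1; i <= iterations; i++) {
            // Stand-in for one load-generator iteration (assumption: one row each).
            table.put("row-" + i, "value-" + i);
            if (i % everyNth != 0) {
                continue;
            }

            // 1. Make a new backup point and remember what it should contain.
            backupIds.add(store.createBackup(table));
            expected.add(new HashMap<>(table));

            // 2. Restore an earlier backup point (the only one, if just one exists).
            int pick = backupIds.size() > 1 ? random.nextInt(backupIds.size() - 1) : 0;
            Map<String, String> restored = store.restore(backupIds.get(pick));

            // 3. Verify the restored contents against the recorded expectation.
            if (!restored.equals(expected.get(pick))) {
                throw new IllegalStateException("Mismatch for " + backupIds.get(pick));
            }
            verified++;

            // 4. Restore to the newest backup point before loading continues.
            table = store.restore(backupIds.get(backupIds.size() - 1));
        }
        return verified;
    }

    public static void main(String[] args) {
        System.out.println("verified restores: " + runCycles(10, 2, 42L));
    }
}
```

A real test would replace the fake store with actual backup and restore
calls, distinguish full from incremental points, and run with chaos agents
active, none of which is modeled here.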
