Re: [DISCUSS] Plan for Distributed testing of Backup and Restore

Vladimir Rodionov Tue, 12 Sep 2017 11:30:49 -0700

Yes, we already have an IT, so we will need to upgrade it for scale testing.


On Tue, Sep 12, 2017 at 11:28 AM, Ted Yu <[email protected]> wrote:

> bq. we need a test tool similar to ITBLL
>
> How about making the following such a tool ?
>
> hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestBackupRestore.java
>
> On Tue, Sep 12, 2017 at 11:25 AM, Vladimir Rodionov <[email protected]> wrote:
>
> > >> Vlad: I'm obviously curious to see what you think about this stuff, in
> > >> addition to what you already had in mind :)
> >
> > Yes, I think that we need a test tool similar to ITBLL. Btw, making backup
> > work in challenging conditions was not a goal of the FT design; correct
> > failure handling was the goal.
> >
> > On Tue, Sep 12, 2017 at 9:53 AM, Josh Elser <[email protected]> wrote:
> >
> > > Thanks for the quick feedback!
> > >
> > > On 9/12/17 12:36 PM, Stack wrote:
> > >
> > >> On Tue, Sep 12, 2017 at 9:33 AM, Andrew Purtell <[email protected]> wrote:
> > >>
> > >> I think those are reasonable criteria Josh.
> > >>>
> > >>> What I would like to see is something like "we ran ITBLL (or custom
> > >>> generator with similar correctness validation if you prefer) on a dev
> > >>> cluster (5-10 nodes) for 24 hours with server killing chaos agents
> > >>> active, attempted 1,440 backups (one per minute), of which 1,000
> > >>> succeeded and 100% of these were successfully restored and validated."
> > >>> This implies your points on automation and no manual intervention.
> > >>> Maybe the number of successful backups under challenging conditions
> > >>> will be lower. Point is they demonstrate we can rely on it even when a
> > >>> cluster is partially unhealthy, which in production is often the
> > >>> normal order of affairs.
> > >>>
> > >>>
> > >>>
> > > I like it. I hadn't thought about stressing quite this aggressively, but
> > > now that I think about it, sounds like a great plan. Having some ballpark
> > > measure to quantify the cost of a "backup-heavy" workload would be cool in
> > > addition to seeing how the system reacts in unexpected manners.
> > >
> > > Sounds good to me.
> > >>
> > >> How will you test the restore aspect? After 1k (or whatever makes sense)
> > >> incremental backups over the life of the chaos, could you restore and
> > >> validate that the table had all expected data in place?
> > >>
> > >
> > > Exactly. My thinking was that, at any point, we should be able to do a
> > > restore and validate. Maybe something like: every Nth ITBLL iteration,
> > > make a new backup point, restore a previous backup point, verify, restore
> > > to newest backup point. The previous backup point should be a full or
> > > incremental point.
> > >
> > > Vlad: I'm obviously curious to see what you think about this stuff, in
> > > addition to what you already had in mind :)
> > >
> >
>
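
For reference, the cycle Josh describes in the quoted thread (every Nth
iteration: make a new backup point, restore a previous one, verify, restore to
the newest) could be sketched roughly as below. This is only an in-memory
illustration, not the HBase backup API or the existing IT: the
InMemoryBackupStore class, the run_cycle function, and the "row-N" data are
all hypothetical stand-ins, and a plain dict plays the role of the table that
ITBLL would populate.

```python
class InMemoryBackupStore:
    """Hypothetical stand-in for a backup system (NOT the HBase backup API)."""

    def __init__(self):
        self._points = []  # one table snapshot per backup point

    def create_backup(self, table):
        # Store a copy so later writes to the table do not alter the backup.
        self._points.append(dict(table))
        return len(self._points) - 1  # id of the new backup point

    def restore(self, backup_id):
        # Return a copy of the table as it was at the given backup point.
        return dict(self._points[backup_id])


def run_cycle(iterations, backup_every):
    """Write one row per iteration; every `backup_every` iterations make a new
    backup point, restore the previous one, verify it, then restore the newest.

    Returns (backups_made, restores_verified).
    """
    table = {}
    store = InMemoryBackupStore()
    backups_made = 0
    restores_verified = 0

    for i in range(1, iterations + 1):
        # Stand-in for one iteration of a data generator such as ITBLL.
        table[f"row-{i}"] = i
        if i % backup_every != 0:
            continue

        newest = store.create_backup(table)
        backups_made += 1

        if newest > 0:
            previous = newest - 1
            restored = store.restore(previous)
            # Backup point k was taken after (k + 1) * backup_every writes,
            # so recompute the rows it must contain independently of the store.
            last_row = (previous + 1) * backup_every
            expected = {f"row-{j}": j for j in range(1, last_row + 1)}
            if restored != expected:
                raise AssertionError(f"backup point {previous} failed validation")
            restores_verified += 1

        # Return to the newest backup point before the workload continues.
        table = store.restore(newest)

    return backups_made, restores_verified
```

In a real run the dict would be an HBase table, the verification step would be
the generator's own correctness check, and chaos agents would be killing
servers in the background, so some backup attempts would be expected to fail
rather than all succeed as they do here.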
