Yes, we already have an IT, so we will need to upgrade it for scale testing.

On Tue, Sep 12, 2017 at 11:28 AM, Ted Yu <[email protected]> wrote:

> bq. we need a test tool similar to ITBLL
>
> How about making the following such a tool ?
>
> hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestBackupRestore.java
>
> On Tue, Sep 12, 2017 at 11:25 AM, Vladimir Rodionov <[email protected]>
> wrote:
>
> > >> Vlad: I'm obviously curious to see what you think about this stuff, in
> > >> addition to what you already had in mind :)
> >
> > Yes, I think that we need a test tool similar to ITBLL. Btw, making backup
> > working in challenging conditions was not a goal of FT design, correct
> > failure handling was a goal.
> >
> > On Tue, Sep 12, 2017 at 9:53 AM, Josh Elser <[email protected]> wrote:
> >
> > > Thanks for the quick feedback!
> > >
> > > On 9/12/17 12:36 PM, Stack wrote:
> > >
> > >> On Tue, Sep 12, 2017 at 9:33 AM, Andrew Purtell <[email protected]>
> > >> wrote:
> > >>
> > >>> I think those are reasonable criteria Josh.
> > >>>
> > >>> What I would like to see is something like "we ran ITBLL (or custom
> > >>> generator with similar correctness validation if you prefer) on a dev
> > >>> cluster (5-10 nodes) for 24 hours with server killing chaos agents
> > >>> active, attempted 1,440 backups (one per minute), of which 1,000
> > >>> succeeded and 100% of these were successfully restored and validated."
> > >>> This implies your points on automation and no manual intervention.
> > >>> Maybe the number of successful backups under challenging conditions
> > >>> will be lower. Point is they demonstrate we can rely on it even when a
> > >>> cluster is partially unhealthy, which in production is often the
> > >>> normal order of affairs.
> > >
> > > I like it. I hadn't thought about stressing quite this aggressively, but
> > > now that I think about it, sounds like a great plan. Having some ballpark
> > > measure to quantify the cost of a "backup-heavy" workload would be cool
> > > in addition to seeing how the system reacts in unexpected manners.
> > >
> > >> Sounds good to me.
> > >>
> > >> How will you test the restore aspect? After 1k (or whatever makes sense)
> > >> incremental backups over the life of the chaos, could you restore and
> > >> validate that the table had all expected data in place.
> > >
> > > Exactly. My thinking was that, at any point, we should be able to do a
> > > restore and validate. Maybe something like: every Nth ITBLL iteration,
> > > make a new backup point, restore a previous backup point, verify, restore
> > > to newest backup point. The previous backup point should be a full or
> > > incremental point.
> > >
> > > Vlad: I'm obviously curious to see what you think about this stuff, in
> > > addition to what you already had in mind :)
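
For reference, the cycle Josh describes in the quoted thread (every Nth iteration: make a new backup point, restore a previous point, verify, restore to the newest point) could be sketched roughly as below. This is only an in-memory simulation to pin down the control flow. `FakeTable`, `FakeBackupStore`, `expected_keys` and `run_cycle` are hypothetical names, not HBase or ITBLL APIs, and every backup point here is a full copy, whereas a real incremental backup would store deltas.

```python
import copy
import random


def expected_keys(upto_iteration, rows_per_iteration=100):
    """Row keys that must exist after the given number of load iterations."""
    return {
        f"iter{k}-row{i}"
        for k in range(1, upto_iteration + 1)
        for i in range(rows_per_iteration)
    }


class FakeTable:
    """In-memory stand-in for a table (assumption: not an HBase API)."""

    def __init__(self):
        self.rows = {}

    def load(self, iteration, rows_per_iteration=100):
        # Stand-in for one iteration of an ITBLL-style data generator.
        for i in range(rows_per_iteration):
            self.rows[f"iter{iteration}-row{i}"] = iteration


class FakeBackupStore:
    """Hypothetical backup store; each point is a full copy of the table."""

    def __init__(self):
        self.points = []

    def backup(self, table):
        self.points.append(copy.deepcopy(table.rows))
        return len(self.points) - 1  # id of the new backup point

    def restore(self, table, point_id):
        table.rows = copy.deepcopy(self.points[point_id])


def run_cycle(iterations, every_n, seed=0):
    """Every Nth iteration: backup, restore a previous point, verify, restore newest."""
    rng = random.Random(seed)
    table, store = FakeTable(), FakeBackupStore()
    iteration_of_point = {}  # backup point id -> iteration it was taken at
    verified_restores = 0
    for it in range(1, iterations + 1):
        table.load(it)
        if it % every_n != 0:
            continue
        newest = store.backup(table)
        iteration_of_point[newest] = it
        if newest > 0:
            # Restore a randomly chosen earlier point and validate its contents.
            previous = rng.randrange(newest)
            store.restore(table, previous)
            assert set(table.rows) == expected_keys(iteration_of_point[previous])
            verified_restores += 1
        # Return to the newest point so the workload can continue from there.
        store.restore(table, newest)
        assert set(table.rows) == expected_keys(it)
    return verified_restores, len(store.points)
```

In a real IT the load step would be the ITBLL generator, verification would be its verify job, and the chaos agents would run concurrently; none of that is modelled here.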
