[jira] Updated: (HBASE-2341) Suite of test scripts that a.) load a cluster with a verifiable dataset and b.) do random kills of regionserver+datanodes in small cluster

Andrew Purtell (JIRA) Wed, 17 Mar 2010 17:18:52 -0700

     [ https://issues.apache.org/jira/browse/HBASE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrew Purtell updated HBASE-2341:
----------------------------------

    Attachment: count-slaves.rb
                test.sh

Timely. I just put together a first cut of scripted PE (PerformanceEvaluation) 
up on EC2 using our EC2 scripts. It requires the scripts from the current head 
of trunk or the 0.20 branch. Before running it, edit the attached script to 
choose one of the public AMIs. 

But what I want to do is take this bash recipe, which works well enough (but is 
not totally robust), and convert it into a web service that allows one to:

- Select a base AMI (with HBase public AMIs as default) and instance types

- Upload replacement Hadoop or HBase jars, additional jars for lib/

- Upload additional files, i.e. test scripts

- Execute something on the command line

Each test gets its own transient cluster. All output and logs are collected 
from the cluster just before it is terminated and made available on an ongoing 
basis for posterity. 
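
The per-test lifecycle described above could be sketched roughly as follows. This is only an illustration: `run_on_transient_cluster` and the four callables passed to it are hypothetical names, not part of the attached scripts or of any AWS tool. The point is the ordering: logs are collected before termination, and termination happens even if the test fails.

```python
from typing import Callable, Dict, List


def run_on_transient_cluster(
    launch: Callable[[], str],
    execute: Callable[[str], int],
    collect_logs: Callable[[str], List[str]],
    terminate: Callable[[str], None],
) -> Dict[str, object]:
    """Run one test on its own cluster and always tear the cluster down.

    All four callables are hypothetical hooks; in practice they would wrap
    whatever launches the instances, runs the command line, and so on.
    """
    cluster_id = launch()
    exit_code = None
    logs: List[str] = []
    try:
        exit_code = execute(cluster_id)
    finally:
        try:
            # Collect output and logs just before the cluster goes away.
            logs = collect_logs(cluster_id)
        finally:
            # Terminate even if the test or the log collection failed.
            terminate(cluster_id)
    return {"cluster": cluster_id, "exit_code": exit_code, "logs": logs}
```

Nesting the two `finally` blocks is a deliberate choice here: a failed test still gets its logs collected, and a failed log collection still does not leave a cluster running.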

Given what I've been working with recently and am most familiar with, I'd 
implement it as a servlet. That way it can directly use the jars AWS provides 
with their command line tools, plus jsch, etc. 

> Suite of test scripts that a.) load a cluster with a verifiable dataset and 
> b.) do random kills of regionserver+datanodes in small cluster
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2341
>                 URL: https://issues.apache.org/jira/browse/HBASE-2341
>             Project: Hadoop HBase
>          Issue Type: Task
>            Reporter: stack
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: count-slaves.rb, test.sh
>
>
> We just filed hbase-2340, but discussion up on irc has it that we need 
> something more hardcore than pussy-footing inside a single jvm as hbase-2340 
> does.  The point was made (tlipcon) that it's hard to ensure real recovery 
> is working if all is in the one JVM.
> So, this issue is about scripts that can:
> + load a cluster with a dataset that we can 'verify' as in we can tell if it 
> has holes in it, if data has been lost.
> + do a random kill of a random node on some random occasion
> + check the cluster for data loss
> All above should work while cluster is under load.
> The above would not sit under junit.
> This looks like a suite that we'd want to run up in ec2 using Andrew's 
> scripts and our donated aws credits.
> {code}
> 16:12 < tlipcon> here's my goal: we have a 5 node cluster in the back room. I 
> want to run hbase on that at near full load for a week straight while some 
> process goes around screwing with it
> 16:12 < tlipcon> then I want to verify that I didn't lose a single edit over 
> that week
> {code}
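
One way to make a dataset 'verifiable' in the sense of the quoted description is to derive each value from its row number, so a checker can recompute what should be there and report holes. The sketch below is an assumption about how that could look, not what the attached scripts do: a plain dict stands in for the cluster, and all function names are hypothetical. The actual kill of a node is left out; `pick_victim` only chooses one.

```python
import hashlib
import random
from typing import Dict, List


def expected_value(row: int) -> str:
    """Value is derived from the row number, so it can be recomputed later."""
    return hashlib.md5(f"row-{row}".encode("utf-8")).hexdigest()


def load(table: Dict[int, str], num_rows: int) -> None:
    """Write rows 0..num_rows-1; `table` is a dict standing in for the cluster."""
    for row in range(num_rows):
        table[row] = expected_value(row)


def find_holes(table: Dict[int, str], num_rows: int) -> List[int]:
    """Return every row that is missing or whose value does not match."""
    return [
        row for row in range(num_rows)
        if table.get(row) != expected_value(row)
    ]


def pick_victim(nodes: List[str], rng: random.Random) -> str:
    """Choose a random node to kill; the kill itself is not part of this sketch."""
    return rng.choice(nodes)
```

For example, loading 1000 rows and then deleting row 417 from the dict makes `find_holes(table, 1000)` return `[417]`.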

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
