I run real machines; they aren't too expensive and are substantially
more performant than the virtualized servers EC2 offers. I have 10
billion rows loaded on 20 machines, but you could probably do that on 10
or so. Don't forget that 10 billion rows would require a $40,000 machine
to run on MySQL, so why not spend $40,000 on a cluster instead?

On Tue, Aug 18, 2009 at 12:20 PM, Jonathan Gray<[email protected]> wrote:
> I have a little util I created called HBench.  You can customize the
> different parameters to generate data of varying sizes/patterns/etc.
>
> https://issues.apache.org/jira/browse/HBASE-1501
>
> JG
>
> Andrew Purtell wrote:
>>
>> Most that I am aware of set up transient test environments on EC2.
>>
>> You can use one instance to create an EBS volume containing all software
>> and config you need, then snapshot it, then clone volumes based on the
>> snapshot to attach to any number of instances you need. Use X-Large
>> instances, at least 4. Give HBase regionservers 2GB heap. Then try your
>> 10 billion row test case.
>>
>>   - Andy
>>
>>
>>
>>
>> ________________________________
>> From: Greg Cottman <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Tuesday, August 18, 2009 4:13:23 PM
>> Subject: Public HBase data store?
>>
>> Hi all,
>>
>> I need to do some scalability testing of an HBase query tool.  We have
>> just started using HBase and sadly do not have an existing database against
>> which to test.  One thing we are interested in exploring is the difference
>> between using an index table strategy versus map/reduce queries without
>> indexes.
>>
>> I realise this is a long shot and that queries are very data-dependent,
>> but...  Are there any publicly accessible HBase stores or reference sites
>> against which you can run test queries?
>>
>> Or does everyone just create a 10 billion row test environment on their
>> local development box?  :-)
>>
>> Cheers,
>> Greg.
>>
>>
>>
>>
>
