I run real machines; they aren't too expensive and are substantially more performant than the virtualized servers EC2 offers. I have 10 billion rows loaded on 20 machines, but you could probably do that on 10 or so. Don't forget that 10 billion rows would require a $40,000 machine to handle in MySQL, so why not spend that $40,000 on a cluster instead?

On Tue, Aug 18, 2009 at 12:20 PM, Jonathan Gray <[email protected]> wrote:
> I have a little util I created called HBench. You can customize the
> different parameters to generate data of varying sizes/patterns/etc.
>
> https://issues.apache.org/jira/browse/HBASE-1501
>
> JG
>
> Andrew Purtell wrote:
>>
>> Most that I am aware of set up transient test environments up on EC2.
>>
>> You can use one instance to create an EBS volume containing all software
>> and config you need, then snapshot it, then clone volumes based on the
>> snapshot to attach to any number of instances you need. Use X-Large
>> instances, at least 4. Give HBase regionservers 2GB heap. Then try your
>> 10 billion row test case.
>>
>> - Andy
>>
>> ________________________________
>> From: Greg Cottman <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Tuesday, August 18, 2009 4:13:23 PM
>> Subject: Public HBase data store?
>>
>> Hi all,
>>
>> I need to do some scalability testing of an HBase query tool. We have
>> just started using HBase and sadly do not have an existing database against
>> which to test. Things we are interested in exploring is the difference
>> between using an index table strategy versus map/reduce queries without
>> indexes.
>>
>> I realise this is a long shot and that queries are very data-dependent,
>> but... Are there any publicly accessible HBase stores or reference sites
>> against which you can run test queries?
>>
>> Or does everyone just create a 10 billion row test environment on their
>> local development box? :-)
>>
>> Cheers,
>> Greg.
>>
>
