Hi Ryan,

What kind of random row lookup throughput do you get (e.g. rows per second) on the 10b store on the 20 machine cluster (assuming client isn't saturating)?
I'm pondering indexing HBase rows in various ways with Lucene with only the row key stored. Then page over search results and stream out the response (transforming to the preferred response format on the fly - RDF, CSV, XML etc) by doing sequential "get by key" calls. Maybe a stupid idea, but not sure what else can index so well. I'm just curious...

Thanks,
Tim

On Tue, Aug 18, 2009 at 10:07 PM, Ryan Rawson<[email protected]> wrote:
> I run real machines, they aren't too expensive and are substantially
> more performant than the virtualized servers EC2 offers. I have 10b
> rows loaded on 20 machines, but you could probably do that on 10 or
> so. Don't forget that 10b rows would require a $40000 machine to use
> on mysql, so why not spend $40000 on a cluster?
>
> On Tue, Aug 18, 2009 at 12:20 PM, Jonathan Gray<[email protected]> wrote:
>> I have a little util I created called HBench. You can customize the
>> different parameters to generate data of varying sizes/patterns/etc.
>>
>> https://issues.apache.org/jira/browse/HBASE-1501
>>
>> JG
>>
>> Andrew Purtell wrote:
>>>
>>> Most that I am aware of set up transient test environments on EC2.
>>>
>>> You can use one instance to create an EBS volume containing all software
>>> and config you need, then snapshot it, then clone volumes based on the
>>> snapshot to attach to any number of instances you need. Use X-Large
>>> instances, at least 4. Give HBase regionservers 2GB heap. Then try your
>>> 10 billion row test case.
>>>
>>> - Andy
>>>
>>> ________________________________
>>> From: Greg Cottman <[email protected]>
>>> To: "[email protected]" <[email protected]>
>>> Sent: Tuesday, August 18, 2009 4:13:23 PM
>>> Subject: Public HBase data store?
>>>
>>> Hi all,
>>>
>>> I need to do some scalability testing of an HBase query tool. We have
>>> just started using HBase and sadly do not have an existing database against
>>> which to test. One thing we are interested in exploring is the difference
>>> between using an index table strategy versus map/reduce queries without
>>> indexes.
>>>
>>> I realise this is a long shot and that queries are very data-dependent,
>>> but... Are there any publicly accessible HBase stores or reference sites
>>> against which you can run test queries?
>>>
>>> Or does everyone just create a 10 billion row test environment on their
>>> local development box? :-)
>>>
>>> Cheers,
>>> Greg.
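
For reference, the lookup pattern Tim describes at the top of the thread (an index that stores only row keys, paged search results, then one sequential "get by key" per hit, with the output format applied while streaming) could look roughly like the sketch below. It is only an illustration: INDEX, STORE, search_page, get_by_key and stream_csv are hypothetical names, the two dictionaries are in-memory stand-ins for a Lucene index and an HBase table, and no real Lucene or HBase API is used.

```python
import csv
import io
from typing import Dict, Iterator, List

# Hypothetical stand-in for the HBase table: row key -> column values.
STORE: Dict[str, Dict[str, str]] = {
    "row-001": {"title": "alpha", "lang": "en"},
    "row-002": {"title": "beta", "lang": "de"},
    "row-003": {"title": "gamma", "lang": "en"},
}

# Hypothetical stand-in for the Lucene index: term -> matching row keys.
# Only the row key is stored in the index, as in the idea above.
INDEX: Dict[str, List[str]] = {
    "en": ["row-001", "row-003"],
    "de": ["row-002"],
}


def search_page(term: str, offset: int, page_size: int) -> List[str]:
    """Return one page of row keys matching the term (assumed index call)."""
    return INDEX.get(term, [])[offset:offset + page_size]


def get_by_key(key: str) -> Dict[str, str]:
    """Fetch one row by key (assumed stand-in for a random row lookup)."""
    return STORE[key]


def stream_csv(term: str, columns: List[str], page_size: int = 2) -> Iterator[str]:
    """Page over the search hits and yield one CSV line per fetched row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    offset = 0
    while True:
        keys = search_page(term, offset, page_size)
        if not keys:
            break
        for key in keys:
            row = get_by_key(key)  # one sequential lookup per hit
            writer.writerow([key] + [row.get(c, "") for c in columns])
            yield buf.getvalue()
            # Reset the buffer so each yielded chunk is a single line.
            buf.seek(0)
            buf.truncate(0)
        offset += len(keys)


if __name__ == "__main__":
    for line in stream_csv("en", ["title", "lang"]):
        print(line, end="")
```

Each hit costs one random row lookup against the store, so the achievable response rate depends directly on the rows-per-second figure asked about above.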
