Brian, I would just reiterate what others have said. If your goal is a consistent 1-2ms read latency and your dataset is on the order of 10GB... HBase is not a good match. It's more than what you need and you'll take unnecessary performance hits.
I would look at some of the simpler KV-style stores out there like Tokyo Cabinet, Memcached, or BerkeleyDB, or the in-memory ones like Redis.

JG

-----Original Message-----
From: jaxzin [mailto:brian.r.jack...@espn3.com]
Sent: Tuesday, March 09, 2010 12:09 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Use cases of HBase

Gary, I looked at your presentation and it was very helpful. But I do have a few unanswered questions from it if you wouldn't mind answering them. How big is/was your cluster that handled 3k req/sec? And what were the specs on each node (RAM/CPU)?

When you say latency can be good, what do you mean? Is it even in the ballpark of 1 ms? Because we already deal with the GC and don't expect perfect real-time behavior. So that might be okay with me.

P.S. I was at Hadoop World NYC and saw Ryan and Jonathan's presentation there but somehow mentally blocked it. Thanks for the reminder.

Gary Helmling wrote:
>
> Hey Brian,
>
> We use HBase to complement MySQL in serving activity-stream type data here
> at Meetup. It's handling real-time requests involved in 20-25% of our page
> views, but our latency requirements aren't as strict as yours. For what
> it's worth, I did a presentation on our setup which will hopefully fill in
> some details: http://www.slideshare.net/ghelmling/hbase-at-meetup
>
> There are also some great presentations by Ryan Rawson and Jonathan Gray on
> how they've used HBase for realtime serving on their sites. See the
> presentations wiki page:
> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>
> Like Barney, I suspect where you'll hit some issues will be in your latency
> requirements. Depending on how you lay out your data and configure your
> column families, your average latency may be good, but you will hit some
> pauses as I believe reads block at times during region splits or compactions
> and memstore flushes (unless you have a fairly static data set). Others
> here should be able to fill in more details.
>
> With a relatively small dataset, you may want to look at the "in memory"
> configuration option for your column families.
>
> What's your expected workload -- writes vs. reads? Types of reads you'll be
> doing: random access vs. sequential? There are a lot of knowledgeable folks
> here to offer advice if you can give us some more insight into what you're
> trying to build.
>
> --gh
>
>
> On Tue, Mar 9, 2010 at 11:21 AM, jaxzin <brian.r.jack...@espn3.com> wrote:
>
>>
>> This is exactly the kind of feedback I'm looking for, thanks, Barney.
>>
>> So it sounds like you cache the data you get from HBase in a session-based
>> memory? Are you using a Java EE HttpSession? (I'm less familiar with the
>> django/rails equivalent but I'm assuming they exist) Or are you using a
>> memory cache provider like ehcache or memcache(d)?
>>
>> Can you tell me more about your experience with latency and why you say
>> that?
>>
>>
>> Barney Frank wrote:
>> >
>> > I am using HBase to store visitor level clickstream-like data. At the
>> > beginning of the visitor session I retrieve all the previous session data
>> > from HBase and use it within my app server and massage it a little and
>> > serve to the consumer via web services. Where I think you will run into
>> > the most problems is your latency requirement.
>> >
>> > Just my 2 cents from a user.
>> >
>> > On Tue, Mar 9, 2010 at 9:45 AM, jaxzin <brian.r.jack...@espn3.com> wrote:
>> >
>> >>
>> >> Hi all, I've got a question about how everyone is using HBase. Is anyone
>> >> using it as an online data store to directly back a web service?
>> >>
>> >> The text-book example of a weblink HBase table suggests there would be an
>> >> associated web front-end to display the information in that HBase table
>> >> (ex. search results page), but I'm having trouble finding evidence that
>> >> anyone is servicing web traffic backed directly by an HBase instance in
>> >> practice.
>> >>
>> >> I'm evaluating if HBase would be the right tool to provide a few things
>> >> for a large-scale web service we want to develop at ESPN and I'd really
>> >> like to get opinions and experience from people who have already been
>> >> down this path. No need to reinvent the wheel, right?
>> >>
>> >> I can tell you a little about the project goals if it helps give you an
>> >> idea of what I'm trying to design for:
>> >>
>> >> 1) Highly available (It would be a central service and an outage would
>> >> take down everything)
>> >> 2) Low latency (1-2 ms, less is better, more isn't acceptable)
>> >> 3) High throughput (5-10k req/sec at worst case peak)
>> >> 4) Unstable traffic (ex. Sunday afternoons during football season)
>> >> 5) Small data...for now (< 10 GB of total data currently, but HBase could
>> >> allow us to design differently and store more online)
>> >>
>> >> The reason I'm looking at HBase is that we've solved many of our scaling
>> >> issues with the same basic concepts of HBase (sharding, flattening data
>> >> to fit in one row, throw away ACID, etc) but with home-grown software.
>> >> I'd like to adopt an active open-source project if it makes sense.
>> >>
>> >> Alternatives I'm also looking at: RDBMS fronted with Websphere eXtreme
>> >> Scale, RDBMS fronted with Hibernate/ehcache, or (the option I understand
>> >> the least right now) memcached.
>> >>
>> >> Thanks,
>> >> Brian
>> >> --
>> >> View this message in context:
>> >> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
>> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >>
>> >
>>