One thing to note is that 10 GB is half the memory of a reasonably sized machine. In fact, I have seen 128 GB memcache boxes out there.
As for performance, I obviously feel HBase can be performant for real-time queries. To get a consistent response you absolutely have to have 95%+ caching in RAM. There is no way to achieve 1-2 ms responses from disk. Throwing enough RAM at the problem, I think HBase solves this nicely and you won't have to maintain multiple architectures.

-ryan

On Tue, Mar 9, 2010 at 2:08 PM, Jonathan Gray <jl...@streamy.com> wrote:

> Brian,
>
> I would just reiterate what others have said. If your goal is a consistent 1-2 ms read latency and your dataset is on the order of 10 GB... HBase is not a good match. It's more than what you need and you'll take unnecessary performance hits.
>
> I would look at some of the simpler KV-style stores out there like Tokyo Cabinet, Memcached, or BerkeleyDB, or the in-memory ones like Redis.
>
> JG
>
> -----Original Message-----
> From: jaxzin [mailto:brian.r.jack...@espn3.com]
> Sent: Tuesday, March 09, 2010 12:09 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Use cases of HBase
>
> Gary, I looked at your presentation and it was very helpful, but I do have a few unanswered questions from it if you wouldn't mind answering them. How big is/was your cluster that handled 3k req/sec? And what were the specs on each node (RAM/CPU)?
>
> When you say latency can be good, what do you mean? Is it even in the ballpark of 1 ms? We already deal with the GC and don't expect perfect real-time behavior, so that might be okay with me.
>
> P.S. I was at Hadoop World NYC and saw Ryan and Jonathan's presentation there but somehow mentally blocked it. Thanks for the reminder.
>
> Gary Helmling wrote:
>>
>> Hey Brian,
>>
>> We use HBase to complement MySQL in serving activity-stream type data here at Meetup. It's handling real-time requests involved in 20-25% of our page views, but our latency requirements aren't as strict as yours. For what it's worth, I did a presentation on our setup which will hopefully fill in some details: http://www.slideshare.net/ghelmling/hbase-at-meetup
>>
>> There are also some great presentations by Ryan Rawson and Jonathan Gray on how they've used HBase for realtime serving on their sites. See the presentations wiki page: http://wiki.apache.org/hadoop/HBase/HBasePresentations
>>
>> Like Barney, I suspect where you'll hit some issues will be your latency requirements. Depending on how you lay out your data and configure your column families, your average latency may be good, but you will hit some pauses, as I believe reads block at times during region splits or compactions and memstore flushes (unless you have a fairly static data set). Others here should be able to fill in more details.
>>
>> With a relatively small dataset, you may want to look at the "in memory" configuration option for your column families.
>>
>> What's your expected workload -- writes vs. reads? What types of reads will you be doing: random access vs. sequential? There are a lot of knowledgeable folks here to offer advice if you can give us some more insight into what you're trying to build.
>>
>> --gh
>>
>> On Tue, Mar 9, 2010 at 11:21 AM, jaxzin <brian.r.jack...@espn3.com> wrote:
>>>
>>> This is exactly the kind of feedback I'm looking for -- thanks, Barney.
>>>
>>> So it sounds like you cache the data you get from HBase in session-based memory? Are you using a Java EE HttpSession? (I'm less familiar with the django/rails equivalents, but I'm assuming they exist.) Or are you using a memory cache provider like ehcache or memcache(d)?
>>>
>>> Can you tell me more about your experience with latency and why you say that?
>>>
>>> Barney Frank wrote:
>>>>
>>>> I am using HBase to store visitor-level clickstream-like data. At the beginning of the visitor session I retrieve all the previous session data from HBase, use it within my app server, massage it a little, and serve it to the consumer via web services. Where I think you will run into the most problems is your latency requirement.
>>>>
>>>> Just my 2 cents from a user.
>>>>
>>>> On Tue, Mar 9, 2010 at 9:45 AM, jaxzin <brian.r.jack...@espn3.com> wrote:
>>>>>
>>>>> Hi all, I've got a question about how everyone is using HBase. Is anyone using it as an online data store to directly back a web service?
>>>>>
>>>>> The textbook example of a weblink HBase table suggests there would be an associated web front-end to display the information in that HBase table (ex. a search results page), but I'm having trouble finding evidence that anyone is servicing web traffic backed directly by an HBase instance in practice.
>>>>>
>>>>> I'm evaluating whether HBase would be the right tool to provide a few things for a large-scale web service we want to develop at ESPN, and I'd really like to get opinions and experience from people who have already been down this path. No need to reinvent the wheel, right?
>>>>>
>>>>> I can tell you a little about the project goals if it helps give you an idea of what I'm trying to design for:
>>>>>
>>>>> 1) Highly available (it would be a central service and an outage would take down everything)
>>>>> 2) Low latency (1-2 ms, less is better, more isn't acceptable)
>>>>> 3) High throughput (5-10k req/sec at worst-case peak)
>>>>> 4) Unstable traffic (ex. Sunday afternoons during football season)
>>>>> 5) Small data...for now (< 10 GB of total data currently, but HBase could allow us to design differently and store more online)
>>>>>
>>>>> The reason I'm looking at HBase is that we've solved many of our scaling issues with the same basic concepts as HBase (sharding, flattening data to fit in one row, throwing away ACID, etc.), but with home-grown software. I'd like to adopt an active open-source project if it makes sense.
>>>>>
>>>>> Alternatives I'm also looking at: RDBMS fronted with WebSphere eXtreme Scale, RDBMS fronted with Hibernate/ehcache, or (the option I understand the least right now) memcached.
>>>>>
>>>>> Thanks,
>>>>> Brian
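
The "in memory" option Gary mentions above, together with Ryan's point about 95%+ caching in RAM, comes down to a per-column-family flag that gives that family's blocks higher priority in the region server block cache (a hint, not a guarantee). A minimal sketch using the older HBaseAdmin/HColumnDescriptor Java client API that was current around this thread -- the table name "profiles" and family name "d" are made up for illustration, not anything from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateInMemoryTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // "profiles" and "d" are hypothetical names used only in this sketch.
            HTableDescriptor table = new HTableDescriptor("profiles");
            HColumnDescriptor family = new HColumnDescriptor("d");

            // The "in memory" option: this family's blocks get higher priority
            // in the region server block cache, so hot rows tend to stay cached.
            family.setInMemory(true);
            // Block caching is on by default; set explicitly here to show intent.
            family.setBlockCacheEnabled(true);

            table.addFamily(family);
            admin.createTable(table);
        }
    }

The same flag can also be set when creating the table from the HBase shell or changed later with an alter.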
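
Barney's access pattern -- one row per visitor, with all prior session data pulled in a single read at the start of a session -- would look roughly like the sketch below with the same era's client API. The table name "visitor_sessions", the family "history", and the row key scheme are assumptions for illustration, not anything Barney described:

    import java.util.Map;
    import java.util.NavigableMap;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SessionLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "visitor_sessions");
            try {
                // One row per visitor: a single Get returns the whole history.
                Get get = new Get(Bytes.toBytes("visitor-12345"));
                get.addFamily(Bytes.toBytes("history"));
                Result row = table.get(get);

                NavigableMap<byte[], byte[]> history =
                        row.getFamilyMap(Bytes.toBytes("history"));
                if (history != null) {
                    for (Map.Entry<byte[], byte[]> entry : history.entrySet()) {
                        // Hand the prior-session data off to the app server or
                        // web service layer here; printing is just a placeholder.
                        System.out.println(Bytes.toString(entry.getKey())
                                + " = " + Bytes.toString(entry.getValue()));
                    }
                }
            } finally {
                table.close();
            }
        }
    }

Whether a read like this lands in the 1-2 ms range depends on the block cache discussion above; a read that has to go to disk will not.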