Thanks for the answers. We're just about ready with our test cluster and we'll run exactly this test.
The number of Tomcat servers hitting a common row is currently 40, each running up to 100 threads at peak intervals.

You also said:

> This is hbase. You don't buy bigger hardware, you just add nodes (smile).

Not sure if that was tongue-in-cheek, because adding nodes wouldn't address the hot-row issue, would it? On the chance that it won't, I've put two rough sketches of what we plan to test at the bottom of this mail, in case anyone spots a problem with them.

Thanks again,
Brad

On Feb 16, 2010, at 9:23 PM, Stack wrote:

> On Tue, Feb 16, 2010 at 7:28 PM, Brad McCarty <mcca...@gmail.com> wrote:
>
>> I read in another post that if one has a "hot" row in a table, meaning very
>> heavy read access to the same row, the regionserver managing the region
>> with that row can become a single bottleneck.
>
> If hot, it'll probably get stapled into cache.
>
>> Is my understanding accurate? If so, then assuming I can cache the data in
>> the memstore, will CPU utilization become the likely limiting resource on
>> that regionserver?
>
> Yes. That should be the case.
>
>> Also, if I'm hitting the region server from many client servers
>> (Tomcat app servers), will the socket connection management overhead
>> on the regionserver overwhelm that server?
>
> How many clients? 4 or 500 tomcat threads?
>
> The way the ipc between hbase client and server works is that the client
> keeps up a single socket connection and multiplexes request/response over
> this one connection. This is how hadoop rpc works.
>
>> If that's true, are there any other steps that can be taken to mitigate that
>> risk, other than buying bigger hardware?
>
> This is hbase. You don't buy bigger hardware, you just add nodes (smile).
>
> The proper answer to your questions above is for you to give it a test
> run. Try setting up a cluster of about 5 hbase nodes and a tomcat server
> replaying a query log that resembles what you might have in production.
>
> Yours,
> St.Ack
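First sketch: based on St.Ack's note above that the client multiplexes all requests over one socket, this is roughly how each of our Tomcat webapps will talk to hbase. It's a minimal sketch against the 0.20-era client API, on the assumption that HTable instances built from the same HBaseConfiguration share the underlying connection; the table, row, family and qualifier names are placeholders, not our real schema.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HotRowReader {
      // One configuration per webapp; HTables built from it share the
      // underlying connection, so all threads in this JVM multiplex over
      // the single socket per regionserver that St.Ack describes above.
      private static final HBaseConfiguration CONF = new HBaseConfiguration();

      // HTable itself is not thread-safe, so each servlet thread gets its
      // own lightweight instance; they still share the one connection.
      private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
        @Override protected HTable initialValue() {
          try {
            return new HTable(CONF, "mytable");          // placeholder table name
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
      };

      public byte[] readHotRow() throws IOException {
        Get get = new Get(Bytes.toBytes("the-hot-row")); // placeholder row key
        Result result = TABLE.get().get(get);
        return result.getValue(Bytes.toBytes("f"),       // placeholder family
                               Bytes.toBytes("q"));      // placeholder qualifier
      }
    }

If that assumption about connection sharing is wrong, the thread counts we quoted would translate into thousands of sockets on the regionserver, which is exactly the overhead we were asking about.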
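Second sketch: since adding nodes won't spread load for a single hot row, we'll also try a short-TTL cache in front of the read on the Tomcat side, so each app server hits the regionserver a few times per interval instead of once per request. The 500 ms TTL is a guess; whether that much staleness is acceptable depends entirely on the data. It uses the HotRowReader from the first sketch.

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicReference;

    // Caches the hot row's value in each Tomcat JVM for a short TTL so that
    // 40 servers x 100 threads don't all hit the same regionserver per request.
    public class HotRowCache {
      private static final long TTL_MS = 500;   // guess; tune against staleness tolerance

      private static class Entry {
        final byte[] value;
        final long fetchedAt;
        Entry(byte[] value, long fetchedAt) {
          this.value = value;
          this.fetchedAt = fetchedAt;
        }
      }

      private final AtomicReference<Entry> cached = new AtomicReference<Entry>();
      private final HotRowReader reader;        // from the first sketch

      public HotRowCache(HotRowReader reader) {
        this.reader = reader;
      }

      public byte[] get() throws IOException {
        long now = System.currentTimeMillis();
        Entry e = cached.get();
        if (e != null && now - e.fetchedAt < TTL_MS) {
          return e.value;                       // fresh enough: no rpc at all
        }
        // Stale or empty: go to hbase. A few threads may race through here,
        // costing at worst a handful of redundant gets per TTL window.
        byte[] value = reader.readHotRow();
        cached.set(new Entry(value, now));
        return value;
      }
    }

At 40 servers and a 500 ms TTL that caps the hot-row read rate at roughly 80 gets/sec cluster-wide, regardless of per-server thread counts.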