Thanks for the answers.  We're just about ready with our test cluster and we 
will try this test specifically.

The number of Tomcat servers hitting a common row is currently 40, with 
potentially up to 100 threads each at peak intervals.
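
That's up to ~4,000 client threads in total, though if I follow the 
single-connection multiplexing you describe below, it should work out to just 
40 sockets into the regionserver (one per Tomcat JVM).  For reference, here is 
roughly how we plan to share the client inside each Tomcat instance -- just a 
sketch against the 0.20-era Java API, with placeholder table/row/column names:

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HotRowReader {
    // One configuration per Tomcat JVM.  HTables created from the same
    // HBaseConfiguration share the underlying connection, so all worker
    // threads multiplex over one socket per regionserver.
    private static final HBaseConfiguration CONF = new HBaseConfiguration();

    // HTable itself is not thread-safe, so each request thread opens its
    // own lightweight HTable on top of the shared connection.
    public byte[] readHotRow() throws IOException {
      HTable table = new HTable(CONF, "mytable");  // placeholder table name
      Result r = table.get(new Get(Bytes.toBytes("hotrow")));  // placeholder row key
      return r.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"));  // placeholder family:qualifier
    }
  }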

You also said: 
> This is hbase.  You don't buy bigger hardware, you just add nodes (smile).


Not sure if that was tongue-in-cheek, because adding nodes wouldn't address the 
hot-row issue, would it?
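
On the caching point, if the hot row really would get pinned as you suggest, 
maybe we should also flag its column family as in-memory so the block cache 
favors it.  Something like the sketch below is what I have in mind -- I'm 
assuming HColumnDescriptor has a setInMemory() knob for this (table and family 
names are again placeholders), so correct me if I've got the wrong lever:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  public class CreateHotTable {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

      // Placeholder table and family names.  Marking the family in-memory
      // (an assumption on my part) should bias the block cache toward
      // keeping its blocks resident between reads.
      HTableDescriptor desc = new HTableDescriptor("mytable");
      HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("f"));
      family.setInMemory(true);
      desc.addFamily(family);

      admin.createTable(desc);
    }
  }

That would at least keep the reads served from cache rather than HDFS, even 
though it still leaves one regionserver carrying the whole row.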

Thanks again
Brad


On Feb 16, 2010, at 9:23 PM, Stack wrote:

> On Tue, Feb 16, 2010 at 7:28 PM, Brad McCarty <mcca...@gmail.com> wrote:
> 
>> I read in another post that if one has a "hot" row in a table, meaning very 
>> heavy read access to the same row, that the regionserver managing the region 
>> with that row can become a single bottleneck.
>> 
> 
> If hot, it'll probably get stapled into cache.
> 
> 
>> Is my understanding accurate?  If so, then assuming I can cache the data in 
>> the memstore, will CPU utilization become the likely limiting resource on 
>> that regionserver?
> 
> Yes.  That should be the case.
> 
> 
>> Also, if I'm hitting the region server from many client servers
>> (Tomcat app servers), will the socket connection management overhead
>> on the regionserver overwhelm that server?
>> 
> 
> How many clients?  4 or 500 tomcat threads?
> 
> The way the ipc between hbase client and server works is that it keeps
> up a single socket connection and multiplexes request/response over
> this one connection.  This is how hadoop rpc works.
> 
> 
>> If that's true, are there any other steps that can be taken to mitigate that 
>> risk, other than buying bigger hardware?
>> 
> 
> This is hbase.  You don't buy bigger hardware, you just add nodes (smile).
> 
> The proper answer to your questions above is for you to give it a test
> run.  Try setting up a cluster of about 5 hbase nodes and try a tomcat
> server playing a query log that resembles what you might
> have in production.
> 
> Yours,
> St.Ack
