With Nagle's you'd see something around 40ms. You are not saying 0.8ms RTT is bad, right? Are you seeing ~40ms latencies?
This thread has gotten confusing. I would try these:

* one Configuration for all tables, or even a single HConnection/thread pool
  used through the HTable(byte[], HConnection, ExecutorService) constructor
  (a sketch follows below this message)
* disable Nagle's algorithm: set both ipc.server.tcpnodelay and
  hbase.ipc.client.tcpnodelay to true in hbase-site.xml (on both the client
  *and* the server)
* increase hbase.client.ipc.pool.size in the client's hbase-site.xml
* enable short-circuit reads (details depend on the exact version of Hadoop).
  Google will help :)

-- Lars
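To make the first bullet and the client-side settings concrete, here is a minimal sketch of that setup: one Configuration, one HConnection and one ExecutorService shared by every HTable, built with the HTable(byte[], HConnection, ExecutorService) constructor named above. It assumes a 0.94.x client where HConnectionManager.createConnection(Configuration) is available; the table name, row key, and pool sizes are placeholders, and ipc.server.tcpnodelay still has to go into the region server's own hbase-site.xml.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SharedConnectionSketch {
      public static void main(String[] args) throws Exception {
        // One Configuration for the whole client process.
        Configuration conf = HBaseConfiguration.create();
        // Turn Nagle's off on the client side; ipc.server.tcpnodelay must
        // be set separately in the region server's hbase-site.xml.
        conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
        // More sockets per region server; 10 is just an example value.
        conf.setInt("hbase.client.ipc.pool.size", 10);

        // One connection and one thread pool shared by every HTable.
        HConnection connection = HConnectionManager.createConnection(conf);
        ExecutorService pool = Executors.newFixedThreadPool(60);

        // Placeholder table name and row key.
        HTable table = new HTable(Bytes.toBytes("usertable"), connection, pool);
        try {
          Result r = table.get(new Get(Bytes.toBytes("row-00042")));
          System.out.println("cells: " + r.size());
        } finally {
          table.close();
          connection.close();
          pool.shutdown();
        }
      }
    }

The point of sharing the connection and pool is that every HTable (one per worker thread, if you like) multiplexes over the same sockets instead of each one carrying its own Configuration-keyed connection.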
----- Original Message -----
From: Vladimir Rodionov <vladrodio...@gmail.com>
To: dev@hbase.apache.org
Sent: Tuesday, July 30, 2013 1:30 PM
Subject: Re: HBase read performance and HBase client

Does this hbase.ipc.client.tcpnodelay (default: false) explain the poor single-thread performance and the high latency (0.8 ms on a local network)?

On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

One more observation: one Configuration instance per HTable gives a 50% boost compared to a single Configuration object for all HTables - from 20K to 30K.

On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

This thread dump was taken while the client was sending 60 requests in parallel (at least in theory). There are 50 server handler threads.

On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

Sure, here it is:

http://pastebin.com/8TjyrKRT

Is epoll used not only to read/write HDFS but also to connect to and listen for clients?

On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

Can you show us what the thread dump looks like when the threads are BLOCKED? There aren't that many locks on the read path when reading out of the block cache, and epoll would only happen if you need to hit HDFS, which you're saying is not happening.

J-D

On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

I am hitting data in the block cache, of course. The data set is small enough to fit comfortably into the block cache, and all requests are directed to the same region to guarantee single-RS testing.

To Ted: yes, it's CDH 4.3. What is the difference between 0.94.10 and 0.94.6 with respect to read performance?

On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

That's a tough one.

One thing that comes to mind is socket reuse. It used to come up more often, but it is an issue that people hit when doing loads of random reads. Try enabling tcp_tw_recycle, but I'm not guaranteeing anything :)

Also, if you _just_ want to saturate something, be it CPU or network, wouldn't it be better to hit data only in the block cache? That way it has the lowest overhead.

Last thing I wanted to mention is that yes, the client doesn't scale very well. I would suggest you give the asynchbase client a run.

J-D

On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov <vrodio...@carrieriq.com> wrote:

I have been doing quite extensive testing of different read scenarios:

1. block cache disabled/enabled
2. data local/remote (no good HDFS locality)

and it turned out that I cannot saturate one RS from a single client host (comparable in CPU power and RAM):

I am running a client app with 60 active read threads (using multi-get) against one particular RS, and that RS's load is 100-150% (out of 3200% available), which means the load is ~5%.

All threads in the RS are either in the BLOCKED (wait) or IN_NATIVE (epoll) state.

I attribute this to the HBase client implementation, which does not seem to be scalable (I am going to dig into the client later today).

Some numbers: the maximum I could get from single gets (60 threads) is 30K per second; multi-get gives ~75K (60 threads).

What are my options? I want to measure the limits, and I do not want to run a cluster of clients against just ONE region server.

RS config: 96 GB RAM, 16 (32) CPU
Client: 48 GB RAM, 8 (16) CPU

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
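On J-D's suggestion above to give the asynchbase client a run: here is a minimal sketch of a non-blocking get, assuming an asynchbase 1.x jar (and its Deferred library) on the classpath; the ZooKeeper quorum, table name, and row key are placeholders. The idea is that a single client thread can keep many requests in flight instead of blocking one thread per outstanding get.

    import java.util.ArrayList;

    import org.hbase.async.GetRequest;
    import org.hbase.async.HBaseClient;
    import org.hbase.async.KeyValue;

    import com.stumbleupon.async.Callback;
    import com.stumbleupon.async.Deferred;

    public class AsyncGetSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper quorum.
        final HBaseClient client = new HBaseClient("zk1,zk2,zk3");

        // Non-blocking get: the Deferred fires when the RPC completes,
        // so one thread can keep many gets in flight at once.
        GetRequest get = new GetRequest("usertable", "row-00042");
        Deferred<Object> done = client.get(get)
            .addCallback(new Callback<Object, ArrayList<KeyValue>>() {
              public Object call(ArrayList<KeyValue> row) {
                // Each KeyValue is one cell of the returned row.
                System.out.println("cells returned: " + row.size());
                return null;
              }
            });

        // Block here only so the sketch terminates cleanly; a load
        // generator would collect many Deferreds and join them together.
        done.joinUninterruptibly();
        client.shutdown().joinUninterruptibly();
      }
    }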