2x1Gb bonded, I think. This is our standard config.

On Thu, Aug 1, 2013 at 10:27 AM, Michael Segel <msegel_had...@hotmail.com> wrote:

> Network? 1GbE or 10GbE?
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jul 31, 2013, at 9:27 PM, "Vladimir Rodionov" <vladrodio...@gmail.com> wrote:
>
> > Some final numbers:
> >
> > Test config:
> >
> > HBase 0.94.6
> > blockcache=true, block size = 64K, KV size = 62 bytes (raw).
> >
> > 5 clients: 96GB, 16(32) CPUs (2.2GHz), CentOS 5.7
> > 1 RS server: the same config.
> >
> > Local network with ping between hosts: 0.1 ms
> >
> > 1. HBase client hits the wall at ~50K per sec regardless of # of CPUs,
> > threads, IO pool size and other settings.
> > 2. HBase server was able to sustain 170K per sec (with 64K block size), all
> > from block cache. KV size = 62 bytes (very small). This is for single Get
> > ops, 60 threads per client, 5 clients (on different hosts).
> > 3. Multi-get hits the wall at the same 170K-200K per sec. Batch sizes
> > tested: 30, 100. Absolutely the same performance as with batch size = 1.
> > Multi-get has some internal issues on the RegionServer side - maybe
> > excessive locking or something else.
> >
> > On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
> > <vladrodio...@gmail.com> wrote:
> >
> >> 1. SCR are enabled
> >> 2. A single Configuration for all tables did not work well, but I will
> >> try it again
> >> 3. With Nagle I had 0.8ms avg, without - 0.4ms - I see the difference
> >>
> >> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <la...@apache.org> wrote:
> >>
> >>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
> >>> RTT is bad, right? Are you seeing ~40ms latencies?
> >>>
> >>> This thread has gotten confusing.
> >>>
> >>> I would try these:
> >>> * one Configuration for all tables. Or even use a single
> >>> HConnection/Threadpool and use the HTable(byte[], HConnection,
> >>> ExecutorService) constructor
> >>> * disable Nagle's: set both ipc.server.tcpnodelay and
> >>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
> >>> server)
> >>> * increase hbase.client.ipc.pool.size in the client's hbase-site.xml
> >>> * enable short circuit reads (details depend on the exact version of
> >>> Hadoop). Google will help :)
> >>>
> >>> -- Lars
> >>>
> >>> ----- Original Message -----
> >>> From: Vladimir Rodionov <vladrodio...@gmail.com>
> >>> To: dev@hbase.apache.org
> >>> Sent: Tuesday, July 30, 2013 1:30 PM
> >>> Subject: Re: HBase read perfomnance and HBase client
> >>>
> >>> This hbase.ipc.client.tcpnodelay (default - false) explains poor single-
> >>> thread performance and high latency (0.8ms in a local network)?
> >>>
> >>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
> >>> <vladrodio...@gmail.com> wrote:
> >>>
> >>>> One more observation: one Configuration instance per HTable gives a 50%
> >>>> boost compared to a single Configuration object for all HTables - from
> >>>> 20K to 30K.
> >>>>
> >>>> On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
> >>>> vladrodio...@gmail.com> wrote:
> >>>>
> >>>>> This thread dump was taken when the client was sending 60 requests in
> >>>>> parallel (at least, in theory). There are 50 server handler threads.
> >>>>>
> >>>>> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
> >>>>> vladrodio...@gmail.com> wrote:
> >>>>>
> >>>>>> Sure, here it is:
> >>>>>>
> >>>>>> http://pastebin.com/8TjyrKRT
> >>>>>>
> >>>>>> epoll is not only for reading/writing HDFS but for connecting to and
> >>>>>> listening for clients as well?
> >>>>>>
> >>>>>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
> >>>>>> jdcry...@apache.org> wrote:
> >>>>>>
> >>>>>>> Can you show us what the thread dump looks like when the threads are
> >>>>>>> BLOCKED? There aren't that many locks on the read path when reading
> >>>>>>> out of the block cache, and epoll would only happen if you need to
> >>>>>>> hit HDFS, which you're saying is not happening.
> >>>>>>>
> >>>>>>> J-D
> >>>>>>>
> >>>>>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> >>>>>>> <vladrodio...@gmail.com> wrote:
> >>>>>>>> I am hitting data in the block cache, of course. The data set is
> >>>>>>>> small enough to fit comfortably into the block cache, and all
> >>>>>>>> requests are directed to the same Region to guarantee single-RS
> >>>>>>>> testing.
> >>>>>>>>
> >>>>>>>> To Ted:
> >>>>>>>>
> >>>>>>>> Yes, it's CDH 4.3. What is the difference between 94.10 and 94.6
> >>>>>>>> with respect to read performance?
> >>>>>>>>
> >>>>>>>> On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <
> >>>>>>>> jdcry...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> That's a tough one.
> >>>>>>>>>
> >>>>>>>>> One thing that comes to mind is socket reuse. It used to come up
> >>>>>>>>> more often, but this is an issue that people hit when doing loads
> >>>>>>>>> of random reads. Try enabling tcp_tw_recycle, but I'm not
> >>>>>>>>> guaranteeing anything :)
> >>>>>>>>>
> >>>>>>>>> Also, if you _just_ want to saturate something, be it CPU or
> >>>>>>>>> network, wouldn't it be better to hit data only in the block
> >>>>>>>>> cache? That way it has the lowest overhead.
> >>>>>>>>>
> >>>>>>>>> Last thing I wanted to mention is that yes, the client doesn't
> >>>>>>>>> scale very well. I would suggest you give the asynchbase client a
> >>>>>>>>> run.
> >>>>>>>>>
> >>>>>>>>> J-D
> >>>>>>>>>
> >>>>>>>>> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >>>>>>>>> <vrodio...@carrieriq.com> wrote:
> >>>>>>>>>> I have been doing quite extensive testing of different read
> >>>>>>>>>> scenarios:
> >>>>>>>>>>
> >>>>>>>>>> 1. blockcache disabled/enabled
> >>>>>>>>>> 2. data is local/remote (no good HDFS locality)
> >>>>>>>>>>
> >>>>>>>>>> and it turned out that I cannot saturate 1 RS using one
> >>>>>>>>>> (comparable in CPU power and RAM) client host.
> >>>>>>>>>>
> >>>>>>>>>> I am running a client app with 60 read threads active (with
> >>>>>>>>>> multi-get) that is going to one particular RS, and this RS's load
> >>>>>>>>>> is 100-150% (out of 3200% available) - that is, the load is ~5%.
> >>>>>>>>>>
> >>>>>>>>>> All threads in the RS are either in BLOCKED (wait) or IN_NATIVE
> >>>>>>>>>> (epoll) states.
> >>>>>>>>>>
> >>>>>>>>>> I attribute this to the HBase client implementation, which seems
> >>>>>>>>>> to be not scalable (I am going to dig into the client later
> >>>>>>>>>> today).
> >>>>>>>>>>
> >>>>>>>>>> Some numbers: the maximum I could get from single Get (60
> >>>>>>>>>> threads) is 30K per sec. Multi-get gives ~75K (60 threads).
> >>>>>>>>>>
> >>>>>>>>>> What are my options? I want to measure the limits, and I do not
> >>>>>>>>>> want to run a cluster of clients against just ONE Region Server.
> >>>>>>>>>>
> >>>>>>>>>> RS config: 96GB RAM, 16(32) CPU
> >>>>>>>>>> Client: 48GB RAM, 8(16) CPU
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Vladimir Rodionov
> >>>>>>>>>> Principal Platform Engineer
> >>>>>>>>>> Carrier IQ, www.carrieriq.com
> >>>>>>>>>> e-mail: vrodio...@carrieriq.com
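
For reference, here is a minimal sketch of the client-side setup Lars suggests above, assuming the 0.94-era client API: one Configuration, one shared HConnection and ExecutorService handed to every HTable through the HTable(byte[], HConnection, ExecutorService) constructor, Nagle's disabled on the client, and a batched multi-get like the one used in the test. The table name, column family, qualifier, and row keys are placeholders, and ipc.server.tcpnodelay still has to be set to true in the region server's own hbase-site.xml; this is an illustration of the advice in the thread, not the exact test harness used for the numbers above.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SharedConnectionReadSketch {

        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();

            // Disable Nagle's on the client side; the matching
            // ipc.server.tcpnodelay must be set on the region server.
            conf.setBoolean("hbase.ipc.client.tcpnodelay", true);

            // More sockets per region server, as suggested in the thread.
            conf.setInt("hbase.client.ipc.pool.size", 10);

            // One HConnection and one thread pool shared by all HTable
            // instances and all reader threads.
            HConnection connection = HConnectionManager.createConnection(conf);
            ExecutorService pool = Executors.newFixedThreadPool(60);

            // "testtable", "f", "q" and the row keys are placeholders.
            HTable table = new HTable(Bytes.toBytes("testtable"), connection, pool);
            try {
                // Multi-get: one batch of Gets submitted together.
                List<Get> batch = new ArrayList<Get>();
                for (int i = 0; i < 30; i++) {
                    Get get = new Get(Bytes.toBytes("row-" + i));
                    get.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"));
                    batch.add(get);
                }
                Result[] results = table.get(batch);
                System.out.println("Fetched " + results.length + " rows");
            } finally {
                table.close();
                pool.shutdown();
                connection.close();
            }
        }
    }

Sharing a single connection and pool keeps the socket count bounded no matter how many HTable instances the reader threads create, with hbase.client.ipc.pool.size controlling (roughly) how many sockets that shared client keeps per region server. If the stock client still tops out, the asynchbase client J-D mentions is the usual alternative to try.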