Ahhh... This truly was a client-side problem. Thanks for the clarification!
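
For anyone reading this in the archive, here is a minimal sketch of the listener pile-up as I understand it from Lars's description. It is not HBase or ZooKeeper code: `SketchWatcher`, `Listener` and `listenersAfter` are made-up names for illustration, and the `cleanUp` flag only shows what unregistering would look like. I am not claiming that is how the patch fixes it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a client-side ZK watcher; NOT the real HBase class. */
final class SketchWatcher {
    interface Listener { void onEvent(String path); }

    // The watcher holds strong references, so listeners cannot be garbage collected.
    private final List<Listener> listeners = new ArrayList<>();

    void register(Listener l) { listeners.add(l); }
    void unregister(Listener l) { listeners.remove(l); }
    int listenerCount() { return listeners.size(); }

    /** Every registered listener runs on every event, so the cost grows with the list. */
    int fire(String path) {
        int invoked = 0;
        for (Listener l : listeners) {
            l.onEvent(path);
            invoked++;
        }
        return invoked;
    }
}

final class ListenerLeakSketch {
    /** Simulates `calls` admin-style calls that each register a listener on one shared watcher. */
    static int listenersAfter(int calls, boolean cleanUp) {
        SketchWatcher watcher = new SketchWatcher();
        for (int i = 0; i < calls; i++) {
            SketchWatcher.Listener l = path -> { };
            watcher.register(l);
            if (cleanUp) {
                watcher.unregister(l); // assumption: what a well-behaved caller would do
            }
        }
        return watcher.listenerCount();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + listenersAfter(1000, false));     // prints "leaky: 1000"
        System.out.println("cleaned up: " + listenersAfter(1000, true)); // prints "cleaned up: 0"
    }
}
```

In the leaky case a single event has to walk all 1000 listeners, which matches the "slows down the client eventually" symptom, and restarting only the client process would reset the list.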

On 1/17/12 4:16 PM, "lars hofhansl" <[email protected]> wrote:

>What happened before HBASE-5073 in 0.90.x was that the ZK watcher (at
>the client) would pile on more and more listeners.
>On each ZK event these listeners are executed, eventually slowing down
>the client. In addition, the listeners are prevented from being garbage
>collected, creating a memory leak.
>
>So it's client only, the RSs are not affected by this.
>
>
>----- Original Message -----
>From: Doug Meil <[email protected]>
>To: "[email protected]" <[email protected]>
>Cc:
>Sent: Tuesday, January 17, 2012 5:36 AM
>Subject: Re: HBASE-5073 impact...
>
>
>Hi folks, I just want to follow up on this one more time.
>
>Is there anything funky happening in the client that "slows things down"
>when these methods are called, or is it a reflection of RS activity?
>
>
>On 1/11/12 4:37 PM, "Doug Meil" <[email protected]> wrote:
>
>>
>>Hi dev-list,
>>
>>With respect to HBASE-5073 and invoking the admin API and producing
>>slowdowns, was the workaround (without the patch) that the client be
>>restarted, or the entire cluster? I see the patch has been back-ported
>>to 90.6 but I wanted to doc this if it was warranted.
>>
>>Also, regarding...
>>
>>"As Lars mentioned admin apis like flush and compact will also slow down
>>the client."
>>
>>... in terms of "slowing down the client", is this referring to the fact
>>that subsequent requests will have to contend with increased activity on
>>RegionServers (e.g., due to compaction and the file-writing)? Or is
>>there something else going on?
>>
>>Again, wanted to doc this if it was warranted.
>>
>>
>>On 12/27/11 9:20 PM, "Ramkrishna S Vasudevan"
>><[email protected]> wrote:
>>
>>>As Lars mentioned admin apis like flush and compact will also slow down
>>>the client.
>>>As part of the restart of the HBase cluster, are the clients also
>>>restarted?
>>>
>>>Regards
>>>Ram
>>>
>>>-----Original Message-----
>>>From: Lars H [mailto:[email protected]]
>>>Sent: Tuesday, December 27, 2011 10:02 PM
>>>To: [email protected]
>>>Cc: [email protected]
>>>Subject: Re: Read speed down after long running
>>>
>>>When you restart HBase are you also restarting the client process?
>>>Are you using HBaseAdmin.tableExists?
>>>If so you might be running into HBASE-5073
>>>
>>>-- Lars
>>>
>>>Yi Liang <[email protected]> wrote:
>>>
>>>>Hi all,
>>>>
>>>>We're running hbase 0.90.3 for one read-intensive application.
>>>>
>>>>We find that after running for a long time (2 weeks or 1 month or
>>>>longer), the read speed becomes much lower.
>>>>
>>>>For example, a get_rows operation of thrift to fetch 20 rows (about 4k
>>>>size every row) could take >2 seconds, sometimes even >5 seconds. When
>>>>it happens, we can see cpu_wio stays at about 10.
>>>>
>>>>But if we restart hbase (only master and regionservers) with
>>>>stop-hbase.sh and start-hbase.sh, we can see the read speed go back to
>>>>normal immediately, which is <200 ms for every get_rows operation, and
>>>>the cpu_wio drops to about 2.
>>>>
>>>>When the problem appears, there's no exception in logs, and no
>>>>flush/compaction, nothing abnormal except a few warning logs sometimes
>>>>like below:
>>>>2011-12-27 15:50:20,307 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog; editcount=1, len~=9.8k
>>>>
>>>>Our cluster has 10 region servers, each with 25g heap size, 64% of
>>>>which is used for cache. There are some m/r jobs that keep running in
>>>>another cluster to feed data into this hbase. Every night, we do flush
>>>>and major compaction. Usually there's no flush or compaction in the
>>>>daytime.
>>>>
>>>>Could anybody explain why the read speed could become lower after
>>>>running for a long time, and why it goes back to normal immediately
>>>>after restarting hbase?
>>>>
>>>>Any advice will be highly appreciated.
>>>>
>>>>Thanks,
>>>>Yi
