Ted, It's hard to tell without having full logs, but it seems that this is caught in a reassignment loop. I see things getting unassigned and reassigned over and over.
Are there any weird network or dns issues on this cluster? It shows avg load at 7 not 13, so seems that there could be some extra regionserver or something else wonky. Have you tried restarting your cluster? > -----Original Message----- > From: Ted Yu [mailto:yuzhih...@gmail.com] > Sent: Saturday, March 13, 2010 1:59 PM > To: hbase-user@hadoop.apache.org > Subject: Re: slow response in hbase shell > > Here is region server log: > http://pastebin.com/6d1QgNXR > > There was only one exception around the time I performed get: > 2010-03-13 03:38:30,325 INFO [regionserver/10.10.31.137:60020.worker] > regionserver.HRegion(342): region > domaincrawltable,com.intensedebate.s:http\x2Fsignup,1268098564908/60229 > 3480 > available; sequence id is 225656738 > 2010-03-13 03:38:30,476 ERROR [IPC Server handler 23 on 60020] > regionserver.HRegionServer(844): > org.apache.hadoop.hbase.NotServingRegionException: > ruletable,com.hoovers.www/companyindex/New_Mexico/Corrales/Telecommunic > ations-1.html,1268084284374 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer > ver.java:2307) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja > va:1784) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso > rImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91 > 5) > > Here is master server log which doesn't have exception at all: > http://pastebin.com/80949RK2 > > On Sat, Mar 13, 2010 at 10:10 AM, Jonathan Gray <jl...@streamy.com> > wrote: > > > Ted, > > > > > > > > Your attachments didn't come through. Try putting them up on the web > or > > pastebin somewhere. > > > > > > > > What's happening in the RegionServer logs between the time that the > get > > works and the get doesn't work? > > > > > > > > Also, I recommend upgrading to 0.20.3, there are critical fixes. > > > > > > > > From: Ted Yu [mailto:yuzhih...@gmail.com] > > Sent: Saturday, March 13, 2010 3:48 AM > > To: hbase-user@hadoop.apache.org > > Subject: Re: slow response in hbase shell > > > > > > > > We use hbase 0.20.1 > > There are 3 region servers. 13 regions on each server. > > > > I don't see any exception in master log. > > > > I was able to run 10 successful get commands before hitting the > following: > > I am attaching (partial) master log and region server log from > 10.10.31.137 > > > > hbase(main):014:0> get 'ruletable', 'com.about.acne' > > COLUMN CELL > > lpm_1.0:category timestamp=1268347483823, > > value=http://acne.about.com\t9002:0.86580086\thost\n > > 1 row(s) in 0.0120 seconds > > hbase(main):015:0> get 'ruletable', 'com.about.acne' > > NativeException: > org.apache.hadoop.hbase.client.RetriesExhaustedException: > > Trying to contact region server 10.10.31.137:60020 for region > > ruletable,,1268083966723, row 'com.about.acne', but failed after 5 > > attempts. > > Exceptions: > > org.apache.hadoop.hbase.NotServingRegionException: > > org.apache.hadoop.hbase.NotServingRegionException: > ruletable,,1268083966723 > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer > ver.j > > ava:2307) > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja > va:17 > > 84) > > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > > at > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso > rImpl > > .java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > > at > > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91 > 5) > > > > > > > > On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray <jl...@streamy.com> > wrote: > > > > Seems like something weird is going on with your regionservers and > > balancing. > > > > Can you post big snippets from the regionserver and master logs? > Have you > > checked to see what's going on? Is there repetitive balancing going > on > > that > > never seems to reach steady-state? > > > > How many regions and how many nodes on which version of HBase? > > > > > > > -----Original Message----- > > > From: Ted Yu [mailto:yuzhih...@gmail.com] > > > Sent: Friday, March 12, 2010 8:24 PM > > > To: hbase-user@hadoop.apache.org > > > Subject: slow response in hbase shell > > > > > > Hi, > > > > > > > We sometimes saw over 5 second delay running get in hbase 0.20.1 > > > shell: > > > > > > > > hbase(main):002:0> get 'ruletable', 'ca.tsn.www' > > > > 0 row(s) in 10.1330 seconds > > > > > > > > From our 3 region servers there are a lot of such messages: > > > > > > > > > > > > > > 2010-03-12 00:00:00,996 INFO [regionserver/10.10.31.135:60020] > > > > regionserver.HRegionServer(493): MSG_REGION_CLOSE: > > > > > > > > crawltable,com.pandora.www:http\x2Finclude\x2FlyricsAdEmbed.html\x3Fgen > > > > re\x3Delectronica\x26artist\x3DR273847\x26webname\x3D\x26sz\x3D2000x8\x > > > > 26ord\x3D125823029226371645\x26tile\x3D3\x26_id\x3Dbottom_leaderboard_c > > > ontainer,1266944566406: > > > > Overloaded > > > > 2010-03-12 00:00:00,997 INFO [regionserver/10.10.31.135:60020] > > > > regionserver.HRegionServer(493): MSG_REGION_CLOSE: > > > > domaincrawltable,,1268098564908: Overloaded > > > > > > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc > > > > 40428 343638 11523230 > > > > > > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc > > > > 40430 343655 11307703 > > > > > > > > grep Overloaded hbase-hbaseadmin-regionserver.log | wc > > > > 40466 343961 11340379 > > > > > > > > Was the slow response due to the load balancing ? > > > > > > > > > > The strange thing was that after several quick responses I would > see: > > > > > > hbase(main):004:0> get 'ruletable', 'com.about.acne' > > > COLUMN > > > CELL > > > lpm_1.0:category timestamp=1268347483823, value= > > > http://acne.about.com\t9002:0.86580 086\thost\n > > > 1 row(s) in 0.0040 seconds > > > hbase(main):005:0> get 'ruletable', 'com.about.acne' > > > COLUMN > > > CELL > > > lpm_1.0:category timestamp=1268347483823, value= > > > http://acne.about.com\t9002:0.86580 086\thost\n > > > 1 row(s) in 0.0040 seconds > > > hbase(main):006:0> get 'ruletable', 'com.about.acne' > > > NativeException: > > > org.apache.hadoop.hbase.client.RetriesExhaustedException: > > > Trying to contact region server 10.10.31.136:60020 for region > > > ruletable,,1268083966723, row 'com.about.acne', but failed after 5 > > > attempts. > > > Exceptions: > > > org.apache.hadoop.hbase.NotServingRegionException: > > > org.apache.hadoop.hbase.NotServingRegionException: > > > ruletable,,1268083966723 > > > at > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer > > > ver.java:2307) > > > at > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.ja > > > va:1784) > > > at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown > Source) > > > at > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso > > > rImpl.java:25) > > > at java.lang.reflect.Method.invoke(Method.java:597) > > > at > > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > > > at > > > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:91 > > > 5) > > > > > > But 006:60010/master.jsp refreshes quickly and shows all three > > > regionservers. > > > Don't know why hbase shell encountered NSRE. > > > > > > > > > > > Thanks > > > > > > > >