Please refer to another mail thread: "RegionServer failure and recovery take a long time". J-D has given me a lot of good advice.
1. Set "swappiness" to a low value to avoid excessive swapping, for example 20 or 10.
2. Use some JVM GC options. I am using -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode, and will add -XX:ParallelGCThreads=8 later.
3. Use a patch from J-D: https://issues.apache.org/jira/browse/HBASE-1008

Schubert

On Sat, May 16, 2009 at 5:16 AM, stack <[email protected]> wrote:

> Andy's base point is that you've probably overloaded your setup. What are
> you hoping to achieve with this setup of one machine?
>
> You've followed the 'Getting Started' section in hbase documentation? Has
> some configuration you need. Enable the troubleshooting suggested
> configurations too if you want to remove lack of resources or incorrect
> timeouts as cause. You should enable DEBUG too. Will make your logs richer
> in detail and will help with the diagnosis.
>
> Thanks,
> St.Ack
>
> On Fri, May 15, 2009 at 11:58 AM, Sasha Dolgy <[email protected]> wrote:
>
> > Hi Andy,
> > I've sent you an email with a link to a tar file with the logs. To be
> > honest, for the most part this is default out of the box. To this point
> > this is the first problem with over 150k writes to HBase. After i stopped /
> > started HBase again everything is going fine...
> >
> > I haven't looked at the troubleshooting page yet, because well, i'm not
> > quite sure what to trouble shoot. Finding it hard to identify an actual
> > problem....other than seeing stack traces and it not working.
> >
> > -sd
> >
> > On Fri, May 15, 2009 at 7:54 PM, Andrew Purtell <[email protected]> wrote:
> >
> > > This is almost surely resource overcommitment as cause: CPU and/or memory,
> > > leading to thread starvation. We observe the JVM scheduler is unfair at high
> > > load, and swap, especially if JVM heap is paged out when a GC cycle happens,
> > > can also be similarly deadly. Given other details in this thread, I suspect
> > > swap. What JVM options are you running with? Have you looked at the GC
> > > related tips on the troubleshooting page up on the wiki?
> > > http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> > >
> > > Best regards,
> > >
> > > - Andy
> > >
> > > ________________________________
> > > From: Sasha Dolgy <[email protected]>
> > > To: [email protected]
> > > Sent: Friday, May 15, 2009 11:38:01 AM
> > > Subject: Re: HRegionServer: Failed openScanner
> > >
> > > In the region server logs I see messages from the 14th:
> > > 2009-05-14 22:47:28,840 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region syslog,,1242260881586
> > > 2009-05-14 22:47:43,976 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region syslog,,1242260881586 in 15sec
> > >
> > > then no log entries until the 15th when the error happens:
> > >
> > > 2009-05-15 00:55:51,568 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 189138ms, ten times longer than scheduled: 10000
> > > 2009-05-15 00:55:52,334 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 188348ms, ten times longer than scheduled: 3000
> > > 2009-05-15 00:55:53,090 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for 189261 milliseconds - retrying
> > > 2009-05-15 00:55:56,789 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP: safeMode=false
> > > 2009-05-15 00:55:57,249 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
> > > org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2076)
> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1710)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > >
> > > On Fri, May 15, 2009 at 7:32 PM, Sasha Dolgy <[email protected]> wrote:
> > >
> > > > Ok, i'll go take a look. They are both on the local server so network
> > > > issues shouldn't be a cause. Cheers though, i'll go look at the JIRA link.
> > > > If I find anything else i'll post here.
> > > > thanks
> > > > -sd
> > > >
> > > > On Fri, May 15, 2009 at 6:18 PM, Andrew Purtell <[email protected]> wrote:
> > > >
> > > >> The region server hosting META could not communicate with the master for a
> > > >> very long time. Some kind of network issue? Any entries in the region server
> > > >> logs above this one
> > > >>
> > > >> > 2009-05-15 00:55:53,090 WARN
> > > >> > org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to
> > > >> > master for 189261 milliseconds - retrying
> > > >>
> > > >> which may be relevant? Anything about sleeping too long?
> > > >>
> > > >> Related, there were some bugs that I am aware of preventing recovery if
> > > >> META in particular goes away but they should be fixed for 0.20 as of
> > > >> https://issues.apache.org/jira/browse/HBASE-1362 .
> > > >>
> > > >> - Andy
> >
> > --
> > Sasha Dolgy
> > [email protected]
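
For anyone who wants to check their own region server log for the long pauses discussed in this thread, here is a minimal sketch that pulls out the "We slept ...ms" warnings from org.apache.hadoop.hbase.util.Sleeper. It assumes each log entry sits on a single line in the real log file (the wrapping in the quoted mails comes from the mail client). The function name find_long_sleeps and the regular expression are illustrative only and are not part of HBase.

```python
import re

# Matches the Sleeper warning format quoted in this thread, for example:
# 2009-05-15 00:55:51,568 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
# 189138ms, ten times longer than scheduled: 10000
# Assumption: the entry is on one line in the actual log file.
SLEEPER_WARNING = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"org\.apache\.hadoop\.hbase\.util\.Sleeper: "
    r"We slept (?P<slept>\d+)ms, ten times longer than scheduled: "
    r"(?P<scheduled>\d+)"
)


def find_long_sleeps(lines):
    """Return (timestamp, slept_ms, scheduled_ms) for each Sleeper warning."""
    pauses = []
    for line in lines:
        match = SLEEPER_WARNING.search(line.strip())
        if match:
            pauses.append(
                (
                    match.group("timestamp"),
                    int(match.group("slept")),
                    int(match.group("scheduled")),
                )
            )
    return pauses


# Sample lines copied from the log excerpt in this thread.
SAMPLE_LOG = [
    "2009-05-15 00:55:51,568 WARN org.apache.hadoop.hbase.util.Sleeper: "
    "We slept 189138ms, ten times longer than scheduled: 10000",
    "2009-05-15 00:55:52,334 WARN org.apache.hadoop.hbase.util.Sleeper: "
    "We slept 188348ms, ten times longer than scheduled: 3000",
    "2009-05-15 00:55:53,090 WARN "
    "org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report "
    "to master for 189261 milliseconds - retrying",
]

if __name__ == "__main__":
    for timestamp, slept_ms, scheduled_ms in find_long_sleeps(SAMPLE_LOG):
        print(f"{timestamp}: slept {slept_ms} ms, scheduled {scheduled_ms} ms")
```

Pauses of this size, clustered at the same moment, are consistent with the whole JVM stalling (swap or GC, as Andy suspects) rather than with a single slow thread, but the script only reports the warnings and does not diagnose the cause.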
