Please refer to another mail thread: "RegionServer failure and recovery take a
long time". J-D gave me a lot of good advice:

1. Set "swappiness" to a low value, for example 20 or 10, to avoid excessive
swapping.
2. Use suitable JVM GC options. I am using -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode, and plan to add -XX:ParallelGCThreads=8 later.
3. Use a patch from J-D: https://issues.apache.org/jira/browse/HBASE-1008
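The first two suggestions can be checked programmatically. Below is a minimal Java sketch that assumes a Linux host, because it reads /proc/sys/vm/swappiness. The class name TuningCheck and the threshold of 20 are illustrative choices and are not part of HBase. The flag check sees only the JVM it runs in, so it says something about a region server only when that JVM is launched with the same options.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: checks the swappiness and
// GC-flag suggestions from the list above.
public class TuningCheck {
    // Illustrative upper bound taken from the "20 or 10" suggestion.
    static final int MAX_SWAPPINESS = 20;

    // GC flags mentioned above. They are valid for the Java 6 era JVMs in this
    // thread; recent JDKs have removed CMS and no longer accept them.
    static final String[] SUGGESTED_FLAGS = {
        "-XX:+UseConcMarkSweepGC", "-XX:+CMSIncrementalMode"
    };

    // /proc/sys/vm/swappiness holds a single integer followed by a newline.
    static int parseSwappiness(String content) {
        return Integer.parseInt(content.trim());
    }

    static boolean swappinessOk(int value) {
        return value <= MAX_SWAPPINESS;
    }

    // Returns the suggested flags that are absent from the given JVM arguments.
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : SUGGESTED_FLAGS) {
            if (!jvmArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path swappiness = Paths.get("/proc/sys/vm/swappiness"); // Linux only
        if (Files.exists(swappiness)) {
            int value = parseSwappiness(
                new String(Files.readAllBytes(swappiness), StandardCharsets.UTF_8));
            System.out.println("vm.swappiness=" + value
                + (swappinessOk(value) ? "" : " (consider lowering it)"));
        } else {
            System.out.println("No /proc/sys/vm/swappiness; not a Linux host?");
        }
        // Inspects only the arguments of the JVM running this class.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing GC flags: " + missingFlags(jvmArgs));
    }
}
```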

Schubert



On Sat, May 16, 2009 at 5:16 AM, stack <[email protected]> wrote:

> Andy's basic point is that you've probably overloaded your setup.  What are
> you hoping to achieve with this single-machine setup?
>
> Have you followed the 'Getting Started' section in the HBase documentation?
> It has some configuration you need.  Also apply the configurations
> suggested on the troubleshooting page if you want to rule out lack of
> resources or incorrect timeouts as the cause.  You should enable DEBUG
> logging too.  It will make your logs richer in detail and will help with
> the diagnosis.
>
> Thanks,
> St.Ack
>
>
> On Fri, May 15, 2009 at 11:58 AM, Sasha Dolgy <[email protected]> wrote:
>
> > Hi Andy,
> > I've sent you an email with a link to a tar file with the logs.  To be
> > honest, for the most part this is the default, out-of-the-box setup.  So
> > far this is the first problem in over 150k writes to HBase.  After I
> > stopped/started HBase again, everything has been going fine...
> >
> > I haven't looked at the troubleshooting page yet because, well, I'm not
> > quite sure what to troubleshoot.  I'm finding it hard to identify an
> > actual problem, other than seeing stack traces and it not working.
> >
> > -sd
> >
> > On Fri, May 15, 2009 at 7:54 PM, Andrew Purtell <[email protected]>
> > wrote:
> >
> > > This is almost surely caused by resource overcommitment (CPU and/or
> > > memory) leading to thread starvation. We have observed that the JVM
> > > scheduler is unfair at high load, and swap can be similarly deadly,
> > > especially if the JVM heap is paged out when a GC cycle happens. Given
> > > other details in this thread, I suspect swap. What JVM options are you
> > > running with? Have you looked at the GC-related tips on the
> > > troubleshooting page on the wiki?
> > > http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> > >
> > > Best regards,
> > >
> > >   - Andy
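Andy's GC and swap hypothesis can be tested by measuring how much wall-clock time goes to garbage collection. Below is a minimal sketch that uses the standard java.lang.management API. It samples cumulative GC time once per interval and prints the fraction of that interval spent collecting. A high fraction suggests GC pressure. An interval that stretches far past its schedule while the GC fraction stays low points elsewhere, for example at swapping or CPU starvation. The sketch observes only the JVM it runs in, so watching a region server this way would require running the same logic inside that process. The class name GcShare and the one-second interval are illustrative assumptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical monitor, not part of HBase: reports how much of each
// sampling interval this JVM spent in garbage collection.
public class GcShare {
    // Sum of accumulated collection time (ms) over all collectors.
    // getCollectionTime() returns -1 when a collector does not report it.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of an interval spent in GC, clamped to [0, 1].
    static double gcShare(long gcMillis, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / wallMillis));
    }

    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 1000; // illustrative sampling interval
        long lastGc = totalGcMillis();
        long lastWall = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(intervalMs);
            long gc = totalGcMillis();
            long wall = System.nanoTime();
            long wallMs = (wall - lastWall) / 1_000_000L;
            long gcMs = gc - lastGc;
            System.out.printf("interval=%dms gc=%dms share=%.2f%n",
                wallMs, gcMs, gcShare(gcMs, wallMs));
            lastGc = gc;
            lastWall = wall;
        }
    }
}
```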
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Sasha Dolgy <[email protected]>
> > > To: [email protected]
> > > Sent: Friday, May 15, 2009 11:38:01 AM
> > > Subject: Re: HRegionServer: Failed openScanner
> > >
> > > In the region server logs I see messages from the 14th:
> > > 2009-05-14 22:47:28,840 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting  compaction on region syslog,,1242260881586
> > > 2009-05-14 22:47:43,976 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region syslog,,1242260881586 in 15sec
> > >
> > > Then there are no log entries until the 15th, when the error happens:
> > >
> > > 2009-05-15 00:55:51,568 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 189138ms, ten times longer than scheduled: 10000
> > > 2009-05-15 00:55:52,334 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 188348ms, ten times longer than scheduled: 3000
> > > 2009-05-15 00:55:53,090 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for 189261 milliseconds - retrying
> > > 2009-05-15 00:55:56,789 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP: safeMode=false
> > > 2009-05-15 00:55:57,249 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
> > > org.apache.hadoop.hbase.NotServingRegionException: .META.,,1
> > >        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2076)
> > >        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1710)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >        at java.lang.reflect.Method.invoke(Method.java:597)
> > >        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
> > >
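The Sleeper warnings above compare how long a thread actually slept with how long it asked to sleep. Below is a minimal sketch of that kind of check, reconstructed only from the wording of the log ("ten times longer than scheduled"). It is a hypothetical re-creation, not HBase's Sleeper source, and the class name OversleepCheck and the 100 ms demo period are illustrative.

```java
// Hypothetical re-creation based only on the log wording above;
// this is not HBase's Sleeper implementation.
public class OversleepCheck {
    // "ten times longer than scheduled" suggests a factor of 10.
    static final long FACTOR = 10;

    static boolean overslept(long scheduledMs, long sleptMs) {
        return sleptMs > FACTOR * scheduledMs;
    }

    // Sleeps for periodMs and warns if the thread actually slept far longer,
    // for example because the whole JVM was stalled by GC, swapping or starvation.
    static long sleepAndCheck(long periodMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(periodMs);
        long sleptMs = (System.nanoTime() - start) / 1_000_000L;
        if (overslept(periodMs, sleptMs)) {
            System.err.println("Slept " + sleptMs + "ms, more than "
                + FACTOR + "x the scheduled " + periodMs + "ms");
        }
        return sleptMs;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            sleepAndCheck(100); // short illustrative period
        }
    }
}
```

With the numbers from the log, overslept(10000, 189138) and overslept(3000, 188348) are both true. Both timers woke roughly 189 seconds late, within a second of each other, and the master-report warning cites 189261 ms. That pattern is consistent with a stall of the whole process rather than one busy thread.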
> > >
> > >
> > > On Fri, May 15, 2009 at 7:32 PM, Sasha Dolgy <[email protected]> wrote:
> > >
> > > > OK, I'll go take a look.  They are both on the local server, so network
> > > > issues shouldn't be the cause.  Cheers though, I'll go look at the JIRA
> > > > link.  If I find anything else I'll post here.
> > > > Thanks,
> > > > -sd
> > > >
> > > > On Fri, May 15, 2009 at 6:18 PM, Andrew Purtell <[email protected]> wrote:
> > > >
> > > >> The region server hosting META could not communicate with the master
> > > >> for a very long time. Some kind of network issue? Are there any entries
> > > >> in the region server logs above this one
> > > >>
> > > >> > 2009-05-15 00:55:53,090 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for 189261 milliseconds - retrying
> > > >>
> > > >> that may be relevant? Anything about sleeping too long?
> > > >>
> > > >> Relatedly, there were some bugs I am aware of that prevent recovery
> > > >> if META in particular goes away, but they should be fixed for 0.20 as of
> > > >> https://issues.apache.org/jira/browse/HBASE-1362 .
> > > >>
> > > >>   - Andy
> > > >>
> > > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Sasha Dolgy
> > [email protected]
> >
>
