Dave,

Can you pastebin the exact error that was returned by the MR job? It
looks like it's client-side (from HBase's point of view).

WRT .META. and the master: the web page does a request on every hit, so
if the .META. region is unavailable, the page can't be rendered. It looks
like you kill -9'ed the region server? If so, it takes about a minute to
detect the region server failure and then split its write-ahead logs, so
if .META. was on that machine, it will take at least that long before
the web page works again.

Instead of kill -9, simply log on to the node and, from the HBase
directory, run:

  ./bin/hbase-daemon.sh stop regionserver
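For a rolling restart across several nodes, the same per-node command can
be looped over the region servers. The following is only a minimal sketch
under stated assumptions: passwordless ssh to every host, HBase installed
at the same HBASE_HOME path on each node, and a host list with one name per
line (for example conf/regionservers). The function name rolling_restart,
the DRY_RUN switch, the /opt/hbase default and the 60-second pause are all
hypothetical; the fixed pause is a crude stand-in for actually checking
that the master has reassigned the regions.

```shell
#!/usr/bin/env bash
# Hypothetical rolling-restart helper (assumes passwordless ssh and the same
# HBASE_HOME on every node). Not an official HBase tool.

# rolling_restart HOSTS_FILE [PAUSE_SECONDS]
# Stops and then starts one region server at a time with hbase-daemon.sh,
# so each server shuts down cleanly instead of being killed with kill -9.
# Set DRY_RUN=1 to print the ssh commands instead of running them.
rolling_restart() {
  local hosts_file="$1"
  local pause="${2:-60}"                          # hypothetical default pause
  local hbase_home="${HBASE_HOME:-/opt/hbase}"    # hypothetical default path
  local host action cmd
  while read -r host; do
    # Skip blank lines and comment lines in the host list.
    [[ -z "$host" || "$host" == \#* ]] && continue
    for action in stop start; do
      cmd="$hbase_home/bin/hbase-daemon.sh $action regionserver"
      if [[ "${DRY_RUN:-0}" == "1" ]]; then
        echo "ssh $host $cmd"
      else
        # </dev/null keeps ssh from consuming the rest of the host list.
        ssh "$host" "$cmd" </dev/null
      fi
    done
    # Crude wait so regions can be reassigned before the next host goes down.
    [[ "${DRY_RUN:-0}" == "1" ]] || sleep "$pause"
  done < "$hosts_file"
}
```

Running it first as DRY_RUN=1 rolling_restart conf/regionservers shows
the exact commands before anything is stopped.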

J-D

On Wed, Mar 31, 2010 at 5:51 PM, Buttler, David <buttl...@llnl.gov> wrote:
> Hi,
> I have a small cluster (6 nodes, 1 master and 5 region server/data nodes).
> Each node has lots of memory and disk (16GB of heap dedicated to
> RegionServers, 4 TB of disk per node for HDFS).
> I have a table with about 1 million rows in HBase, that's all.  Currently it
> is split across 50 regions.
> I was monitoring this with the HBase web GUI and I noticed that a lot of the
> heap was being used (14GB).  I was running an MR job and I was getting an
> error on the console that launched the job:
> Error: GC overhead limit exceeded hbase
>
> First question: is this going to hose the whole system?  I didn't see the 
> error in any of the hbase logs, so I assume that it was purely a client issue.
>
> So, naively thinking that maybe the GC had moved everything to permgen and 
> just wasn't cleaning up, I thought I would do a rolling restart of my region 
> servers and see if that cleared everything up.  The first server I killed 
> happened to be the one that was hosting the .META. table.  Subsequently the 
> web gui failed.  Looking at the errors, it seems that the web gui essentially 
> caches the address for the meta table and blindly tries connecting on every 
> request.  I suppose I could restart the master, but this does not seem like 
> desirable behavior.  Shouldn't the cache be refreshed on error?  And since 
> there is no real code for the GUI, just a jsp page, doesn't this mean that 
> this behavior could be seen in other applications that use HMaster?
>
> Corrections welcome
> Dave
>
>
