Brian Bockelman wrote:
On May 14, 2010, at 8:27 PM, Todd Lipcon wrote:

Hey Brian,

Yep, excessive GC definitely sounds like a likely culprit. I'm surprised you
didn't see OOMEs in the log, though.


We didn't until the third restart today.  I have no clue why we haven't seen 
this in the past 9 months of this cluster though...

Anyhow, it looks like this might have done the trick... the sysadmin is heading 
over to kick over a few errant datanodes, and we should be able to get out of 
safemode soon.  Luckily, it's a 4-day weekend in Europe and otherwise a Friday 
evening in the US, so there are only a few folks using it.

Good thing we Europeans have long weekends.


If you want to monitor GC, I'd recommend adding -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps to your Java options;
they are occasionally useful at times like this.
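
For anyone who wants the same numbers without digging through GC logs, here is a rough sketch using the standard java.lang.management API. The class name GcSnapshot and its two helper methods are made up for illustration and are not part of Hadoop. As written it only inspects the JVM it runs in, so it would have to run inside the process of interest (or be adapted to a remote JMX connection) to say anything about a namenode.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reports GC activity for the
// JVM this code is running in.
public class GcSnapshot {

    // One line per collector: name, number of collections, total time spent.
    // getCollectionCount() and getCollectionTime() return -1 when the
    // collector does not expose that statistic.
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: collections=%d, totalTimeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    // True if this JVM was started with either of the GC logging flags
    // mentioned above.
    static boolean gcLoggingEnabled() {
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return args.contains("-verbose:gc") || args.contains("-XX:+PrintGCDetails");
    }

    public static void main(String[] args) {
        System.out.println("GC logging flags present: " + gcLoggingEnabled());
        for (String line : summarize()) {
            System.out.println(line);
        }
    }
}
```

A total GC time that climbs quickly between two snapshots is the kind of signal that would point at excessive GC before any OOME shows up in the log.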


What are your current GC options? Have you played with compressed object pointers (-XX:+UseCompressedOops) yet?
