4 CPUs seems OK, unless you are running 2-3 MapReduce tasks at the same time. Your value for the timeout is 240000, but did you change the tick time? The GC pause you got lasted almost a minute (54963 ms), which, if you did not change the tick value, matches 3000*20 = 60000 ms: ZooKeeper caps the session at 20 times the tick time and disregards any larger session timeout you request.
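To make that arithmetic concrete, here is a minimal Java sketch of the negotiation rule quoted from the ZooKeeper docs further down (the server grants a timeout between 2 and 20 times tickTime). The class and method names are hypothetical, not part of HBase or ZooKeeper, and it assumes only those default 2x/20x bounds apply.

```java
// Hypothetical helper, not an HBase or ZooKeeper API: mimics the documented
// rule that the server grants a session timeout between 2x and 20x tickTime.
class SessionTimeoutCheck {

    // Clamp the client's requested timeout into [2 * tickTime, 20 * tickTime].
    static int negotiatedTimeout(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(requestedMs, max));
    }

    public static void main(String[] args) {
        int requested = 240000; // zookeeper.session.timeout used in this thread
        int tickTime = 3000;    // tick value from the thread, assumed unchanged
        int granted = negotiatedTimeout(requested, tickTime);
        long pauseMs = 54963;   // "We slept 54963ms" from the region server log

        // prints: requested=240000 granted=60000
        System.out.println("requested=" + requested + " granted=" + granted);
        // prints: pause is 91% of the granted timeout
        System.out.println("pause is " + (100 * pauseMs / granted) + "% of the granted timeout");
    }
}
```

In other words, asking for 240000 ms buys nothing while tickTime stays at 3000, and a 55-second pause eats almost the whole 60-second window.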
J-D

On Thu, Mar 25, 2010 at 1:07 AM, Zheng Lv wrote:
> Hello J-D,
>
> First, thank you for your reply.
>
> >How many CPUs do you have?
> Every server has 2 dual-core CPUs.
>
> >Are you swapping?
> I'm not sure, judging from our monitoring tools, but we have now written
> a script that records vmstat output every 2 seconds. If something goes
> wrong again, we will capture it.
>
> >Also, if you are only using this system to batch load
> >data or as an analytics backend, you probably want to set the timeout
> >higher:
> But our value for this property is already 240000.
>
> We will try to tune our garbage collector and see what happens.
>
> Thanks again, J-D,
> LvZheng
>
> 2010/3/25 Jean-Daniel Cryans
>
>> 2010-03-24 11:33:52,331 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 54963ms, ten times longer than scheduled: 3000
>>
>> You had a significant garbage collector pause (aka a stop-the-world
>> pause in Java-speak) and your region server's session with ZooKeeper
>> expired (it literally stopped responding for so long that it was
>> considered dead). Are you swapping? How many CPUs do you have? If
>> something is slowing down the garbage collection process, it will take
>> more time.
>>
>> Also, if you are only using this system to batch load
>> data or as an analytics backend, you probably want to set the timeout
>> higher:
>>
>> <property>
>>   <name>zookeeper.session.timeout</name>
>>   <value>60000</value>
>>   <description>ZooKeeper session timeout.
>>     HBase passes this to the zk quorum as suggested maximum time for a
>>     session. See
>>     http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
>>     "The client sends a requested timeout, the server responds with the
>>     timeout that it can give the client. The current implementation
>>     requires that the timeout be a minimum of 2 times the tickTime
>>     (as set in the server configuration) and a maximum of 20 times
>>     the tickTime." Set the zk ticktime with
>>     hbase.zookeeper.property.tickTime.
>>     In milliseconds.
>>   </description>
>> </property>
>>
>> This value can be at most 20 times this one:
>>
>> <property>
>>   <name>hbase.zookeeper.property.tickTime</name>
>>   <value>3000</value>
>>   <description>Property from ZooKeeper's config zoo.cfg.
>>     The number of milliseconds of each tick. See
>>     zookeeper.session.timeout description.
>>   </description>
>> </property>
>>
>> So you could set tick to 6000 and timeout to 120000 for a 2-minute
>> timeout.
>>
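The 2-minute suggestion works because 120000 = 20 × 6000. Going the other way, a small Java sketch can pick the smallest tickTime whose 20x ceiling still admits the session timeout you want. The names below are hypothetical illustrations, not HBase or ZooKeeper APIs, and the sketch assumes the default 2x/20x window from the quoted docs.

```java
// Hypothetical helper, not an HBase or ZooKeeper API: derives a tickTime
// from a target session timeout under the default [2x, 20x] window.
class TickTimeFor {

    // Smallest tickTime t with 20 * t >= target, i.e. ceil(target / 20).
    static int minTickTimeFor(int targetTimeoutMs) {
        return (targetTimeoutMs + 19) / 20;
    }

    // True if the target lies inside [2 * tickTime, 20 * tickTime].
    static boolean grantable(int targetTimeoutMs, int tickTimeMs) {
        return targetTimeoutMs >= 2 * tickTimeMs && targetTimeoutMs <= 20 * tickTimeMs;
    }

    public static void main(String[] args) {
        int target = 120000; // the 2-minute timeout suggested in the thread
        int tick = minTickTimeFor(target);

        System.out.println("hbase.zookeeper.property.tickTime = " + tick);      // 6000
        System.out.println("zookeeper.session.timeout = " + target);            // 120000
        System.out.println("grantable with tick 3000? " + grantable(target, 3000)); // false
        System.out.println("grantable with tick " + tick + "? " + grantable(target, tick)); // true
    }
}
```

Applied to the 240000 ms value from this thread, the same helper gives a tickTime of 12000. Raising tickTime also lengthens everything else ZooKeeper measures in ticks (initLimit and syncLimit in zoo.cfg, for example), which is why the sketch picks the smallest sufficient value.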
