Hello J-D,
>4 CPUs seems ok, unless you are running 2-3 MR tasks at the same time.
I don't think we have ever run 3 MR tasks at the same time on one
server; sometimes 2, but never 3. And according to our monitoring tools,
the CPU is never busy.
I didn't change the tick value; I will do that right now. But I would like
to know why the timeout value can be at most 20 times the tickTime. Can
you tell me?
Thank you,
Regards,
LvZheng
2010/3/26 Jean-Daniel Cryans <[email protected]>
> 4 CPUs seems ok, unless you are running 2-3 MR tasks at the same time.
>
> So your value for the timeout is 240000, but did you change the tick
> time? The GC pause you got seemed to last almost a minute which, if
> you did not change the tick value, matches 3000*20 (disregard your
> session timeout).
>
> J-D
>
> On Thu, Mar 25, 2010 at 1:07 AM, Zheng Lv <[email protected]> wrote:
> > Hello J-D,
> > Thank you for your reply first.
> > >How many CPUs do you have?
> > Every server has 2 Dual-Core cpus.
> > >Are you swapping?
> > I can't tell from our monitoring tools, but we have now written
> > a script to record vmstat output every 2 seconds. If something goes wrong
> > again, we can capture it.
> > >Also, if you are currently only using this system to batch load
> > >data or as an analytics backend, you probably want to set the timeout
> > >higher:
> > But our value of this property is already 240000.
> >
> > We will try to optimize our garbage collector and we will see what will
> > happen.
> > Thanks again, J-D,
> > LvZheng
> >
> > 2010/3/25 Jean-Daniel Cryans <[email protected]>
> >
> >> 2010-03-24 11:33:52,331 WARN org.apache.hadoop.hbase.util.Sleeper: We
> >> slept 54963ms, ten times longer than scheduled: 3000
> >>
> >> You had a long garbage collector pause (aka a stop-the-world pause
> >> in Java-speak) and your region server's session with ZooKeeper expired
> >> (it literally stopped responding for too long, so long it was
> >> considered dead). Are you swapping? How many CPUs do you have? If you
> >> are slowing down the garbage collecting process, it will take more
> >> time.
> >>
> >> Also, if you are currently only using this system to batch load
> >> data or as an analytics backend, you probably want to set the timeout
> >> higher:
> >>
> >> <property>
> >> <name>zookeeper.session.timeout</name>
> >> <value>60000</value>
> >> <description>ZooKeeper session timeout.
> >> HBase passes this to the zk quorum as suggested maximum time for a
> >> session. See
> >> http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
> >> "The client sends a requested timeout, the server responds with the
> >> timeout that it can give the client. The current implementation
> >> requires that the timeout be a minimum of 2 times the tickTime
> >> (as set in the server configuration) and a maximum of 20 times
> >> the tickTime." Set the zk ticktime with
> >> hbase.zookeeper.property.tickTime.
> >> In milliseconds.
> >> </description>
> >> </property>
> >>
> >> This value can be at most 20 times this one:
> >>
> >> <property>
> >> <name>hbase.zookeeper.property.tickTime</name>
> >> <value>3000</value>
> >> <description>Property from ZooKeeper's config zoo.cfg.
> >> The number of milliseconds of each tick. See
> >> zookeeper.session.timeout description.
> >> </description>
> >> </property>
> >>
> >>
> >> So you could set tick to 6000, timeout to 120000 for a 2min timeout.
> >>
>
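
The 2x to 20x rule quoted in the zookeeper.session.timeout description can be read as a simple clamp on the requested timeout. The following is only a minimal sketch of that arithmetic, assuming the server applies exactly the bounds quoted in this thread; the function name effective_session_timeout is hypothetical and is not part of the HBase or ZooKeeper API.

```python
def effective_session_timeout(requested_ms: int, tick_time_ms: int) -> int:
    """Clamp a requested session timeout to [2 * tickTime, 20 * tickTime].

    Hypothetical helper: it only mirrors the bounds quoted from the
    ZooKeeper documentation, not any real client or server code.
    """
    min_timeout = 2 * tick_time_ms
    max_timeout = 20 * tick_time_ms
    return max(min_timeout, min(requested_ms, max_timeout))


# Default tick of 3000 ms: a requested 240000 ms is capped at 20 * 3000.
print(effective_session_timeout(240000, 3000))  # prints 60000

# Tick raised to 6000 ms: a requested 120000 ms fits under the 20x cap.
print(effective_session_timeout(120000, 6000))  # prints 120000
```

Under that assumption, a configured timeout of 240000 with the default tick of 3000 still yields an effective timeout of about one minute, which is consistent with a 54963ms pause being enough to expire the session.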