I enabled logging. The slow map task was stalled in a socket connection call in setupIOstreams(), triggered by the first RPC call (getProtocolVersion()) from the MapTask to the TaskTracker. If the socket connection call was made at time t1, it didn't return until roughly t1 + 200 seconds (normally each map task takes about 8 seconds). On the RPC server side, doAccept() was also not invoked until roughly t1 + 200 seconds. I ran a job with 200+ splits 10 times; on average there was one slow map task per run, and every slow map task spent ~200 seconds making the socket connection. I was using a recent 64-bit IBM JVM on SuSE.
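As a rough illustration (this is a standalone sketch, not the actual Hadoop RPC client code), a probe like the following times a plain socket connect the same way setupIOstreams() would; the host and port are placeholders for the TaskTracker's RPC address:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ConnectTimer {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; substitute the TaskTracker's real RPC address.
            String host = args.length > 0 ? args[0] : "localhost";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 12345;

            long t1 = System.currentTimeMillis();
            Socket s = new Socket();
            try {
                // With an explicit connect timeout, a ~200 second stall surfaces as a
                // SocketTimeoutException instead of a silent hang; 10 seconds here is
                // just an example value.
                s.connect(new InetSocketAddress(host, port), 10000);
                long elapsed = System.currentTimeMillis() - t1;
                System.out.println("connect took " + elapsed + " ms");
            } finally {
                s.close();
            }
        }
    }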
Jun

IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
[EMAIL PROTECTED]
(408) 927-1886 (phone)  (408) 927-3215 (fax)


Doug Cutting <[EMAIL PROTECTED]> wrote on 06/21/2007 09:21 AM (Re: map task in initializing phase for too long):

Jun Rao wrote:
> I am wondering if anyone has experienced this problem. Sometimes when I
> run a job, a few map tasks (often just one) hang in the initializing phase
> for more than 3 minutes (it normally finishes in a couple of seconds). They
> eventually finish, but the whole job is slowed down considerably. The
> weird thing is that the slow task is not deterministic: it doesn't always
> occur, and when it does, it can occur on any split and on any host.

I have not seen this. Perhaps you can get a stack trace from the
tasktracker while this is happening? Owen described how to get such
stack traces in:

http://mail-archives.apache.org/mod_mbox/lucene-hadoop-user/200706.mbox/[EMAIL PROTECTED]

Owen wrote:
> One side note is that all of the servers have a servlet such that if
> you do http://<node>:<port>/stacks you'll get a stack trace of all
> the threads in the server. I find that useful for remote debugging.
> *smile* Although if it is a task jvm that has the problem, then there
> isn't a server for them.

(This should probably be added to the documentation or the wiki...)

Doug
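For reference, the /stacks servlet Owen mentions is just an HTTP GET against the daemon's web port, so it can be fetched from a script as well as a browser. A small sketch (the node name and port below are placeholders; use the daemon's actual web UI address):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class DumpStacks {
        public static void main(String[] args) throws Exception {
            // Placeholder node/port; point this at the TaskTracker's or
            // JobTracker's HTTP address on your cluster.
            String node = args.length > 0 ? args[0] : "localhost";
            String port = args.length > 1 ? args[1] : "50060";

            URL url = new URL("http://" + node + ":" + port + "/stacks");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                in.close();
            }
        }
    }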
