You bring up an interesting point. A big chunk of the code in the Namenode executes inside a global lock, although there are pieces (e.g. the portion of code that chooses datanodes for a newly allocated block) that do execute outside this lock. But it is probably the case that the namenode does not benefit from more than 4 cores or so (with the current code).
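To make the scaling argument concrete, here is a toy sketch (hypothetical illustration only, not actual Hadoop code) of the pattern described above: namespace mutations all serialize on one global lock, while the datanode-selection step can run outside it on any core.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a namesystem guarded by one global lock.
class ToyNamesystem {
    private final Object globalLock = new Object();
    private final List<String> blocks = new ArrayList<String>();

    // Outside the lock: pure computation, can proceed on any core in parallel.
    String chooseDatanode(int blockId) {
        return "datanode-" + (blockId % 3);
    }

    // Inside the lock: every thread funnels through here one at a time,
    // so extra cores mostly wait here rather than do useful work.
    void addBlock(String block, String target) {
        synchronized (globalLock) {
            blocks.add(block + "@" + target);
        }
    }

    int blockCount() {
        synchronized (globalLock) {
            return blocks.size();
        }
    }
}

public class GlobalLockSketch {
    // Runs 4 worker threads, each allocating 100 blocks; returns the total.
    static int run() throws InterruptedException {
        final ToyNamesystem ns = new ToyNamesystem();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    int blockId = id * 100 + i;
                    String target = ns.chooseDatanode(blockId); // parallel part
                    ns.addBlock("blk_" + blockId, target);      // serialized part
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return ns.blockCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(GlobalLockSketch.run()); // prints 400
    }
}
```

However many threads you add, the time spent inside `addBlock` is strictly sequential, which is why throughput stops improving after a handful of cores.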
If you have 8 cores, you can experiment with running map-reduce jobs on the other 4 cores. How much memory does your machine have, and how many files does your HDFS have? One possibility is that the memory pressure of the map-reduce jobs causes more GC runs for the namenode process.

thanks,
dhruba

On Fri, May 9, 2008 at 7:54 PM, James Moore <[EMAIL PROTECTED]> wrote:
> On Fri, May 9, 2008 at 12:00 PM, Hairong Kuang <[EMAIL PROTECTED]> wrote:
> >> I'm using the machine running the namenode to run maps as well.
> >
> > Please do not run maps on the machine that is running the namenode. This
> > would cause CPU contention and slow down the namenode, making it easier
> > to run into SocketTimeoutException.
> >
> > Hairong
>
> I've turned off running tasks on the master, and I'm not seeing those
> errors.
>
> The behavior was interesting. On one job, I saw a total of 11 timeout
> failures (where the map was reported as a failure), but all of them
> happened in the first few minutes. After that it worked well and
> completed correctly.
>
> I'm wondering if it's worth it, though. If the number of maps/reduces
> that the master machine can run is substantially greater than the
> number of failures due to timeouts, isn't it worth having the master
> run tasks? It seems like there's probably a point where the number of
> machines in the cluster makes having a separate master a requirement,
> but at 20 8-core machines, it's not clear that dedicating a box to
> being the master is a win. (And having a smaller machine dedicated to
> being the master is cheaper, but annoying. I'd rather have N
> identical boxes running the same AMI, etc.)
>
> To anyone using Amazon - definitely upgrade to the new kernels. I now
> have very few instances of the 'Exception in createBlockOutputStream'
> error that started this thread in my logs. (These are different from
> the 11 timeouts I mentioned above, FYI.)
> The ones that are there all happened in one burst at 03:59:22 this
> afternoon:
>
> [EMAIL PROTECTED]:~/dev/hadoop$ bin/slaves.sh grep -r 'Exception in createBlockOutputStream' ~/dev/hadoop/logs/
> domU-12-31-38-00-04-51.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000024_0/syslog:2008-05-09 03:59:22,713 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> domU-12-31-38-00-D6-21.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000048_0/syslog:2008-05-09 03:59:22,989 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.252.22.111:50010
> domU-12-31-38-00-D6-21.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000061_0/syslog:2008-05-09 03:59:22,398 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> domU-12-31-38-00-60-D1.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000017_0/syslog:2008-05-09 03:59:22,880 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.252.217.203:50010
> domU-12-31-38-00-CD-41.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000051_0/syslog:2008-05-09 03:59:23,012 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.252.34.31:50010
> domU-12-31-38-00-D5-E1.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000026_0/syslog:2008-05-09 03:59:24,551 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.252.15.47:50010
> domU-12-31-38-00-1D-D1.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000056_0/syslog:2008-05-09 03:59:23,504 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.252.11.159:50010
> domU-12-31-38-00-1D-D1.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000050_0/syslog:2008-05-09 03:59:22,454 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> domU-12-31-38-00-1D-D1.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000009_0/syslog:2008-05-09 03:59:22,944 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> domU-12-31-38-00-D8-81.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000002_0/syslog:2008-05-09 03:59:22,420 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> domU-12-31-38-00-D8-81.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000072_0/syslog:2008-05-09 03:59:22,318 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> domU-12-31-38-00-08-C1.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000021_0/syslog:2008-05-09 03:59:24,150 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.252.22.111:50010
> domU-12-31-38-00-C9-51.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000045_0/syslog:2008-05-09 03:59:24,470 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.252.22.111:50010
> domU-12-31-38-00-C9-51.compute-1.internal: /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000055_0/syslog:2008-05-09 03:59:21,588 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>
> --
> James Moore | [EMAIL PROTECTED]
> blog.restphone.com
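As an aside on the GC-pressure hypothesis raised earlier: one hedged way to check whether a JVM is collecting more often is the standard `java.lang.management` API. This is a self-contained sketch, not namenode code; the allocation loop and its sizes are made up purely to demonstrate the probe.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcProbe {
    // Sums collection counts across all collectors in this JVM.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                total += c; // a negative count means "unavailable"; skip it
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        // Simulate memory pressure: allocate and immediately drop many arrays.
        for (int i = 0; i < 20_000; i++) {
            byte[] junk = new byte[64 * 1024];
            junk[junk.length - 1] = 1; // touch the array so it is not optimized away
        }
        long after = totalGcCount();
        System.out.println("GC runs during allocation burst: " + (after - before));
    }
}
```

For a running namenode you would more likely watch the same counters externally (e.g. with the JDK's `jstat -gcutil <pid>`) before and after starting map-reduce tasks on the box, and compare the collection rates.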