[
https://issues.apache.org/jira/browse/HADOOP-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485576
]
dhruba borthakur commented on HADOOP-1182:
------------------------------------------
There are a few open issues that deal with reducing CPU usage on the namenode:
HADOOP-1155, HADOOP-1149, HADOOP-1079 and HADOOP-1073. Some of these patches
should improve your situation.
In the short term, you could try increasing the number of namenode handler
threads. This, in turn, increases the call queue depth (100 calls per
additional server thread). The default number of server threads is 40, so
set the value higher than that. To make this change, add the following to
hadoop-site.xml, for example:
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
  <description>The number of server threads for the namenode.</description>
</property>
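Per the sizing rule above (roughly 100 queued calls per handler thread), the namenode's effective call-queue capacity grows linearly with dfs.namenode.handler.count. A minimal sketch of that arithmetic (the 100-per-handler factor is taken from this comment, not verified against the Hadoop source):

```python
def call_queue_capacity(handler_count, calls_per_handler=100):
    """Approximate namenode call-queue depth: 100 queued calls per handler."""
    return handler_count * calls_per_handler

print(call_queue_capacity(40))  # default 40 handlers -> 4000 queued calls
print(call_queue_capacity(64))  # raised handler count -> 6400 queued calls
```

So raising the handler count from 40 to 64 would also raise the number of calls the namenode can queue before clients start timing out, which is the short-term mitigation suggested here.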
> DFS Scalability issue with filecache in large clusters
> ------------------------------------------------------
>
> Key: HADOOP-1182
> URL: https://issues.apache.org/jira/browse/HADOOP-1182
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.1
> Reporter: Christian Kunz
>
> When using filecache to distribute supporting files for map/reduce
> applications in a 1000 node cluster, many map tasks fail because of
> timeouts. There was no such problem using a 200 node cluster for the same
> applications with comparable input data. Either the whole job fails because
> of too many map failures, or even worse, some map tasks hang indefinitely.
> java.net.SocketTimeoutException: timed out waiting for rpc response
>     at org.apache.hadoop.ipc.Client.call(Client.java:473)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
>     at org.apache.hadoop.dfs.$Proxy1.exists(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient.exists(DFSClient.java:320)
>     at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.exists(DistributedFileSystem.java:170)
>     at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.open(DistributedFileSystem.java:125)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:110)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245)
>     at org.apache.hadoop.filecache.DistributedCache.createMD5(DistributedCache.java:327)
>     at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:253)
>     at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:169)
>     at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:86)
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:117)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.