[
https://issues.apache.org/jira/browse/HADOOP-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485576
]
dhruba borthakur commented on HADOOP-1182:
------------------------------------------
There are a few open issues that deal with reducing CPU usage on the namenode:
HADOOP-1155, HADOOP-1149, HADOOP-1079 and HADOOP-1073. Some of these patches
should improve your situation.
In the short term, you could try increasing the number of namenode handler
threads. This, in turn, increases the call queue depth (100 calls per
additional server thread). The default number of server threads is 40, so
set the value higher than that. To make this change, add the following to
hadoop-site.xml, for example:
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
  <description>The number of server threads for the namenode.</description>
</property>
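Per the sizing rule above (roughly 100 queued calls per handler thread), the namenode's effective call-queue capacity grows linearly with dfs.namenode.handler.count. A minimal sketch of that arithmetic (the 100-per-handler factor is taken from this comment, not verified against the Hadoop source):

```python
def call_queue_capacity(handler_count, calls_per_handler=100):
    """Approximate namenode call-queue depth: 100 queued calls per handler."""
    return handler_count * calls_per_handler

print(call_queue_capacity(40))  # default 40 handlers -> 4000 queued calls
print(call_queue_capacity(64))  # raised handler count -> 6400 queued calls
```

So raising the handler count from 40 to 64 would also raise the number of calls the namenode can queue before clients start timing out, which is the short-term mitigation suggested here.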
> DFS Scalability issue with filecache in large clusters
> ------------------------------------------------------
>
> Key: HADOOP-1182
> URL: https://issues.apache.org/jira/browse/HADOOP-1182
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.12.1
> Reporter: Christian Kunz
>
> When using filecache to distribute supporting files for map/reduce
> applications in a 1000 node cluster, many map tasks fail because of
> timeouts. There was no such problem using a 200 node cluster for the same
> applications with comparable input data. Either the whole job fails because
> of too many map failures, or even worse, some map tasks hang indefinitely.
> java.net.SocketTimeoutException: timed out waiting for rpc response
>     at org.apache.hadoop.ipc.Client.call(Client.java:473)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
>     at org.apache.hadoop.dfs.$Proxy1.exists(Unknown Source)
>     at org.apache.hadoop.dfs.DFSClient.exists(DFSClient.java:320)
>     at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.exists(DistributedFileSystem.java:170)
>     at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.open(DistributedFileSystem.java:125)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:110)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245)
>     at org.apache.hadoop.filecache.DistributedCache.createMD5(DistributedCache.java:327)
>     at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:253)
>     at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:169)
>     at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:86)
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:117)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.