[ https://issues.apache.org/jira/browse/HADOOP-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540366 ]

Devaraj Das commented on HADOOP-2158:
-------------------------------------

So if the namenode takes a long time to respond and the task sends no
statusUpdates in the interim, the tasktracker could time the task out for
lack of statusUpdates and kill it. Could this be connected to
HADOOP-2076 in some way?

> hdfsListDirectory in libhdfs does not scale
> -------------------------------------------
>
>                 Key: HADOOP-2158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2158
>             Project: Hadoop
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Priority: Blocker
>         Attachments: 2158.patch
>
>
> hdfsListDirectory makes one RPC call using the deprecated 
> fs.FileSystem.listPaths, and then two RPC calls for every entry in the 
> returned array. When running a job with more than 3000 mappers, each running a 
> pipes application that uses libhdfs to scan a DFS directory with about 100-200 
> entries, this results in about 1M RPC calls to the namenode server, 
> overwhelming it.
> hdfsListDirectory should call fs.FileSystem.listStatus instead.
> I will submit a patch.
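
For reference, the "about 1M" figure in the description can be checked with a
quick back-of-the-envelope calculation. The sketch below is only an
illustration, not code from the patch: the function names are hypothetical,
and it assumes that each mapper lists the directory exactly once and that a
listStatus-based listing needs a single RPC per directory.

```python
def rpc_calls_per_listing(entries, per_entry_rpcs):
    """One RPC to list the directory, plus per_entry_rpcs for each entry."""
    return 1 + per_entry_rpcs * entries


def total_rpc_calls(mappers, entries, per_entry_rpcs):
    """Total namenode RPCs if every mapper lists the directory once (assumption)."""
    return mappers * rpc_calls_per_listing(entries, per_entry_rpcs)


MAPPERS = 3000  # "more than 3000 mappers" in the report

for entries in (100, 200):  # "about 100-200 entries"
    # Current behaviour: 1 listing RPC + 2 RPCs per returned entry.
    per_entry = total_rpc_calls(MAPPERS, entries, per_entry_rpcs=2)
    # Assumed behaviour after the fix: 1 RPC per listing, none per entry.
    single = total_rpc_calls(MAPPERS, entries, per_entry_rpcs=0)
    print(f"{entries} entries: per-entry {per_entry:,} RPCs, single-call {single:,} RPCs")
```

Under these assumptions the per-entry scheme lands between roughly 0.6M and
1.2M calls, which matches the reported order of magnitude, while one call per
listing would keep the total near the number of mappers.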

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.