[ https://issues.apache.org/jira/browse/HADOOP-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547224 ]
dhruba borthakur commented on HADOOP-2158:
------------------------------------------

Good patch. Code looks good. +1.

I think the listPaths API on a directory used to return the size of the entire directory subtree, whereas the listStatus API on a directory does not. If your application is not relying on the original behaviour of listPaths, then this change makes sense.

> hdfsListDirectory in libhdfs does not scale
> -------------------------------------------
>
>                 Key: HADOOP-2158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2158
>             Project: Hadoop
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Assignee: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: 2158.patch
>
>
> hdfsListDirectory makes one rpc call using deprecated
> fs.FileSystem.listPaths, and then two rpc calls for every entry in the
> returned array. When running a job with more than 3000 mappers each running a
> pipes application using libhdfs to scan a dfs directory with about 100-200
> entries, this results in about 1M rpc calls to the namenode server
> overwhelming it.
> hdfsListDirectory should call fs.FileSystem.listStatus instead.
> I will submit a patch.
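To make the "about 1M rpc calls" figure concrete, here is a back-of-the-envelope sketch of the namenode load described in the issue. It assumes one directory scan per mapper, counts exactly what the issue states (one listing RPC plus two RPCs per returned entry for the listPaths-based path), and assumes the listStatus-based path needs only the single listing RPC because each returned status already carries the per-entry metadata. The class and method names (`RpcCount`, `legacyCalls`, `listStatusCalls`) are hypothetical and do not correspond to anything in libhdfs or Hadoop.

```java
/**
 * Hypothetical back-of-the-envelope model of namenode RPC load (HADOOP-2158).
 * Assumptions: one directory scan per mapper; the old listPaths path costs
 * one listing RPC plus two per-entry RPCs; the listStatus path costs a single
 * listing RPC because per-entry metadata comes back with the listing.
 */
class RpcCount {

    /** Old hdfsListDirectory: 1 listPaths call + 2 calls per returned entry. */
    static long legacyCalls(long scans, long entriesPerDir) {
        return scans * (1 + 2 * entriesPerDir);
    }

    /** Proposed hdfsListDirectory: one listStatus call per scan, independent of entry count. */
    static long listStatusCalls(long scans, long entriesPerDir) {
        return scans;
    }

    public static void main(String[] args) {
        long mappers = 3000; // "more than 3000 mappers" from the issue
        // The issue gives a range of 100-200 entries per directory.
        for (long entries : new long[] {100, 150, 200}) {
            System.out.printf("entries=%d legacy=%d listStatus=%d%n",
                    entries,
                    legacyCalls(mappers, entries),
                    listStatusCalls(mappers, entries));
        }
    }
}
```

Under these assumptions the legacy path costs between 603,000 and 1,203,000 namenode RPCs across the job (903,000 at 150 entries), consistent with the "about 1M" in the report, while the listStatus path stays at 3,000 regardless of directory size.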