hdfsListDirectory in libhdfs does not scale
-------------------------------------------

                 Key: HADOOP-2158
                 URL: https://issues.apache.org/jira/browse/HADOOP-2158
             Project: Hadoop
          Issue Type: Bug
          Components: libhdfs
    Affects Versions: 0.15.0
            Reporter: Christian Kunz
            Priority: Blocker

hdfsListDirectory makes one RPC call using the deprecated fs.FileSystem.listPaths, and then two more RPC calls for every entry in the returned array.

When running a job with more than 3000 mappers, each running a pipes application that uses libhdfs to scan a DFS directory with about 100-200 entries, this results in about 1M RPC calls to the namenode, which overwhelms it.

hdfsListDirectory should call fs.FileSystem.listStatus instead.

I will submit a patch.
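
For reference, the arithmetic behind the "about 1M" figure can be sketched as follows. This is a back-of-the-envelope model, not libhdfs code: the function names are hypothetical, and it assumes that each mapper lists the directory exactly once and that a listStatus-based implementation needs only a single RPC call per listing.

```python
# Rough model of namenode RPC load from directory listings.
# Assumptions (not from the issue): each mapper lists the directory once,
# and a listStatus-based listing costs exactly one RPC call.

def rpcs_per_listing_current(entries: int) -> int:
    """One listPaths call plus two follow-up calls per returned entry."""
    return 1 + 2 * entries


def rpcs_per_listing_proposed(entries: int) -> int:
    """A single listStatus call, regardless of the number of entries."""
    return 1


def total_rpcs(mappers: int, entries: int, per_listing) -> int:
    """Total RPC calls when every mapper performs one listing."""
    return mappers * per_listing(entries)


if __name__ == "__main__":
    mappers = 3000
    for entries in (100, 200):
        current = total_rpcs(mappers, entries, rpcs_per_listing_current)
        proposed = total_rpcs(mappers, entries, rpcs_per_listing_proposed)
        print(f"{entries} entries: current={current}, proposed={proposed}")
```

Under these assumptions the current behavior costs between 603,000 and 1,203,000 calls for 3000 mappers, which matches the reported order of magnitude, while a single-call listing would cost 3000.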