hdfsListDirectory in libhdfs does not scale
-------------------------------------------
Key: HADOOP-2158
URL: https://issues.apache.org/jira/browse/HADOOP-2158
Project: Hadoop
Issue Type: Bug
Components: libhdfs
Affects Versions: 0.15.0
Reporter: Christian Kunz
Priority: Blocker
hdfsListDirectory makes one RPC call using the deprecated fs.FileSystem.listPaths,
and then two RPC calls for every entry in the returned array. A job with more than
3000 mappers, each running a pipes application that uses libhdfs to scan a DFS
directory of about 100-200 entries, therefore issues about 1M RPC calls to the
namenode server and overwhelms it.
hdfsListDirectory should call fs.FileSystem.listStatus instead.
I will submit a patch.