Hi,

The regular HDFS client ({{DistributedFileSystem}}) throttles the cost of
listing a large directory by fetching its entries in batches, something
like this:
{code}
    // fetch the first batch of entries in the directory
    DirectoryListing thisListing = dfs.listPaths(
        src, HdfsFileStatus.EMPTY_NAME);
    if (thisListing == null) { // the directory does not exist
      throw new FileNotFoundException("File " + p + " does not exist.");
    }
    HdfsFileStatus[] partialListing = thisListing.getPartialListing();
    if (!thisListing.hasMore()) { // got all entries of the directory
      FileStatus[] stats = new FileStatus[partialListing.length];
{code}
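
For context, the part that follows in that method keeps pulling batches,
resuming each RPC from the last name returned by the previous one. Roughly
(paraphrased from memory, not verbatim; {{p}} is the directory Path and
{{src}} its string form):
{code}
    // the first batch was partial: qualify and collect it, then keep
    // fetching, resuming each listPaths RPC from the last name of the
    // previous batch, so the NameNode serves only one batch at a time
    List<FileStatus> listing = new ArrayList<FileStatus>();
    for (HdfsFileStatus stat : thisListing.getPartialListing()) {
      listing.add(stat.makeQualified(getUri(), p));
    }
    do {
      thisListing = dfs.listPaths(src, thisListing.getLastName());
      for (HdfsFileStatus stat : thisListing.getPartialListing()) {
        listing.add(stat.makeQualified(getUri(), p));
      }
    } while (thisListing.hasMore());
    return listing.toArray(new FileStatus[listing.size()]);
{code}
So each listPaths() call returns at most {{dfs.ls.limit}} entries (1000 by
default), which is what keeps one huge directory from monopolizing the
NameNode.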

However, WebHDFS doesn't seem to have this batching logic;
{{WebHdfsFileSystem#listStatus}} decodes the whole directory from a single
response:
{code}
  @Override
  public FileStatus[] listStatus(final Path f) throws IOException {
    final HttpOpParam.Op op = GetOpParam.Op.LISTSTATUS;
    return new FsPathResponseRunner<FileStatus[]>(op, f) {
      @Override
      FileStatus[] decodeResponse(Map<?,?> json) {
          // parses the entire FileStatuses array from one JSON response
          ....
      }
    }.run();
  }
{code}
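
For contrast, a batched listing over the REST API would need some kind of
resume cookie in the request. A hypothetical sketch of what that could look
like; the {{LISTSTATUS_BATCH}} op and {{startAfter}} parameter below are my
assumptions about such an interface, not anything the current client exposes:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class BatchedWebHdfsList {
  public static void main(String[] args) throws Exception {
    // hypothetical endpoint and op name, for illustration only
    String base = "http://namenode.example.com:50070/webhdfs/v1/big/dir";
    String startAfter = "";  // resume cookie: last entry of the previous batch
    do {
      URL url = new URL(base + "?op=LISTSTATUS_BATCH"
          + (startAfter.isEmpty() ? "" : "&startAfter=" + startAfter));
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
        // real code would parse this JSON batch and return the last
        // pathSuffix so the next request resumes after it
        startAfter = processBatch(in);  // "" when the listing is exhausted
      } finally {
        conn.disconnect();
      }
    } while (!startAfter.isEmpty());
  }

  // stub standing in for JSON parsing of one batch
  private static String processBatch(BufferedReader in) { return ""; }
}
{code}
Without a resume parameter like that, every listStatus() over WebHDFS pulls
the entire directory in a single request/response.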

Am I missing something? If not, it seems a single user could DoS the
NameNode simply by running {{hadoop fs -ls -R /}} via WebHDFS.
