Stephen O'Donnell created HDFS-14663:
----------------------------------------

             Summary: HTTPFS ListStatus_Batch does not return batches as expected
                 Key: HDFS-14663
                 URL: https://issues.apache.org/jira/browse/HDFS-14663
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: httpfs
    Affects Versions: 3.3.0
            Reporter: Stephen O'Donnell


The webhdfs protocol supports a LISTSTATUS_BATCH operation, which retrieves the file listing for a large directory in chunks. With the webhdfs service embedded in the namenode this works as expected, but with HTTPFS any call to LISTSTATUS_BATCH returns the entire listing in one response, so it behaves effectively like LISTSTATUS.

This seems to be because HTTPFS falls back to org.apache.hadoop.fs.FileSystem#listStatusBatch. That method is intended to be overridden, but the file system implementation used in HTTPFS does not override it, which leads to this limitation.

LISTSTATUS_BATCH was added to HTTPFS by HDFS-10823, but based on my testing it does not work as intended. I suspect the reason is that HDFS-10823 added listStatusBatch to WebHdfsFileSystem and HttpFSFileSystem, while behind the scenes HTTPFS seems to use DistributedFileSystem, and hence falls back to the default implementation, org.apache.hadoop.fs.FileSystem#listStatusBatch, which returns all entries in a single batch.
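
To illustrate the suspected fallback, here is a minimal, self-contained sketch of the pattern. It is not Hadoop code: BaseFs, PagingFs, Entries and listBatch are hypothetical stand-ins for FileSystem, a file system that overrides the batch call, its result type, and listStatusBatch. The integer offset is also a simplification chosen for the example. The point is only that a client loop sees a single batch whenever the base-class default is the method that actually runs.

```java
import java.util.Arrays;
import java.util.List;

public class BatchListingSketch {

    /** One page of a listing; hasMore tells the caller to ask again. (Hypothetical type.) */
    static final class Entries {
        final List<String> names;
        final int nextOffset;
        final boolean hasMore;

        Entries(List<String> names, int nextOffset, boolean hasMore) {
            this.names = names;
            this.nextOffset = nextOffset;
            this.hasMore = hasMore;
        }
    }

    /** Stand-in for a generic base class whose batch method is meant to be overridden. */
    static class BaseFs {
        protected final List<String> children;

        BaseFs(List<String> children) {
            this.children = children;
        }

        // Default behaviour: ignore the offset and return everything as one batch.
        Entries listBatch(int offset) {
            return new Entries(children, children.size(), false);
        }
    }

    /** Stand-in for an implementation that overrides the default and really pages. */
    static class PagingFs extends BaseFs {
        private final int batchSize;

        PagingFs(List<String> children, int batchSize) {
            super(children);
            this.batchSize = batchSize;
        }

        @Override
        Entries listBatch(int offset) {
            int end = Math.min(offset + batchSize, children.size());
            return new Entries(children.subList(offset, end), end, end < children.size());
        }
    }

    /** Lists a directory the way a batching client would and counts the round trips. */
    static int countBatches(BaseFs fs) {
        int batches = 0;
        int offset = 0;
        Entries page;
        do {
            page = fs.listBatch(offset);
            offset = page.nextOffset;
            batches++;
        } while (page.hasMore);
        return batches;
    }

    public static void main(String[] args) {
        List<String> dir = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println("default implementation: " + countBatches(new BaseFs(dir)) + " batch(es)");
        System.out.println("overriding implementation: " + countBatches(new PagingFs(dir, 2)) + " batch(es)");
    }
}
```

If the analysis above is right, HTTPFS currently behaves like BaseFs in this sketch: the request is accepted, but the caller always receives the full listing with no indication that more entries remain.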