Stephen O'Donnell created HDFS-14663:
----------------------------------------
Summary: HTTPFS ListStatus_Batch does not return batches as
expected
Key: HDFS-14663
URL: https://issues.apache.org/jira/browse/HDFS-14663
Project: Hadoop HDFS
Issue Type: Bug
Components: httpfs
Affects Versions: 3.3.0
Reporter: Stephen O'Donnell
The webhdfs protocol supports a LISTSTATUS_BATCH operation that lets a client
retrieve the file listing for a large directory in chunks.
When using the webhdfs service embedded in the namenode, this works as
expected, but when using HTTPFS, any call to LISTSTATUS_BATCH simply returns
the entire listing rather than batches, working effectively like LISTSTATUS
instead.
This seems to be because HTTPFS falls back to the default method
org.apache.hadoop.fs.FileSystem#listStatusBatch. That method is intended to be
overridden, but the FileSystem implementation used by HTTPFS does not override
it, which leads to this limitation.
This feature (LISTSTATUS_BATCH) was added to HTTPFS by HDFS-10823, but based on
my testing it does not work as intended. I suspect it is because the
listStatusBatch operation was added to the WebHdfsFileSystem and
HttpFSFileSystem as part of the above Jira, but behind the scenes HTTPFS seems
to use DistributedFileSystem and hence it falls back to the default
implementation "org.apache.hadoop.fs.FileSystem#listStatusBatch" which returns
all entries in a single batch.
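
To illustrate the suspected fallback, here is a minimal, self-contained sketch.
It does not use any Hadoop classes: Batch, BaseFs, PagingFs and countBatches are
hypothetical stand-ins, and the String paging token is a simplification of the
real API. BaseFs models a default listStatusBatch that returns everything at
once, and PagingFs models an override that honours a batch size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one batch of a directory listing.
final class Batch {
    final List<String> entries;
    final boolean hasMore;

    Batch(List<String> entries, boolean hasMore) {
        this.entries = entries;
        this.hasMore = hasMore;
    }
}

// Hypothetical base class: the default ignores the token and returns
// the whole listing in a single batch.
class BaseFs {
    protected final List<String> children; // assumed sorted by name

    BaseFs(List<String> children) {
        this.children = children;
    }

    Batch listStatusBatch(String startAfter) {
        return new Batch(new ArrayList<>(children), false);
    }
}

// Hypothetical subclass that overrides the default and really pages.
class PagingFs extends BaseFs {
    private final int limit;

    PagingFs(List<String> children, int limit) {
        super(children);
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    @Override
    Batch listStatusBatch(String startAfter) {
        List<String> page = new ArrayList<>();
        boolean hasMore = false;
        for (String name : children) {
            // Skip everything up to and including the last entry already seen.
            if (startAfter != null && name.compareTo(startAfter) <= 0) {
                continue;
            }
            if (page.size() == limit) {
                hasMore = true; // at least one more entry remains
                break;
            }
            page.add(name);
        }
        return new Batch(page, hasMore);
    }
}

final class ListStatusBatchSketch {
    // Counts how many batches a client needs to read the full listing.
    static int countBatches(BaseFs fs) {
        int batches = 0;
        String token = null;
        Batch b;
        do {
            b = fs.listStatusBatch(token);
            batches++;
            if (!b.entries.isEmpty()) {
                token = b.entries.get(b.entries.size() - 1);
            }
        } while (b.hasMore);
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = List.of("a", "b", "c", "d", "e");
        System.out.println("default:  " + countBatches(new BaseFs(names)));
        System.out.println("override: " + countBatches(new PagingFs(names, 2)));
    }
}
```

In this model the client loop is identical in both cases. Only the class that
ends up serving the call decides whether the listing is actually chunked, which
matches the behaviour described above if HTTPFS delegates to a class that does
not override the method.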