[ 
https://issues.apache.org/jira/browse/HDFS-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903880#comment-16903880
 ] 

Siyao Meng edited comment on HDFS-14663 at 8/9/19 1:26 PM:
-----------------------------------------------------------

I checked out the commit that introduced HDFS-10823 (~3 yrs ago), compiled it, 
and ran a hacked test. It turns out *WrappedFileSystem#listStatusBatch()* still 
calls *FileSystem#listStatus()*. This shows that no later code change broke it, 
and I don't think the JDK is at fault either. We need to figure out another 
plan for this to work.

As Andrew commented in HDFS-10823, the WrappedFileSystem was sort of a clever 
hack. But unfortunately it doesn't quite work.


> HttpFS: LISTSTATUS_BATCH does not return batches
> ------------------------------------------------
>
>                 Key: HDFS-14663
>                 URL: https://issues.apache.org/jira/browse/HDFS-14663
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: httpfs
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Siyao Meng
>            Priority: Major
>
> The webhdfs protocol supports a LISTSTATUS_BATCH operation where it can 
> retrieve the file listing for a large directory in chunks.
> When using the webhdfs service embedded in the namenode, this works as 
> expected, but when using HTTPFS, any call to LISTSTATUS_BATCH simply returns 
> the entire listing rather than batches, working effectively like LISTSTATUS 
> instead.
> This seems to be because HTTPFS falls back to using the method 
> org.apache.hadoop.fs.FileSystem#listStatusBatch, which is intended to be 
> overridden, but the implementation used in HTTPFS has not done that, leading 
> to this limitation.
> This feature (LISTSTATUS_BATCH) was added to HTTPFS by HDFS-10823, but based 
> on my testing it does not work as intended. I suspect it is because the 
> listStatusBatch operation was added to the WebHdfsFileSystem and 
> HttpFSFileSystem as part of the above Jira, but behind the scenes HTTPFS 
> seems to use DistributedFileSystem and hence it falls back to the default 
> implementation "org.apache.hadoop.fs.FileSystem#listStatusBatch" which 
> returns all entries in a single batch.
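
To illustrate the fallback described above, here is a minimal, self-contained
sketch. It does not use the Hadoop API: BaseFs, PagingFs, Batch and
countBatches are hypothetical names made up for this example, and the paging
logic is an assumption about how a batching override could behave. It only
shows that a base-class default which wraps the full listing always yields
exactly one batch, while a subclass that overrides it can return several.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchListingSketch {

    /** Hypothetical page of a listing: entries, a "more to come" flag, next offset. */
    static final class Batch {
        final List<String> entries;
        final boolean hasMore;
        final int nextOffset;

        Batch(List<String> entries, boolean hasMore, int nextOffset) {
            this.entries = entries;
            this.hasMore = hasMore;
            this.nextOffset = nextOffset;
        }
    }

    /** Hypothetical base class standing in for a file system with a default batch method. */
    static class BaseFs {
        protected final List<String> children;

        BaseFs(List<String> children) {
            this.children = children;
        }

        List<String> listStatus() {
            return new ArrayList<>(children);
        }

        // Default behaviour: ignore the offset and return the whole listing as one batch.
        Batch listStatusBatch(int offset) {
            return new Batch(listStatus(), false, children.size());
        }
    }

    /** Hypothetical subclass that overrides the default and really pages. */
    static class PagingFs extends BaseFs {
        private final int pageSize;

        PagingFs(List<String> children, int pageSize) {
            super(children);
            this.pageSize = pageSize;
        }

        @Override
        Batch listStatusBatch(int offset) {
            int end = Math.min(offset + pageSize, children.size());
            List<String> page = new ArrayList<>(children.subList(offset, end));
            return new Batch(page, end < children.size(), end);
        }
    }

    /** Counts how many calls are needed to walk the full listing. */
    static int countBatches(BaseFs fs) {
        int batches = 0;
        int offset = 0;
        Batch batch;
        do {
            batch = fs.listStatusBatch(offset);
            offset = batch.nextOffset;
            batches++;
        } while (batch.hasMore);
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println("default:  " + countBatches(new BaseFs(files)));
        System.out.println("override: " + countBatches(new PagingFs(files, 2)));
    }
}
```

In this sketch a wrapper that extends BaseFs without overriding
listStatusBatch would behave like the default case, which matches the symptom
reported here: the caller asks for batches but receives the entire listing in
one response.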


