[ 
https://issues.apache.org/jira/browse/HDFS-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453852#comment-15453852
 ] 

Andrew Wang commented on HDFS-10823:
------------------------------------

Thinking about it a little more, what we'd like is a generic way of 
implementing an iterator over any FileSystem. Basically there are two concerns:

Fetching the next batch:

* We know that HDFS returns results in sorted order, so the "startAfter" 
parameter serves as a cursor.
* S3 only supports startAfter on the first request, and subsequent requests 
need to pass an opaque "continuation token" 
(http://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html). 
Takeaway: use an opaque, FS-specific token.
* POSIX readdir has no notion of a "next batch", since it just returns a 
stream of entries. In this case, maybe we skip batching and return the whole 
listing in one shot.

Knowing when we're out of entries:

* HDFS has a "remainingEntries" field returned in the DirectoryListing.
* S3 has a boolean "IsTruncated" field that does something similar.
* POSIX readdir makes you call it until it returns null. Again kind of 
annoying, since we need to issue one more call to know we're done.

Ignoring POSIX, I think we can implement a generic class like HDFS's 
DirectoryListing with a cursor and a boolean that will work for at least two 
important FileSystems. I'll poke around with this.
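
As a rough sketch of that idea (not the actual patch), the shape could be 
something like the following. Every name here (BatchedEntries, BatchLister, 
BatchedIterator, SortedListLister) is hypothetical and not existing Hadoop 
API. The token is an opaque String, and the toy lister mimics the HDFS case 
by using the last returned name as the cursor over a sorted listing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** One batch of a listing. Hypothetical class, loosely modeled on DirectoryListing. */
final class BatchedEntries<T> {
  final List<T> entries;
  final String token;    // opaque, FS-specific cursor for fetching the next batch
  final boolean hasMore; // true if the FileSystem has entries beyond this batch

  BatchedEntries(List<T> entries, String token, boolean hasMore) {
    this.entries = entries;
    this.token = token;
    this.hasMore = hasMore;
  }
}

/** Hypothetical per-FileSystem hook: fetch the batch after 'token' (null = first batch). */
interface BatchLister<T> {
  BatchedEntries<T> listBatch(String token);
}

/** Generic iterator that only needs the token and the boolean. */
final class BatchedIterator<T> implements Iterator<T> {
  private final BatchLister<T> lister;
  private BatchedEntries<T> batch;
  private int pos = 0;

  BatchedIterator(BatchLister<T> lister) {
    this.lister = lister;
    this.batch = lister.listBatch(null);
  }

  @Override
  public boolean hasNext() {
    // Loop so that an empty batch which still reports hasMore does not end iteration early.
    while (pos >= batch.entries.size() && batch.hasMore) {
      batch = lister.listBatch(batch.token);
      pos = 0;
    }
    return pos < batch.entries.size();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return batch.entries.get(pos++);
  }
}

/** Toy HDFS-style lister over a sorted in-memory list: the token is the last name returned. */
final class SortedListLister implements BatchLister<String> {
  private final List<String> names; // assumed sorted and free of duplicates
  private final int batchSize;

  SortedListLister(List<String> names, int batchSize) {
    this.names = names;
    this.batchSize = batchSize;
  }

  @Override
  public BatchedEntries<String> listBatch(String token) {
    int start = 0;
    if (token != null) {
      // Skip everything up to and including the cursor ("startAfter" semantics).
      while (start < names.size() && names.get(start).compareTo(token) <= 0) {
        start++;
      }
    }
    int end = Math.min(start + batchSize, names.size());
    List<String> out = new ArrayList<>(names.subList(start, end));
    String next = out.isEmpty() ? token : out.get(out.size() - 1);
    return new BatchedEntries<>(out, next, end < names.size());
  }
}
```

An S3-style lister would plug into the same interface by returning the 
service's continuation token as the token and its IsTruncated flag as 
hasMore; the iterator itself would not need to change.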

> Implement HttpFSFileSystem#listStatusIterator
> ---------------------------------------------
>
>                 Key: HDFS-10823
>                 URL: https://issues.apache.org/jira/browse/HDFS-10823
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: httpfs
>    Affects Versions: 2.6.4
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>
> Let's expose the same functionality added in HDFS-10784 for WebHDFS in HttpFS 
> too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

