[ https://issues.apache.org/jira/browse/HDFS-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453852#comment-15453852 ]
Andrew Wang commented on HDFS-10823:
------------------------------------
Thinking about it a little more, what we'd like is a generic way of
implementing an iterator over any FileSystem. Basically, there are two concerns:
Fetching the next batch:
* We know that HDFS returns results in sorted order, so the "startAfter"
parameter serves as a cursor.
* S3 only supports startAfter on the first request, and subsequent requests
need to pass an opaque "continuation token"
(http://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html).
Takeaway: use an opaque, FS-specific token.
* POSIX readdir doesn't provide any notion of a "next batch", since it just
returns a stream of entries. In this case, maybe we don't try to batch at all and
instead return the whole listing in one shot.
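To make the HDFS case concrete, here is a minimal Java sketch of why startAfter can act as a cursor. The class and method names are hypothetical, not Hadoop's API. Because entries come back sorted and names are unique, the last name of one batch identifies exactly where the next batch begins. An S3-style opaque token would take the place of the `cursor` string without changing the shape of the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory directory that, like HDFS, returns names in sorted order.
class SortedDirectory {
    private final NavigableSet<String> names = new TreeSet<>();

    SortedDirectory(List<String> entries) {
        names.addAll(entries);
    }

    // Returns up to `limit` names strictly greater than `startAfter` ("" = from the beginning).
    List<String> list(String startAfter, int limit) {
        List<String> batch = new ArrayList<>();
        for (String name : names.tailSet(startAfter, false)) {
            if (batch.size() == limit) {
                break;
            }
            batch.add(name);
        }
        return batch;
    }
}

class StartAfterCursorDemo {
    // Pages through the directory, feeding the last name of each batch back as startAfter.
    // It stops on an empty batch, which costs one extra call at the end.
    static List<String> listAll(SortedDirectory dir, int limit) {
        List<String> all = new ArrayList<>();
        String cursor = "";
        while (true) {
            List<String> batch = dir.list(cursor, limit);
            if (batch.isEmpty()) {
                return all;
            }
            all.addAll(batch);
            cursor = batch.get(batch.size() - 1);
        }
    }

    public static void main(String[] args) {
        SortedDirectory dir = new SortedDirectory(List.of("c", "a", "e", "b", "d"));
        System.out.println(listAll(dir, 2)); // [a, b, c, d, e]
    }
}
```

This trick depends on the sort order. A store that does not guarantee sorted, unique names cannot derive the next position from the last entry, which is why the opaque-token approach is the more general one.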
Knowing when we're out of entries:
* HDFS has a "remainingEntries" field returned in the DirectoryListing.
* S3 has a boolean "IsTruncated" field that does something similar.
* POSIX readdir makes you keep calling it until it returns null. Again, kind of
annoying, since we need to issue one extra call just to learn that we're done.
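A small Java simulation of that extra round trip follows. All names are hypothetical, and a plain integer offset stands in for the cursor to keep it short. With an explicit "more to come" flag, in the spirit of remainingEntries or IsTruncated, five entries in batches of two take three calls. A readdir-style null sentinel (batched here for comparison, although real readdir returns one entry per call) needs a fourth call that returns nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch carrying an explicit "more to come" flag, similar in spirit
// to HDFS's remainingEntries > 0 or S3's IsTruncated.
class FlaggedBatch {
    final List<String> entries;
    final boolean hasMore;

    FlaggedBatch(List<String> entries, boolean hasMore) {
        this.entries = entries;
        this.hasMore = hasMore;
    }
}

// Hypothetical listing server that counts how many calls a client makes.
class TerminationDemo {
    private final List<String> all;
    int calls = 0;

    TerminationDemo(List<String> all) {
        this.all = all;
    }

    // Flag style: the response itself says whether entries remain.
    FlaggedBatch fetchWithFlag(int offset, int limit) {
        calls++;
        int end = Math.min(offset + limit, all.size());
        return new FlaggedBatch(new ArrayList<>(all.subList(offset, end)), end < all.size());
    }

    // Sentinel style, like readdir returning NULL: null means the listing is exhausted.
    List<String> fetchOrNull(int offset, int limit) {
        calls++;
        if (offset >= all.size()) {
            return null;
        }
        int end = Math.min(offset + limit, all.size());
        return new ArrayList<>(all.subList(offset, end));
    }

    static int callsWithFlag(List<String> names, int limit) {
        TerminationDemo server = new TerminationDemo(names);
        int offset = 0;
        FlaggedBatch batch;
        do {
            batch = server.fetchWithFlag(offset, limit);
            offset += batch.entries.size();
        } while (batch.hasMore);
        return server.calls;
    }

    static int callsWithSentinel(List<String> names, int limit) {
        TerminationDemo server = new TerminationDemo(names);
        int offset = 0;
        List<String> batch;
        while ((batch = server.fetchOrNull(offset, limit)) != null) {
            offset += batch.size();
        }
        return server.calls;
    }

    public static void main(String[] args) {
        List<String> names = List.of("a", "b", "c", "d", "e");
        System.out.println(callsWithFlag(names, 2));     // 3
        System.out.println(callsWithSentinel(names, 2)); // 4
    }
}
```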
Ignoring POSIX, I think we can implement a generic class like HDFS's
DirectoryListing, with an opaque cursor and a "has more" boolean, that will work
for at least two important FileSystems (HDFS and S3). I'll poke around with this.
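One possible shape for such a class, sketched in Java under stated assumptions: `PartialListing`, `ListingSource`, `BatchedListingIterator`, and `SortedNamesSource` are hypothetical names, the token is modeled as a plain String, and the iterator mirrors the hasNext()/next() style of Hadoop's RemoteIterator without depending on Hadoop. The iterator never interprets the token. Each FileSystem decides whether it is a startAfter name (HDFS) or a continuation token (S3).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical generic batch: entries, an opaque token for the next request,
// and a boolean saying whether more entries remain after this batch.
final class PartialListing<T> {
    final List<T> entries;
    final String nextToken; // opaque to callers; only the producing FileSystem interprets it
    final boolean hasMore;

    PartialListing(List<T> entries, String nextToken, boolean hasMore) {
        this.entries = entries;
        this.nextToken = nextToken;
        this.hasMore = hasMore;
    }
}

// Hypothetical per-FileSystem hook; a null token means "start from the beginning".
interface ListingSource<T> {
    PartialListing<T> fetch(String token) throws IOException;
}

// Generic iterator that relies only on the token and the boolean.
final class BatchedListingIterator<T> {
    private final ListingSource<T> source;
    private PartialListing<T> current;
    private int pos = 0;

    BatchedListingIterator(ListingSource<T> source) throws IOException {
        this.source = source;
        this.current = source.fetch(null);
    }

    boolean hasNext() throws IOException {
        // Fetch further batches (skipping empty ones) until an entry is available or none remain.
        while (pos >= current.entries.size()) {
            if (!current.hasMore) {
                return false;
            }
            current = source.fetch(current.nextToken);
            pos = 0;
        }
        return true;
    }

    T next() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.entries.get(pos++);
    }
}

// HDFS-like example source: names assumed already sorted; the token is the last name returned.
final class SortedNamesSource implements ListingSource<String> {
    private final List<String> sorted;
    private final int limit;
    int calls = 0;

    SortedNamesSource(List<String> sorted, int limit) {
        this.sorted = sorted;
        this.limit = limit;
    }

    @Override
    public PartialListing<String> fetch(String token) {
        calls++;
        int start = 0;
        // Skip every name <= token, i.e. treat the token as startAfter.
        while (token != null && start < sorted.size() && sorted.get(start).compareTo(token) <= 0) {
            start++;
        }
        int end = Math.min(start + limit, sorted.size());
        List<String> batch = new ArrayList<>(sorted.subList(start, end));
        String next = batch.isEmpty() ? token : batch.get(batch.size() - 1);
        return new PartialListing<>(batch, next, end < sorted.size());
    }
}

class PartialListingDemo {
    static List<String> drain(BatchedListingIterator<String> it) throws IOException {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        SortedNamesSource src = new SortedNamesSource(List.of("a", "b", "c", "d", "e"), 2);
        System.out.println(drain(new BatchedListingIterator<>(src))); // [a, b, c, d, e]
        System.out.println(src.calls); // 3
    }
}
```

In this sketch, an S3-backed source would instead store the continuation token from each response in `nextToken` and map IsTruncated to `hasMore`. The generic iterator would stay unchanged.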
> Implement HttpFSFileSystem#listStatusIterator
> ---------------------------------------------
>
> Key: HDFS-10823
> URL: https://issues.apache.org/jira/browse/HDFS-10823
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: httpfs
> Affects Versions: 2.6.4
> Reporter: Andrew Wang
> Assignee: Andrew Wang
>
> Let's expose the same functionality added in HDFS-10784 for WebHDFS in HttpFS
> too.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)