[ https://issues.apache.org/jira/browse/HDFS-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453852#comment-15453852 ]
Andrew Wang commented on HDFS-10823:
------------------------------------

Thinking about it a little more, what we'd like is a generic way of implementing an iterator over any FileSystem. Basically there are two concerns:

Fetching the next batch:
* HDFS returns results in sorted order, so the "startAfter" parameter serves as a cursor.
* S3 only supports startAfter on the first request; subsequent requests need to pass an opaque "continuation token" (http://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html). Takeaway: use an opaque, FS-specific token.
* POSIX readdir makes no provision for a "next batch", since it just gives a stream of results. In that case, maybe we don't batch at all and return the whole listing in one shot.

Knowing when we're out of entries:
* HDFS has a "remainingEntries" field in the returned DirectoryListing.
* S3 has a boolean "IsTruncated" field that does something similar.
* POSIX readdir makes you call it until it returns null. Again kind of annoying, since we need to issue one more call to know we're done.

Ignoring POSIX, I think we can implement a generic class like HDFS's DirectoryListing, with a cursor and a boolean, that will work for at least two important FileSystems. I'll poke around with this.

> Implement HttpFSFileSystem#listStatusIterator
> ---------------------------------------------
>
>                 Key: HDFS-10823
>                 URL: https://issues.apache.org/jira/browse/HDFS-10823
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: httpfs
>    Affects Versions: 2.6.4
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>
> Let's expose the same functionality added in HDFS-10784 for WebHDFS in HttpFS too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
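The generic class described in the comment (a batch of entries, an opaque FS-specific cursor, and a has-more flag) could be sketched roughly as below. This is a minimal illustration only; `BatchedListing` and `BatchedRemoteIterator` are hypothetical names, not actual Hadoop classes, and a real implementation would use Hadoop's FileStatus and checked-exception iterator interfaces.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical sketch of a generic partial listing, along the lines of
 * HDFS's DirectoryListing: a batch of entries, an opaque FS-specific
 * continuation token, and a flag saying whether more batches remain
 * (cf. S3's IsTruncated / HDFS's remainingEntries).
 */
class BatchedListing<E, T> {
  final List<E> entries;
  final T token;         // opaque cursor: startAfter for HDFS, continuation token for S3
  final boolean hasMore; // true if another batch must be fetched

  BatchedListing(List<E> entries, T token, boolean hasMore) {
    this.entries = entries;
    this.token = token;
    this.hasMore = hasMore;
  }
}

/**
 * Iterator that lazily pulls batches through an FS-specific fetch
 * function; a null token means "start from the beginning".
 */
class BatchedRemoteIterator<E, T> implements Iterator<E> {
  private final Function<T, BatchedListing<E, T>> fetch;
  private BatchedListing<E, T> current;
  private int pos = 0;

  BatchedRemoteIterator(Function<T, BatchedListing<E, T>> fetch) {
    this.fetch = fetch;
    this.current = fetch.apply(null);
  }

  @Override
  public boolean hasNext() {
    // Fetch the next batch once the current one is exhausted.
    while (pos >= current.entries.size() && current.hasMore) {
      current = fetch.apply(current.token);
      pos = 0;
    }
    return pos < current.entries.size();
  }

  @Override
  public E next() {
    if (!hasNext()) throw new NoSuchElementException();
    return current.entries.get(pos++);
  }
}
```

Because the token is opaque to the iterator, the same wrapper works whether the backend interprets it as a sorted-order "startAfter" key (HDFS) or an S3-style continuation token; only the fetch function is FS-specific.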