[ 
https://issues.apache.org/jira/browse/HADOOP-17242?focusedWorklogId=479951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-479951
 ]

ASF GitHub Bot logged work on HADOOP-17242:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Sep/20 09:07
            Start Date: 08/Sep/20 09:07
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request 
#2273:
URL: https://github.com/apache/hadoop/pull/2273#discussion_r484767016



##########
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java
##########
@@ -764,82 +759,122 @@ public int getBatchSize() {
         S3ListRequest request) throws IOException {
       this.listPath = listPath;
       this.maxKeys = listingOperationCallbacks.getMaxKeys();
+      this.request = request;
+      // initiate the asynchronous listing
       this.s3ListResultFuture = listingOperationCallbacks
               .listObjectsAsync(request);
-      this.request = request;
-      this.objectsPrev = null;
     }
 
     /**
-     * Declare that the iterator has data if it is either is the initial
-     * iteration or it is a later one and the last listing obtained was
-     * incomplete.
-     * @throws IOException never: there is no IO in this operation.
+     * Determine if there is more data.
+     * <p></p>
+     * If the previous fetch
+     * completed and has not had its data returned by a next()
+     * call, then return true.
+     *
+     * Otherwise, wait for any outstanding listing to return, blocking for
+     * the result, and kick off the next async fetch if that result
+     * indicates there is more data.
+     * <p></p>
+     * @throws IOException if the retrieval process failed.
      */
     @Override
     public boolean hasNext() throws IOException {
-      return firstListing ||
-              (objectsPrev != null && objectsPrev.isTruncated());
+      // there is a next element if either there is a result already
+      // or there is a result to block for
+      try {
+        if (objects != null) {
+          // already a result, which next() will return.
+          return true;
+        }
+        // no previous objects
+
+        // Is there any active fetch underway?
+        if (s3ListResultFuture == null) {
+          // no outstanding list, so no more results.
+          return false;
+        }
+        // there is a future, so wait for it.
+        objects = awaitListOperationCompletion();
+        listingCount++;
+        // kick off the next listing
+        fetchNextBatchAsyncIfPresent();
+
+        // if we get here, there was a result, even if it is an empty listing.
+        return true;
+      } catch (AmazonClientException e) {
+        throw translateException("listObjects()", listPath, e);
+      }
+    }
+
+    /**
+     * Wait for the listing to complete; sets {@link #s3ListResultFuture}
+     * to null before returning.
+     * @return the result of the active operation.
+     * @throws IOException failure.
+     */
+    protected S3ListResult awaitListOperationCompletion()
+        throws IOException {
+      Preconditions.checkState(s3ListResultFuture != null,
+          "No active listing in progress");
+      S3ListResult result = awaitFuture(s3ListResultFuture);

Review comment:
      If I add a once() here, I can cut the catch/translate elsewhere.
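As far as can be read from the comment, the reviewer's idea is to translate the SDK exception at a single point, inside `awaitListOperationCompletion()`, instead of in each caller such as the `catch (AmazonClientException e)` block in `hasNext()`. The following is a minimal, self-contained sketch of such a `once()` wrapper, not the S3A code: `OnceSketch`, `IOCallable` and `ClientException` (a stand-in for the AWS SDK's `AmazonClientException`) are hypothetical names.

```java
import java.io.IOException;

/** Hypothetical sketch of a "once()" helper that translates SDK exceptions. */
public class OnceSketch {

  /** Stand-in for the SDK's unchecked AmazonClientException (assumption). */
  public static class ClientException extends RuntimeException {
    public ClientException(String message) {
      super(message);
    }
  }

  /** An operation which may raise an IOException. */
  @FunctionalInterface
  public interface IOCallable<T> {
    T call() throws IOException;
  }

  /**
   * Invoke the operation a single time (no retries). An unchecked
   * ClientException is rethrown as an IOException naming the action and
   * path, so callers need no catch/translate block of their own.
   */
  public static <T> T once(String action, String path, IOCallable<T> operation)
      throws IOException {
    try {
      return operation.call();
    } catch (ClientException e) {
      throw new IOException(action + " on " + path + ": " + e.getMessage(), e);
    }
  }

  public static void main(String[] args) throws IOException {
    // Success: the result passes straight through.
    String listing = once("listObjects()", "s3a://bucket/dir", () -> "listing");
    System.out.println(listing);

    // Failure: the unchecked exception surfaces as an IOException.
    try {
      once("listObjects()", "s3a://bucket/dir", () -> {
        throw new ClientException("access denied");
      });
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the translation done in one place, the caller's code reduces to the happy path plus a `throws IOException` declaration.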





Issue Time Tracking
-------------------

    Worklog Id:     (was: 479951)
    Time Spent: 1h  (was: 50m)

> S3A (async) ObjectListingIterator to block in hasNext() for results
> -------------------------------------------------------------------
>
>                 Key: HADOOP-17242
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17242
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> HADOOP-17074 made listing async in S3A, but the iterator's hasNext() call 
> doesn't wait for the result. If invoked on an empty path, it *may* return when 
> it should be failing.
> Note: surfaced in code review, not seen in the wild, and all our tests were 
> happy.
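The change under review makes hasNext() block on the outstanding asynchronous fetch, so a failed listing is reported by hasNext() itself, and the next page is requested as soon as the current one arrives (the role of `fetchNextBatchAsyncIfPresent()` in the diff). A minimal sketch of that pattern follows, assuming a generic paged source rather than the S3A and AWS SDK classes: `BlockingPageIterator`, its `Page` type and the fetcher function are hypothetical, `CompletableFuture.get()` stands in for S3A's `awaitFuture()`, and Java 9+ is assumed for `List.of()` and `CompletableFuture.failedFuture()`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.function.Function;

/** Hypothetical iterator over asynchronously fetched pages; hasNext() blocks. */
public class BlockingPageIterator {

  /** One page of keys plus a continuation token (null means last page). */
  public static final class Page {
    public final List<String> keys;
    public final String nextToken;

    public Page(List<String> keys, String nextToken) {
      this.keys = keys;
      this.nextToken = nextToken;
    }
  }

  /** Starts an async fetch for a continuation token (null = first page). */
  private final Function<String, CompletableFuture<Page>> fetcher;
  /** Outstanding fetch, or null once no further pages will be requested. */
  private CompletableFuture<Page> pending;
  /** Page retrieved by hasNext() but not yet returned by next(). */
  private Page current;

  public BlockingPageIterator(Function<String, CompletableFuture<Page>> fetcher) {
    this.fetcher = fetcher;
    this.pending = fetcher.apply(null);   // initiate the first async fetch
  }

  public boolean hasNext() throws IOException {
    if (current != null) {
      return true;                        // a page is waiting for next()
    }
    if (pending == null) {
      return false;                       // no fetch underway: listing done
    }
    Page page = await(pending);           // block; failures surface here
    // kick off the next fetch before handing this page back
    pending = page.nextToken != null ? fetcher.apply(page.nextToken) : null;
    current = page;
    return true;                          // a page arrived, even if empty
  }

  public Page next() throws IOException {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    Page page = current;
    current = null;
    return page;
  }

  private static Page await(CompletableFuture<Page> future) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted awaiting listing", e);
    } catch (ExecutionException e) {
      throw new IOException("listing failed", e.getCause());
    }
  }

  public static void main(String[] args) throws IOException {
    // Two pages: first page (token null) -> "p2" -> end of listing.
    BlockingPageIterator it = new BlockingPageIterator(token ->
        CompletableFuture.supplyAsync(() -> token == null
            ? new Page(List.of("a", "b"), "p2")
            : new Page(List.of("c"), null)));
    List<String> all = new ArrayList<>();
    while (it.hasNext()) {
      all.addAll(it.next().keys);
    }
    System.out.println(all);              // prints [a, b, c]

    // A failed fetch is reported by the very first hasNext() call.
    BlockingPageIterator failing = new BlockingPageIterator(token ->
        CompletableFuture.failedFuture(new IllegalStateException("no such bucket")));
    try {
      failing.hasNext();
    } catch (IOException e) {
      System.out.println("hasNext() failed: " + e.getCause().getMessage());
    }
  }
}
```

As in the diff, hasNext() returns true for a retrieved page even when that page is empty; only the absence of an outstanding fetch ends the iteration.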



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
