>From Murtadha Hubail <[email protected]>:

Murtadha Hubail has submitted this change. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21315?usp=email )

Change subject: [NO ISSUE][OTH] Ignore directory objects returned for listing 
objects from S3
......................................................................

[NO ISSUE][OTH] Ignore directory objects returned for listing objects from S3

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
Some S3-storage implementations return directory objects
in addition to the real objects when listing objects from a bucket.
This causes a failure since we don’t expect such objects to exist.
Directory objects are the ones created with a training
slash / and 0-byte size. We should filter out the objects
ending with / since we don’t create such objects.

Ext-ref: MB-72272
Change-Id: I1a34f22e250403f3b93e7fe1291bd50b97c00ed8
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21315
Integration-Tests: Jenkins <[email protected]>
Tested-by: Murtadha Hubail <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
---
M 
asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java
M 
asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java
M 
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
3 files changed, 15 insertions(+), 3 deletions(-)

Approvals:
  Ali Alsuliman: Looks good to me, but someone else must approve
  Anon. E. Moose #1000171:
  Murtadha Hubail: Looks good to me, approved; Verified
  Jenkins: Verified




diff --git 
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java
 
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java
index 1855110..3aefc37 100644
--- 
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java
+++ 
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java
@@ -25,6 +25,9 @@
 import java.nio.charset.Charset;
 import java.util.ArrayList;
 import java.util.List;
+import java.util.function.Predicate;
+
+import org.apache.asterix.external.util.aws.s3.S3Utils;

 import software.amazon.awssdk.services.s3.S3Client;
 import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
@@ -51,7 +54,9 @@
                 listObjectsResponse = 
s3Client.listObjectsV2(listObjectsBuilder.continuationToken(newMarker).build());
             }

-            files.addAll(listObjectsResponse.contents());
+            // ignore objects ending with "/" since we don't create such 
objects.
+            // S3 can return folders as objects with size 0 and key ending 
with "/"
+            
listObjectsResponse.contents().stream().filter(Predicate.not(S3Utils::isDirectory)).forEach(files::add);

             // Mark the flag as done if done, otherwise, get the marker of the 
previous response for the next request
             if (Boolean.FALSE.equals(listObjectsResponse.isTruncated())) {
diff --git 
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java
 
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java
index 39b7cb2..c033485 100644
--- 
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java
+++ 
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java
@@ -156,6 +156,7 @@
             DownloadDirectoryRequest.Builder builder = 
DownloadDirectoryRequest.builder();
             builder.bucket(bucket);
             builder.destination(fileReference.getFile().toPath());
+            // check behavior regarding directory objects
             builder.listObjectsV2RequestTransformer(
                     l -> l.prefix(config.getPrefix() + 
fileReference.getRelativePath()));
             DirectoryDownload directoryDownload = 
transferManager.downloadDirectory(builder.build());
diff --git 
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
 
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
index db91300..0b00b08 100644
--- 
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
+++ 
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
@@ -152,6 +152,8 @@
     static final String CHECKSUM_BEHAVIOR_ALLOWED_VALUES = 
Arrays.stream(S3ChecksumBehavior.values())
             .map(v -> v.name().toLowerCase()).collect(Collectors.joining(", 
"));

+    static final String SLASH = "/";
+
     private static final class StaticTrustManagersProvider implements 
TlsTrustManagersProvider {
         private final TrustManager[] trustManagers;

@@ -734,7 +736,7 @@
         ListObjectsV2Request.Builder listObjectsBuilder = 
ListObjectsV2Request.builder();
         listObjectsBuilder.bucket(container);
         listObjectsBuilder.prefix(prefix);
-        listObjectsBuilder.delimiter("/");
+        listObjectsBuilder.delimiter(SLASH);
         ListObjectsV2Request listObjectsV2Request = listObjectsBuilder.build();

         Map<String, List<String>> allObjects = new HashMap<>();
@@ -762,7 +764,7 @@
                     String folderName = object.prefix();
                     folderName = folderName.substring(prefix.length());
                     folders.add(
-                            folderName.endsWith("/") ? folderName.substring(0, 
folderName.length() - 1) : folderName);
+                            folderName.endsWith(SLASH) ? 
folderName.substring(0, folderName.length() - 1) : folderName);
                 }
             }
         } finally {
@@ -881,4 +883,8 @@
         }
         configuration.put(CHANGE_DETECTION_MODE_FIELD_NAME, 
changeDetectionMode.toLowerCase());
     }
+
+    public static boolean isDirectory(S3Object s3Object) {
+        return s3Object.key().endsWith(SLASH);
+    }
 }

--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21315?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: I1a34f22e250403f3b93e7fe1291bd50b97c00ed8
Gerrit-Change-Number: 21315
Gerrit-PatchSet: 4
Gerrit-Owner: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Hussain Towaileb <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Murtadha Hubail <[email protected]>

Reply via email to