>From Murtadha Hubail <[email protected]>: Murtadha Hubail has submitted this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21315?usp=email )
Change subject: [NO ISSUE][OTH] Ignore directory objects returned for listing objects from S3 ...................................................................... [NO ISSUE][OTH] Ignore directory objects returned for listing objects from S3 - user model changes: no - storage format changes: no - interface changes: no Details: Some S3-storage implementations return directory objects in addition to the real objects when listing objects from a bucket. This causes a failure since we don’t expect such objects to exist. Directory objects are the ones created with a training slash / and 0-byte size. We should filter out the objects ending with / since we don’t create such objects. Ext-ref: MB-72272 Change-Id: I1a34f22e250403f3b93e7fe1291bd50b97c00ed8 Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21315 Integration-Tests: Jenkins <[email protected]> Tested-by: Murtadha Hubail <[email protected]> Reviewed-by: Murtadha Hubail <[email protected]> Reviewed-by: Ali Alsuliman <[email protected]> --- M asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java M asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java M asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java 3 files changed, 15 insertions(+), 3 deletions(-) Approvals: Ali Alsuliman: Looks good to me, but someone else must approve Anon. E. Moose #1000171: Murtadha Hubail: Looks good to me, approved; Verified Jenkins: Verified diff --git a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java index 1855110..3aefc37 100644 --- a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java +++ b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientUtils.java @@ -25,6 +25,9 @@ import java.nio.charset.Charset; import java.util.ArrayList; import java.util.List; +import java.util.function.Predicate; + +import org.apache.asterix.external.util.aws.s3.S3Utils; import software.amazon.awssdk.services.s3.S3Client; import software.amazon.awssdk.services.s3.model.ListObjectsV2Request; @@ -51,7 +54,9 @@ listObjectsResponse = s3Client.listObjectsV2(listObjectsBuilder.continuationToken(newMarker).build()); } - files.addAll(listObjectsResponse.contents()); + // ignore objects ending with "/" since we don't create such objects. + // S3 can return folders as objects with size 0 and key ending with "/" + listObjectsResponse.contents().stream().filter(Predicate.not(S3Utils::isDirectory)).forEach(files::add); // Mark the flag as done if done, otherwise, get the marker of the previous response for the next request if (Boolean.FALSE.equals(listObjectsResponse.isTruncated())) { diff --git a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java index 39b7cb2..c033485 100644 --- a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java +++ b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ParallelDownloader.java @@ -156,6 +156,7 @@ DownloadDirectoryRequest.Builder builder = DownloadDirectoryRequest.builder(); builder.bucket(bucket); builder.destination(fileReference.getFile().toPath()); + // check behavior regarding directory objects builder.listObjectsV2RequestTransformer( l -> l.prefix(config.getPrefix() + fileReference.getRelativePath())); DirectoryDownload directoryDownload = transferManager.downloadDirectory(builder.build()); diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java index db91300..0b00b08 100644 --- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java +++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java @@ -152,6 +152,8 @@ static final String CHECKSUM_BEHAVIOR_ALLOWED_VALUES = Arrays.stream(S3ChecksumBehavior.values()) .map(v -> v.name().toLowerCase()).collect(Collectors.joining(", ")); + static final String SLASH = "/"; + private static final class StaticTrustManagersProvider implements TlsTrustManagersProvider { private final TrustManager[] trustManagers; @@ -734,7 +736,7 @@ ListObjectsV2Request.Builder listObjectsBuilder = ListObjectsV2Request.builder(); listObjectsBuilder.bucket(container); listObjectsBuilder.prefix(prefix); - listObjectsBuilder.delimiter("/"); + listObjectsBuilder.delimiter(SLASH); ListObjectsV2Request listObjectsV2Request = listObjectsBuilder.build(); Map<String, List<String>> allObjects = new HashMap<>(); @@ -762,7 +764,7 @@ String folderName = object.prefix(); folderName = folderName.substring(prefix.length()); folders.add( - folderName.endsWith("/") ? folderName.substring(0, folderName.length() - 1) : folderName); + folderName.endsWith(SLASH) ? folderName.substring(0, folderName.length() - 1) : folderName); } } } finally { @@ -881,4 +883,8 @@ } configuration.put(CHANGE_DETECTION_MODE_FIELD_NAME, changeDetectionMode.toLowerCase()); } + + public static boolean isDirectory(S3Object s3Object) { + return s3Object.key().endsWith(SLASH); + } } -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21315?usp=email To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email Gerrit-MessageType: merged Gerrit-Project: asterixdb Gerrit-Branch: lumina Gerrit-Change-Id: I1a34f22e250403f3b93e7fe1291bd50b97c00ed8 Gerrit-Change-Number: 21315 Gerrit-PatchSet: 4 Gerrit-Owner: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Hussain Towaileb <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Reviewer: Murtadha Hubail <[email protected]>
