>From Ritik Raj <[email protected]>:

Ritik Raj has submitted this change. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21213?usp=email )

Change subject: [ASTERIXDB-3768][CLOUD] Add S3 checksumBehavior parameter for 
external link
......................................................................

[ASTERIXDB-3768][CLOUD] Add S3 checksumBehavior parameter for external link

- user model changes: yes
- storage format changes: no
- interface changes: no

Ext-ref: MB-71732
Change-Id: Id8fd6405c4aec2a79f469259339d4584612e2479
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21213
Tested-by: Ritik Raj <[email protected]>
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Reviewed-by: Ritik Raj <[email protected]>
---
A 
asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.000.ddl.sqlpp
A 
asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.099.ddl.sqlpp
A 
asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.000.ddl.sqlpp
A 
asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.099.ddl.sqlpp
M 
asterixdb/asterix-app/src/test/resources/runtimets/testsuite_external_dataset_s3.xml
M 
asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
M 
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Constants.java
M 
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
8 files changed, 179 insertions(+), 12 deletions(-)

Approvals:
  Ritik Raj: Looks good to me, but someone else must approve; Verified
  Jenkins: Verified
  Ali Alsuliman: Looks good to me, approved




diff --git 
a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.000.ddl.sqlpp
 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.000.ddl.sqlpp
new file mode 100644
index 0000000..b078f00
--- /dev/null
+++ 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.000.ddl.sqlpp
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+drop dataverse test if exists;
+create dataverse test;
+use test;
+
+drop type test if exists;
+create type test as open {
+};
+
+drop dataset test if exists;
+create external dataset test(test) using S3 (
+    ("accessKeyId"="dummyAccessKey"),
+    ("secretAccessKey"="dummySecretKey"),
+    ("region"="us-west-1"),
+    ("checksumBehavior"="invalid-checksum-value"),
+    ("serviceEndpoint"="http://127.0.0.1:8001";),
+    ("container"="playground"),
+    ("definition"="json-data/reviews/single-line/json"),
+    ("format"="json")
+);
diff --git 
a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.099.ddl.sqlpp
 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.099.ddl.sqlpp
new file mode 100644
index 0000000..dc10acd
--- /dev/null
+++ 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/invalid-value/external_dataset.099.ddl.sqlpp
@@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+drop dataverse test if exists;
diff --git 
a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.000.ddl.sqlpp
 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.000.ddl.sqlpp
new file mode 100644
index 0000000..5629b16
--- /dev/null
+++ 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.000.ddl.sqlpp
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+drop dataverse test if exists;
+create dataverse test;
+use test;
+
+drop type test if exists;
+create type test as open {
+};
+
+drop dataset test if exists;
+create external dataset test(test) using S3 (
+    ("accessKeyId"="dummyAccessKey"),
+    ("secretAccessKey"="dummySecretKey"),
+    ("region"="us-west-1"),
+    ("checksumBehavior"="when_required"),
+    ("serviceEndpoint"="http://127.0.0.1:8001";),
+    ("container"="playground"),
+    ("definition"="json-data/reviews/single-line/json"),
+    ("format"="json")
+);
diff --git 
a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.099.ddl.sqlpp
 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.099.ddl.sqlpp
new file mode 100644
index 0000000..dc10acd
--- /dev/null
+++ 
b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/external-dataset/s3/checksum-behavior/valid-value/external_dataset.099.ddl.sqlpp
@@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+drop dataverse test if exists;
diff --git 
a/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_external_dataset_s3.xml
 
b/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_external_dataset_s3.xml
index 355cfd7..2e4987a 100644
--- 
a/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_external_dataset_s3.xml
+++ 
b/asterixdb/asterix-app/src/test/resources/runtimets/testsuite_external_dataset_s3.xml
@@ -1290,6 +1290,17 @@
       </compilation-unit>
     </test-case>
     <test-case FilePath="external-dataset/s3">
+      <compilation-unit name="checksum-behavior/valid-value">
+        <output-dir compare="Text">checksum-behavior/valid-value</output-dir>
+      </compilation-unit>
+    </test-case>
+    <test-case FilePath="external-dataset/s3">
+      <compilation-unit name="checksum-behavior/invalid-value">
+        <output-dir compare="Text">checksum-behavior/invalid-value</output-dir>
+        <expected-error>ASX1173: Invalid value for parameter 
'checksumBehavior', allowed value(s): when_required, when_supported, 
auto</expected-error>
+      </compilation-unit>
+    </test-case>
+    <test-case FilePath="external-dataset/s3">
       <compilation-unit name="anonymous_no_auth">
         <output-dir compare="Text">anonymous_no_auth</output-dir>
         <expected-error>ASX3119: Parameter 'secretAccessKey' is required if 
'accessKeyId' is provided</expected-error>
diff --git 
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
 
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
index 94b6f5a..73237ce 100644
--- 
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
+++ 
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
@@ -134,7 +134,7 @@
         // Disabled
         long profilerLogInterval = 0;

-        // Dummy values;
+        // Dummy values
         String region = "";
         String prefix = "";
         Collection<String> certificates = Collections.emptyList();
diff --git 
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Constants.java
 
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Constants.java
index 79f12be..8390576 100644
--- 
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Constants.java
+++ 
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Constants.java
@@ -27,6 +27,7 @@
     public static final int MAX_KEY_LENGTH_IN_BYTES = 1024;

     public static final String PATH_STYLE_ADDRESSING_FIELD_NAME = 
"pathStyleAddressing";
+    public static final String CHECKSUM_BEHAVIOR_FIELD_NAME = 
"checksumBehavior";

     /*
      * Hadoop-AWS
diff --git 
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
 
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
index 579889f..81bd942 100644
--- 
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
+++ 
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
@@ -44,6 +44,7 @@
 import static 
org.apache.asterix.external.util.aws.AwsUtils.getPathStyleAddressing;
 import static org.apache.asterix.external.util.aws.AwsUtils.getRegion;
 import static 
org.apache.asterix.external.util.aws.s3.S3Constants.CHANGE_DETECTION_MODE_FIELD_NAME;
+import static 
org.apache.asterix.external.util.aws.s3.S3Constants.CHECKSUM_BEHAVIOR_FIELD_NAME;
 import static org.apache.asterix.external.util.aws.s3.S3Constants.FILES;
 import static org.apache.asterix.external.util.aws.s3.S3Constants.FOLDERS;
 import static 
org.apache.asterix.external.util.aws.s3.S3Constants.HADOOP_ACCESS_KEY_ID;
@@ -77,6 +78,8 @@
 import static 
org.apache.asterix.external.util.aws.s3.S3Constants.INPUT_STREAM_TYPE_FIELD_NAME;
 import static 
org.apache.asterix.external.util.aws.s3.S3Constants.PATH_STYLE_ADDRESSING_FIELD_NAME;
 import static org.apache.hyracks.api.util.ExceptionUtils.getMessageOrToString;
+import static 
org.apache.hyracks.util.annotations.AiProvenance.Agent.CLAUDE_SONNET_4_6;
+import static 
org.apache.hyracks.util.annotations.AiProvenance.Tool.GITHUB_COPILOT;

 import java.io.ByteArrayInputStream;
 import java.nio.charset.StandardCharsets;
@@ -85,6 +88,7 @@
 import java.security.cert.CertificateFactory;
 import java.security.cert.X509Certificate;
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.Collection;
 import java.util.HashMap;
 import java.util.List;
@@ -93,6 +97,7 @@
 import java.util.UUID;
 import java.util.function.BiPredicate;
 import java.util.regex.Matcher;
+import java.util.stream.Collectors;

 import javax.net.ssl.TrustManager;
 import javax.net.ssl.TrustManagerFactory;
@@ -115,8 +120,8 @@
 import org.apache.hyracks.api.exceptions.IWarningCollector;
 import org.apache.hyracks.api.exceptions.SourceLocation;
 import org.apache.hyracks.api.exceptions.Warning;
-import org.apache.hyracks.cloud.io.ICloudProperties;
 import org.apache.hyracks.cloud.io.S3ChecksumBehavior;
+import org.apache.hyracks.util.annotations.AiProvenance;
 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;

@@ -144,6 +149,8 @@

 public class S3Utils {
     private static final Logger LOGGER = LogManager.getLogger();
+    static final String CHECKSUM_BEHAVIOR_ALLOWED_VALUES = 
Arrays.stream(S3ChecksumBehavior.values())
+            .map(v -> v.name().toLowerCase()).collect(Collectors.joining(", 
"));

     private static final class StaticTrustManagersProvider implements 
TlsTrustManagersProvider {
         private final TrustManager[] trustManagers;
@@ -195,16 +202,7 @@
         } else if (certificates != null && !certificates.isBlank()) {
             builder.httpClient(createHttpClient(certificates));
         }
-
-        S3ChecksumBehavior checksumBehavior;
-        if (appCtx != null && appCtx.getCloudProperties() != null) {
-            ICloudProperties cloudProperties = appCtx.getCloudProperties();
-            checksumBehavior = cloudProperties.getS3ChecksumBehavior();
-        } else {
-            // No cloud properties (external data context): default based on 
whether a custom endpoint is in use
-            checksumBehavior = 
S3ChecksumBehavior.defaultForEndpoint(serviceEndpoint);
-        }
-        applyChecksumBehavior(builder, checksumBehavior);
+        applyChecksumBehavior(builder, resolveChecksumBehavior(configuration, 
serviceEndpoint));
         awsClients.setConsumingClient(builder.build());
         return awsClients;
     }
@@ -786,6 +784,47 @@
         return errorCode.equals(ERROR_INTERNAL_ERROR) || 
errorCode.equals(ERROR_SLOW_DOWN);
     }

+    /**
+     * Resolves the effective checksum behavior for external data (link) 
operations using a two-level priority chain:
+     * <ol>
+     *   <li>Per-link {@code checksumBehavior} parameter in the configuration 
map (explicit override)</li>
+     *   <li>Endpoint-based default: {@link S3ChecksumBehavior#AUTO} for 
native AWS S3,
+     *       {@link S3ChecksumBehavior#WHEN_REQUIRED} for S3-compatible 
storage</li>
+     * </ol>
+     * Cloud properties are intentionally not consulted here — this method is 
for external data (link) operations
+     * only, not for blob storage.
+     */
+    @AiProvenance(agent = CLAUDE_SONNET_4_6, tool = GITHUB_COPILOT)
+    public static S3ChecksumBehavior resolveChecksumBehavior(Map<String, 
String> configuration, String serviceEndpoint)
+            throws CompilationException {
+        String perLink = configuration.get(CHECKSUM_BEHAVIOR_FIELD_NAME);
+        if (perLink != null) {
+            try {
+                return S3ChecksumBehavior.fromString(perLink);
+            } catch (IllegalArgumentException e) {
+                throw new 
CompilationException(INVALID_PARAM_VALUE_ALLOWED_VALUE, 
CHECKSUM_BEHAVIOR_FIELD_NAME,
+                        CHECKSUM_BEHAVIOR_ALLOWED_VALUES);
+            }
+        }
+        return S3ChecksumBehavior.defaultForEndpoint(serviceEndpoint);
+    }
+
+    /**
+     * Validates a {@code checksumBehavior} string value.
+     * Throws {@link CompilationException} if the value is non-null and not a 
recognized enum value.
+     */
+    public static void validateChecksumBehavior(String value) throws 
CompilationException {
+        if (value == null) {
+            return;
+        }
+        try {
+            S3ChecksumBehavior.fromString(value);
+        } catch (IllegalArgumentException e) {
+            throw new CompilationException(INVALID_PARAM_VALUE_ALLOWED_VALUE, 
CHECKSUM_BEHAVIOR_FIELD_NAME,
+                    CHECKSUM_BEHAVIOR_ALLOWED_VALUES);
+        }
+    }
+
     public static boolean validateAndGetPathStyleAddressing(String 
pathStyleAddressing, String endpoint)
             throws CompilationException {
         if (pathStyleAddressing == null) {

--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21213?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: Id8fd6405c4aec2a79f469259339d4584612e2479
Gerrit-Change-Number: 21213
Gerrit-PatchSet: 7
Gerrit-Owner: Ritik Raj <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Ritik Raj <[email protected]>

Reply via email to