jackye1995 commented on a change in pull request #4334:
URL: https://github.com/apache/iceberg/pull/4334#discussion_r829202045



##########
File path: aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java
##########
@@ -444,4 +458,8 @@ public boolean isS3ChecksumEnabled() {
   public void setS3ChecksumEnabled(boolean eTagCheckEnabled) {
     this.isS3ChecksumEnabled = eTagCheckEnabled;
   }
+
+  public Map<String, String> getS3AccessPoints() {

Review comment:
       nit: prefer a getter name without the `get` prefix, so `Map<String, String> s3AccessPoints()`
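
       To illustrate the naming nit, here is a minimal sketch. The class `AccessPointPropertiesSketch` is a hypothetical stand-in, not the actual `AwsProperties` class, and the defensive copy in the constructor is an assumption about how the map might be held.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a properties holder; not the real AwsProperties.
class AccessPointPropertiesSketch {
  private final Map<String, String> s3AccessPoints;

  AccessPointPropertiesSketch(Map<String, String> s3AccessPoints) {
    // Copy the input so later changes to the caller's map do not leak in.
    this.s3AccessPoints = Collections.unmodifiableMap(new HashMap<>(s3AccessPoints));
  }

  // Accessor named after the property itself, without the `get` prefix.
  Map<String, String> s3AccessPoints() {
    return s3AccessPoints;
  }
}
```

       Callers would then read the mapping as `properties.s3AccessPoints()` rather than `properties.getS3AccessPoints()`.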

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java
##########
@@ -68,6 +68,7 @@
   private MetricsContext metrics = MetricsContext.nullMetrics();
   private final AtomicBoolean isResourceClosed = new AtomicBoolean(false);
   private Set<Tag> writeTags = Sets.newHashSet();
+  private Map<String, String> bucketToAccessPointMapping = Maps.newHashMap();

Review comment:
       yes good point, let's keep as is

##########
File path: aws/src/integration/java/org/apache/iceberg/aws/AwsIntegTestUtil.java
##########
@@ -47,6 +60,24 @@ public static String testBucketName() {
     return System.getenv("AWS_TEST_BUCKET");
   }
 
+  /**
+   * Get the environment variable AWS_TEST_ACCESS_POINT for a default bucket to use for testing
+   * @return bucket name
+   */
+  public static String testAccessPointName() {

Review comment:
       I think this part was not fully updated after the last comment? We don't need to
       read this from an environment variable, because we can just define any name we
       like in the test.

##########
File path: aws/src/integration/java/org/apache/iceberg/aws/AwsIntegTestUtil.java
##########
@@ -47,6 +60,24 @@ public static String testBucketName() {
     return System.getenv("AWS_TEST_BUCKET");
   }
 
+  /**
+   * Get the environment variable AWS_TEST_ACCESS_POINT for a default bucket to use for testing
+   * @return bucket name
+   */
+  public static String testAccessPointName() {
+    return System.getenv("AWS_TEST_ACCESS_POINT");
+  }
+
+  /**
+   * Get AccessPointARN for a default bucket to use for testing
+   * @return access point arn
+   */
+  public static String testAccessPointARN() {

Review comment:
       If we remove `testAccessPointName`, this can also be removed, or moved into
       `TestS3FileIOIntegration`.
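
       As a rough sketch of what a test-local helper could look like once the name is defined in the test itself: the class and method names below are hypothetical, the region, account id, and access point name are placeholders, and the `aws` partition is assumed (other partitions use a different prefix).

```java
// Hypothetical test helper; all argument values are chosen by the test itself.
class AccessPointArnSketch {
  // Builds an ARN of the form arn:aws:s3:<region>:<account-id>:accesspoint/<name>.
  // The "aws" partition is an assumption of this sketch.
  static String accessPointArn(String region, String accountId, String accessPointName) {
    return String.format("arn:aws:s3:%s:%s:accesspoint/%s", region, accountId, accessPointName);
  }
}
```

       With this shape, the access point name no longer needs its own environment variable.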

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java
##########
@@ -50,7 +52,24 @@
    * @param location fully qualified URI
    */
   S3URI(String location) {
+    this(location, ImmutableMap.of());
+  }
+
+  /**
+   * Creates a new S3URI in the form of scheme://(bucket|accessPoint)/key?query#fragment with additional information
+   * on accessPoints.
+   * <p>
+   * The URI supports any valid URI schemes to be backwards compatible with s3a and s3n,
+   * and also allows users to use S3FileIO with other S3-compatible object storage services like GCS.
+   * If the accessPoints contains a mapping of the given bucket used in location then the corresponding accessPoint

Review comment:
       nit: this sentence is a bit convoluted. What about "if the bucket of the
       location has an access point in the mapping, the access point is used to
       perform all the S3 operations"?

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java
##########
@@ -50,7 +52,24 @@
    * @param location fully qualified URI
    */
   S3URI(String location) {
+    this(location, ImmutableMap.of());
+  }
+
+  /**
+   * Creates a new S3URI in the form of scheme://(bucket|accessPoint)/key?query#fragment with additional information
+   * on accessPoints.
+   * <p>
+   * The URI supports any valid URI schemes to be backwards compatible with s3a and s3n,
+   * and also allows users to use S3FileIO with other S3-compatible object storage services like GCS.
+   * If the accessPoints contains a mapping of the given bucket used in location then the corresponding accessPoint

Review comment:
       nit: this sentence is a bit convoluted. What about "if the bucket in the
       location has an alias, the alias is used to perform all the S3 operations"?

##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java
##########
@@ -50,7 +52,24 @@
    * @param location fully qualified URI
    */
   S3URI(String location) {
+    this(location, ImmutableMap.of());
+  }
+
+  /**
+   * Creates a new S3URI in the form of scheme://(bucket|accessPoint)/key?query#fragment with additional information
+   * on accessPoints.
+   * <p>
+   * The URI supports any valid URI schemes to be backwards compatible with s3a and s3n,
+   * and also allows users to use S3FileIO with other S3-compatible object storage services like GCS.
+   * If the accessPoints contains a mapping of the given bucket used in location then the corresponding accessPoint

Review comment:
       nit: this sentence is a bit convoluted. What about "if the bucket in the
       location has an access point in the mapping, the access point is used to
       perform all the S3 operations"?
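
       The behavior the suggested sentence describes can be sketched as below. This is a simplified illustration, not the actual `S3URI` code: `BucketAliasSketch` and `resolveBucket` are hypothetical names, and parsing the bucket with `java.net.URI` is an assumption made to keep the example short.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical illustration of the bucket-to-access-point lookup.
class BucketAliasSketch {
  // Returns the access point mapped to the location's bucket, or the bucket
  // itself when the mapping has no entry for it.
  static String resolveBucket(String location, Map<String, String> bucketToAccessPointMapping) {
    // For s3://my-bucket/key the authority component is "my-bucket".
    String bucket = URI.create(location).getAuthority();
    return bucketToAccessPointMapping.getOrDefault(bucket, bucket);
  }
}
```

       Locations whose bucket has no entry fall through unchanged, which matches the single-argument constructor passing an empty map.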

##########
File path: aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
##########
@@ -268,4 +306,11 @@ private void validateRead(S3FileIO s3FileIO) throws Exception {
     stream.close();
     Assert.assertEquals(content, result);
   }
+
+  private String testAccessPointARN() {

Review comment:
       I think we also need to test a cross-region endpoint, because I see this
       S3 client config:
   
   ```
       /**
        * Returns whether the client is allowed to make cross-region calls when an S3 Access Point ARN has a different
        * region to the one configured on the client.
        * <p>
        * @return True if a different region in the ARN can be used.
        */
       public boolean useArnRegionEnabled() {
           return useArnRegionEnabled.value();
       }
   ```
   
   Maybe we should turn this on when an access point is configured. Could you
   check?

##########
File path: aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
##########
@@ -268,4 +306,11 @@ private void validateRead(S3FileIO s3FileIO) throws Exception {
     stream.close();
     Assert.assertEquals(content, result);
   }
+
+  private String testAccessPointARN() {

Review comment:
       cool, so to summarize: same-single-region and multi-region don't need
       this flag, and cross-single-region needs this flag, is that right?
   
   If so, I think we should introduce this flag, because there are some
   performance implications in enabling this feature as the signing region is
   changed. We should not enable this feature by default for people who use MRAP.

##########
File path: aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
##########
@@ -268,4 +306,11 @@ private void validateRead(S3FileIO s3FileIO) throws Exception {
     stream.close();
     Assert.assertEquals(content, result);
   }
+
+  private String testAccessPointARN() {

Review comment:
       cool, so to summarize: same-single-region and multi-region don't need
       this flag, and cross-single-region needs this flag, is that right?
   
   If so, I think we should introduce this flag, because there are some
   performance implications in enabling this feature as the signing region is
   changed on the fly. We should not enable this feature by default for people who
   use MRAP (Multi-Region Access Points).

##########
File path: aws/src/integration/java/org/apache/iceberg/aws/s3/TestS3FileIOIntegration.java
##########
@@ -268,4 +306,11 @@ private void validateRead(S3FileIO s3FileIO) throws Exception {
     stream.close();
     Assert.assertEquals(content, result);
   }
+
+  private String testAccessPointARN() {

Review comment:
       For boolean configs, prefer the `-enabled` suffix as a convention. All
       S3 features should be under the `s3.` config namespace.
   
   I think we can just follow the S3 variable name and use `s3.arn-region-enabled`,
   since S3 already gave this feature this name. We can link directly to this
   feature in the AWS SDK v2 GitHub repo as a reference. What do you think?
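
       A minimal sketch of how such a property might be read, assuming it is off unless set. The class, constant, and method names are hypothetical, and the `false` default is an assumption that follows the earlier point about not enabling the feature by default.

```java
import java.util.Map;

// Hypothetical parsing of the proposed boolean property.
class ArnRegionPropertySketch {
  // Property name as proposed in the comment above.
  static final String S3_ARN_REGION_ENABLED = "s3.arn-region-enabled";
  // Assumed default: disabled unless the user opts in.
  static final boolean S3_ARN_REGION_ENABLED_DEFAULT = false;

  static boolean isArnRegionEnabled(Map<String, String> properties) {
    String value = properties.get(S3_ARN_REGION_ENABLED);
    return value == null ? S3_ARN_REGION_ENABLED_DEFAULT : Boolean.parseBoolean(value);
  }
}
```

       The parsed value would presumably be handed to the S3 client configuration when the client is built, so users only pay for cross-region ARN handling when they opt in.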




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


