martin-g commented on code in PR #2803:
URL: https://github.com/apache/datafusion-comet/pull/2803#discussion_r2544550415
##########
native/core/src/parquet/objectstore/s3.rs:
##########
@@ -1852,14 +1869,95 @@ mod tests {
);
}
+ #[test]
+ fn test_extract_s3_config_path_style_access() {
+ // Case 1: path.style.access is true -> VirtualHostedStyleRequest
should be false
+ let mut configs = HashMap::new();
+ configs.insert("fs.s3a.path.style.access".to_string(),
"true".to_string());
Review Comment:
```suggestion
configs.insert("fs.s3a.path.style.access".to_string(),
"tRue".to_string());
```
to test `path_style_access = path_style.to_lowercase() == "true";` (line 174)
Or maybe add a separate test for this.
##########
native/core/src/parquet/objectstore/s3.rs:
##########
@@ -169,19 +169,19 @@ fn extract_s3_config_options(
}
// Extract and handle path style access (virtual hosted style)
Review Comment:
```suggestion
// Set 'virtual hosted style' config depending on the value of 'path
style access'
```
I am not sure that it is more clear
##########
native/core/src/parquet/objectstore/s3.rs:
##########
@@ -221,14 +221,31 @@ fn normalize_endpoint(
endpoint.to_string()
};
- if virtual_hosted_style_request {
+ if force_path_style {
+ // Path-style: bucket in path
+ // Example: https://s3.oss-me-central-1.aliyuncs.com/my-bucket/key
if endpoint.ends_with("/") {
Some(format!("{endpoint}{bucket}"))
} else {
Some(format!("{endpoint}/{bucket}"))
}
} else {
- Some(endpoint) // Avoid extra to_string() call since endpoint is
already a String
+ // Virtual-hosted-style: bucket in subdomain
+ // For custom endpoints (non-AWS), we need to manually insert the
bucket as a subdomain
+ // because object_store crate only does this automatically for AWS S3
endpoints.
+ // Example: https://my-bucket.s3.oss-me-central-1.aliyuncs.com/key
+
+ // Parse the endpoint to insert bucket as subdomain
+ if let Some(protocol_end) = endpoint.find("://") {
+ let protocol = &endpoint[..protocol_end + 3]; // "https://" or
"http://"
+ let rest = &endpoint[protocol_end + 3..]; //
"s3.oss-me-central-1.aliyuncs.com" or "s3.oss-me-central-1.aliyuncs.com/path"
+
+ // Insert bucket as first subdomain
+ Some(format!("{protocol}{bucket}.{rest}"))
Review Comment:
According to
https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html#general-purpose-bucket-names
```
Bucket names can consist only of lowercase letters, numbers, periods (.),
and hyphens (-).
```
usage of periods (.) in the bucket name would create several subdomains. Is
that supported ? Or a check should be added and reported somehow ?!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]