gianm commented on a change in pull request #8903: S3 input source
URL: https://github.com/apache/incubator-druid/pull/8903#discussion_r349863464
 
 

 ##########
 File path: extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3Utils.java
 ##########
 @@ -168,6 +244,26 @@ public S3ObjectSummary next()
     };
   }
 
+
+  /**
+   * Create an {@link URI} from the given {@link S3ObjectSummary}. The result URI is composed as below.
+   *
+   * <pre>
+   * {@code s3://{BUCKET_NAME}/{OBJECT_KEY}}
+   * </pre>
+   */
+  public static URI summaryToUri(S3ObjectSummary object)
+  {
+    final String originalAuthority = object.getBucketName();
+    final String originalPath = object.getKey();
+    final String authority = originalAuthority.endsWith("/") ?
+                             originalAuthority.substring(0, originalAuthority.length() - 1) :
+                             originalAuthority;
+    final String path = originalPath.startsWith("/") ? originalPath.substring(1) : originalPath;
+
+    return URI.create(StringUtils.format("s3://%s/%s", authority, path));
 
 Review comment:
   How about adding this method to CloudObjectLocation:
   
   ```java
   public URI toUri()
   {
     // Encode path, except leave '/' characters unencoded
     return URI.create(StringUtils.format("s3://%s/%s", bucket, StringUtils.urlEncode(path).replace("%2F", "/")));
   }
   ```
   
   And using it everywhere that is doing this sort of concatenation today.
   
   It won't handle weird, invalid `bucket` names, but it's better than the simple concatenation happening now, and weird `path`s are more likely anyway. For extra credit, you could validate the `bucket` and throw an error if it's not valid (AWS, Google, etc. all have rules for what makes a valid bucket name; you could check against a loose superset of them).
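
As a rough illustration of the suggestion above, here is a standalone sketch of an encoded `toUri` with loose bucket validation. The class name `CloudObjectUriSketch`, the `LOOSE_BUCKET_NAME` pattern, and the local `urlEncode` helper are all hypothetical: the helper stands in for `StringUtils.urlEncode` using only `java.net.URLEncoder`, and the bucket rule is an assumed loose superset, not any provider's exact rule.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.regex.Pattern;

final class CloudObjectUriSketch
{
  // Assumption: 3 to 222 characters of letters, digits, '.', '-' or '_',
  // starting and ending with a letter or digit. Deliberately loose.
  private static final Pattern LOOSE_BUCKET_NAME =
      Pattern.compile("[A-Za-z0-9][A-Za-z0-9._-]{1,220}[A-Za-z0-9]");

  private CloudObjectUriSketch()
  {
  }

  // Hypothetical stand-in for StringUtils.urlEncode.
  static String urlEncode(String s)
  {
    try {
      // URLEncoder emits form encoding, where a space becomes '+'.
      // A literal '+' in the input is already "%2B" here, so this swap is safe.
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e) {
      // UTF-8 is a charset every Java platform must support.
      throw new IllegalStateException(e);
    }
  }

  static URI toUri(String bucket, String path)
  {
    if (bucket == null || !LOOSE_BUCKET_NAME.matcher(bucket).matches()) {
      throw new IllegalArgumentException("Invalid bucket name: " + bucket);
    }
    // Encode path, except leave '/' characters unencoded
    return URI.create("s3://" + bucket + "/" + urlEncode(path).replace("%2F", "/"));
  }
}
```

Under these assumptions, `CloudObjectUriSketch.toUri("my-bucket", "dir/a b.json")` yields `s3://my-bucket/dir/a%20b.json`, while a bucket such as `bad bucket/` is rejected with an `IllegalArgumentException` instead of producing a malformed URI.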
