[jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration

Aaron Fabbri (JIRA) Fri, 06 Jan 2017 12:05:17 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805590#comment-15805590
 ]


Aaron Fabbri commented on HADOOP-13336:
---------------------------------------

+1 (non-binding) once you remove the debug log level change at the end of the patch.

I like this approach.  I had thought about this a bit before you started work 
and was thinking of having a "flatten" (a.k.a. propagate) function as you do 
here.  Seems more efficient at runtime and easier to debug.  I like the use of 
the Configuration source feature too.

It looks like your omission of <prefix>.impl keys still works with 
fs.s3a.metadatastore.impl, since {{"impl".equals("metadatastore.impl") == 
false}}.  (You compare the whole key, not just the suffix.)  This is good 
because we'd like to be able to enable/disable s3guard on a per-bucket basis.
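
To spell out why the whole-key comparison matters, here is a minimal standalone sketch of the idea. It is not the patch code: it uses a plain Map rather than Hadoop's Configuration, the class and method names are hypothetical, and the "fs.s3a.bucket." prefix is an assumption about how the per-bucket keys are named.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names and key prefixes are assumptions. */
final class BucketOptionsSketch {

    private BucketOptionsSketch() {
    }

    /**
     * Copy every "fs.s3a.bucket.BUCKET.X" entry onto "fs.s3a.X",
     * skipping the bare prefix, null values, and the exact key "impl".
     */
    static Map<String, String> propagate(Map<String, String> source,
            String bucket) {
        final String bucketPrefix = "fs.s3a.bucket." + bucket + ".";
        // Start from a copy so the source map is left untouched.
        final Map<String, String> dest = new LinkedHashMap<>(source);
        for (Map.Entry<String, String> entry : source.entrySet()) {
            final String key = entry.getKey();
            final String value = entry.getValue();
            if (!key.startsWith(bucketPrefix)
                    || bucketPrefix.equals(key)
                    || value == null) {
                continue;
            }
            final String stripped = key.substring(bucketPrefix.length());
            // Whole-key comparison: "metadatastore.impl" is NOT skipped.
            if ("impl".equals(stripped)) {
                continue;
            }
            dest.put("fs.s3a." + stripped, value);
        }
        return dest;
    }
}
```

With this shape, a source entry such as fs.s3a.bucket.b1.metadatastore.impl ends up overriding fs.s3a.metadatastore.impl for bucket b1, while fs.s3a.bucket.b1.impl is ignored.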

Other patch comments:

{code}
+  public static Configuration propagateBucketOptions(Configuration source,
+      String bucket) {
+
<snip>
+    for (Map.Entry<String, String> entry : source) {
+      final String key = entry.getKey();
+      // get the (unexpanded) value.
+      final String value = entry.getValue();
+      if (!key.startsWith(bucketPrefix)
+          || bucketPrefix.equals(key)
+          || value == null) {
+        continue;
+      }
{code}

I was curious about the {{value == null}} part. Does that ever happen?  Either 
way, the check seems safe to include.

Minor nit in the docs:

{code}
+Different S3 buckets can be accessed with S3A client configurations.
{code}

Should this read "can be accessed with different S3A client configurations"?

{code}
--- a/hadoop-tools/hadoop-aws/src/test/resources/log4j.properties
+++ b/hadoop-tools/hadoop-aws/src/test/resources/log4j.properties
@@ -21,7 +21,7 @@ log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
  
 # for debugging low level S3a operations, uncomment these lines
 # Log all S3A classes
-#log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
+log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
  
 # Log S3Guard classes
 #log4j.logger.org.apache.hadoop.fs.s3a.s3guard=DEBUG
{code}
Was this intentional?

> S3A to support per-bucket configuration
> ---------------------------------------
>
>                 Key: HADOOP-13336
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13336
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13336-HADOOP-13345-001.patch, 
> HADOOP-13336-HADOOP-13345-002.patch
>
>
> S3a now supports different regions, by way of declaring the endpoint, but you 
> can't do things like read in one region, write back in another (e.g. a distcp 
> backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt 
> s3a://b2.seol , then this would be possible. 
> Swift does this with a full filesystem binding/config: endpoints, username, 
> etc, in the XML file. Would we need to do that much? It'd be simpler 
> initially to use a domain suffix of a URL to set the region of a bucket from 
> the domain and have the aws library sort the details out itself, maybe with 
> some config options for working with non-AWS infra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

