[
https://issues.apache.org/jira/browse/DRILL-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516555#comment-16516555
]
Bridget Bevens commented on DRILL-6504:
---------------------------------------
Hi Paul,
Thanks for the corrections and info!
I've updated the doc and pushed [it to my
repo|https://github.com/bbevens/drill/blob/0034a410a4774f3abb5962693c15264d7356b221/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md].
Can you please have a look and let me know if the changes I made correctly
reflect what you described in this JIRA?
Thanks!
Bridget
> Corrections to S3 storage doc pages
> -----------------------------------
>
> Key: DRILL-6504
> URL: https://issues.apache.org/jira/browse/DRILL-6504
> Project: Apache Drill
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Major
> Labels: doc-impacting
>
> [The documentation for S3
> storage|http://drill.apache.org/docs/s3-storage-plugin/] contains a number of
> minor errors.
> "using the S3a library."
> Change to "using the HDFS s3a library." (The library is provided via HDFS,
> not Drill.)
> ----
> "Drill's previous S3n interface"
> Change to "the older HDFS s3n library." (Again, S3 support is provided by
> HDFS.)
> ----
> "Starting with version 1.3.0"
> Can probably be removed, 1.3 was quite a long time ago.
> ----
> "To enable Drill's S3a support"
> Change to "To enable HDFS s3a support"
> ----
> Include a link to the HDFS S3 documentation:
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html]
> ----
> Refer to the S3a documentation link above. There are actually multiple ways
> to configure S3a:
> * In the storage plugin config (as is suggested by the shipped s3 example in
> the Drill storage page.)
> * Using {{core-site.xml}} as described in the docs.
> * Using environment variables set before running Drill or in {{drill-env.sh}}.
> * Maybe using the {{~/.aws/credentials}} file? Have not tested this one.
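> As a sketch of the environment-variable option: the s3a connector's default credential chain can read the standard AWS variables, so something like the following in {{drill-env.sh}} may work. (The variable names are the standard AWS ones; their use here is an assumption and untested, like the {{~/.aws/credentials}} option.)
> {code}
> # In $DRILL_HOME/conf/drill-env.sh -- untested sketch, values are placeholders
> export AWS_ACCESS_KEY_ID="ID"
> export AWS_SECRET_ACCESS_KEY="SECRET"
> {code}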
> ----
> Since Drill does not use HDFS 3.x, Drill does not support AWS temporary
> credentials as described in the S3a documentation.
> ----
> "edit the file conf/core-site.xml in your Drill install directory,"
> Change to "in the $DRILL_HOME/conf or $DRILL_SITE directory, rename
> core-site-example.xml to core-site.xml and ..."
> Note: once the file is renamed, if the user has $HADOOP_HOME on their path,
> Hadoop support will break because Drill will pull in Drill's version of
> core-site.xml rather than Hadoop's. This will cause tools such as
> Drill-on-YARN to fail.
> In this situation, the user should make the changes in Hadoop's core-site.xml
> and should not create one for Drill. (In fact, if the user is using Hadoop
> and wants to use S3 with Drill, they probably already have S3 support
> configured...)
> In Drill 1.13 (not sure when it was added), the default "s3" storage plugin
> lets the user define the access keys as storage plugin configuration
> properties:
> {code}
> "config": {
>   "fs.s3a.access.key": "ID",
>   "fs.s3a.secret.key": "SECRET"
> },
> {code}
> This approach is not very secure, but is probably OK when Drill has a single
> user (such as on a laptop).
> ----
> When using the above approach, it appears that one must specify the endpoint:
> {code}
> "connection": "s3a://<bucket-name>/",
> "config": {
>   "fs.s3a.access.key": "<key>",
>   "fs.s3a.secret.key": "<key>",
>   "fs.s3a.endpoint": "s3.us-west-1.amazonaws.com"
> },
> {code}
> I could not get the above to work using the pattern in the default S3 config:
> {code}
> connection: "s3a://my.bucket.location.com",
> {code}
> All the S3a usage examples I could find specify the endpoint this way.
> ----
> A workable, semi-secure combination is:
> * Use the S3 storage plugin config to specify only the bucket.
> {code}
> "connection": "s3a://mybucket/",
> "config": {
> },
> {code}
> * Specify the credentials and endpoint in the {{core-site.xml}} file:
> {code}
> <configuration>
>   <property>
>     <name>fs.s3a.access.key</name>
>     <value>ACCESS-KEY</value>
>   </property>
>   <property>
>     <name>fs.s3a.secret.key</name>
>     <value>SECRET-KEY</value>
>   </property>
>   <property>
>     <name>fs.s3a.endpoint</name>
>     <value>s3.REGION.amazonaws.com</value>
>   </property>
> </configuration>
> {code}
> ----
> "Point your browser to http://:8047"
> Change to "http://<drill-host>:8047, where <drill-host> is a node on which
> Drill is running."
> "Note: on a single machine system, you'll need to run drill-embedded before
> you can access the web console site"
> The general rule is that Drill must be running, whether embedded, in
> server-mode on the local host, or in a cluster.
> ----
> "Duplicate the 'dfs' plugin."
> This is not necessary. If Drill is local (single server) then it is helpful
> to allow both local and S3 access. But, if Drill is deployed in a cluster,
> local file access is problematic. In short, make this section closer to the
> [HDFS storage|http://drill.apache.org/docs/file-system-storage-plugin/] page.
> Note also that in Drill 1.13, Drill ships with an "s3" storage configuration;
> the user need only enable it. No need to copy/paste the dfs plugin.
> ----
> "you can set this parameter in conf/core-site.xml file in your Drill install
> directory"
> Based on the comments above, change this to: "you can set this parameter in
> core-site.xml"
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)