[
https://issues.apache.org/jira/browse/DRILL-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516555#comment-16516555
]
Bridget Bevens commented on DRILL-6504:
---------------------------------------
Hi Paul,
Thanks for the corrections and info!
I've updated the doc and pushed [it to my
repo|https://github.com/bbevens/drill/blob/0034a410a4774f3abb5962693c15264d7356b221/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md].
Can you please have a look and let me know if the changes I made correctly
reflect what you described in this JIRA?
Thanks!
Bridget
> Corrections to S3 storage doc pages
> -----------------------------------
>
> Key: DRILL-6504
> URL: https://issues.apache.org/jira/browse/DRILL-6504
> Project: Apache Drill
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Major
> Labels: doc-impacting
>
> [The documentation for S3
> storage|http://drill.apache.org/docs/s3-storage-plugin/] contains a number of
> minor errors.
> "using the S3a library."
> Change to "using the HDFS s3a library." (The library is provided via HDFS,
> not Drill.)
> ----
> "Drill's previous S3n interface"
> Change to "the older HDFS s3n library." (Again, S3 support is provided by
> HDFS.)
> ----
> "Starting with version 1.3.0"
> Can probably be removed, 1.3 was quite a long time ago.
> ----
> "To enable Drill's S3a support"
> Change to "To enable HDFS s3a support"
> ----
> Include a link to the HDFS S3 documentation:
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html]
> ----
> Refer to the S3a documentation link above. There are actually multiple ways
> to configure S3a:
> * In the storage plugin config (as is suggested by the shipped s3 example in
> the Drill storage page.)
> * Using {{core-site.xml}} as described in the docs.
> * Using environment variables set before running Drill or in {{drill-env.sh}}.
> * Maybe using the {{~/.aws/credentials}} file? Have not tested this one.
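> As a sketch of the environment-variable option: the s3a connector's default credential chain can read the standard AWS variables, so something like the following in {{drill-env.sh}} may work. (The variable names are the standard AWS ones; their use here is an assumption and untested, like the {{~/.aws/credentials}} option.)
> {code}
> # In $DRILL_HOME/conf/drill-env.sh -- untested sketch, values are placeholders
> export AWS_ACCESS_KEY_ID="ID"
> export AWS_SECRET_ACCESS_KEY="SECRET"
> {code}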
> ----
> Since Drill does not use HDFS 3.x, Drill does not support AWS temporary
> credentials as described in the S3a documentation.
> ----
> "edit the file conf/core-site.xml in your Drill install directory,"
> Change to "in the $DRILL_HOME/conf or $DRILL_SITE directory, rename
> core-site-example.xml to core-site.xml and ..."
> Note: once the file is renamed, if the user has $HADOOP_HOME on their path,
> Hadoop support will break because Drill will pull in Drill's version of
> core-site.xml rather than Hadoop's. This will cause tools such as
> Drill-on-YARN to fail.
> In this situation, the user should make the changes in Hadoop's core-site.xml
> and should not create one for Drill. (In fact, if the user is using Hadoop
> and wants to use S3 with Drill, they probably already have S3 support
> configured...)
> In Drill 1.13 (not sure when it was added), the default "s3" storage plugin
> lets the user define the access keys as storage plugin configuration
> properties:
> {code}
> "config": {
>   "fs.s3a.access.key": "ID",
>   "fs.s3a.secret.key": "SECRET"
> },
> {code}
> This approach is not very secure, but is probably OK when Drill has a single
> user (such as on a laptop).
> ----
> When using the above approach, it appears that one must specify the endpoint:
> {code}
> "connection": "s3a://<bucket-name>/",
> "config": {
>   "fs.s3a.access.key": "<key>",
>   "fs.s3a.secret.key": "<key>",
>   "fs.s3a.endpoint": "s3.us-west-1.amazonaws.com"
> },
> {code}
> I could not get the above to work using the pattern in the default S3 config:
> {code}
> connection: "s3a://my.bucket.location.com",
> {code}
> All the S3a usage examples I could find specify the endpoint this way.
> ----
> A workable, semi-secure combination is:
> * Use the S3 storage plugin config to specify only the bucket.
> {code}
> "connection": "s3a://mybucket/",
> "config": {
> },
> {code}
> * Specify the credentials and endpoint in the {{core-site.xml}} file:
> {code}
> <configuration>
>   <property>
>     <name>fs.s3a.access.key</name>
>     <value>ACCESS-KEY</value>
>   </property>
>   <property>
>     <name>fs.s3a.secret.key</name>
>     <value>SECRET-KEY</value>
>   </property>
>   <property>
>     <name>fs.s3a.endpoint</name>
>     <value>s3.REGION.amazonaws.com</value>
>   </property>
> </configuration>
> {code}
> ----
> "Point your browser to http://:8047"
> Change to "http://<drill-host>:8047, where <drill-host> is a node on which
> Drill is running."
> "Note: on a single machine system, you'll need to run drill-embedded before
> you can access the web console site"
> The general rule is that Drill must be running, whether embedded, in
> server-mode on the local host, or in a cluster.
> ----
> "Duplicate the 'dfs' plugin."
> This is not necessary. If Drill is local (single server) then it is helpful
> to allow both local and S3 access. But, if Drill is deployed in a cluster,
> local file access is problematic. In short, make this section closer to the
> [HDFS storage|http://drill.apache.org/docs/file-system-storage-plugin/] page.
> Note also that in Drill 1.13, Drill ships with an "s3" storage configuration;
> the user need only enable it. No need to copy/paste the dfs plugin.
> ----
> "you can set this parameter in conf/core-site.xml file in your Drill install
> directory"
> Based on the comments above, change this to: "you can set this parameter in
> core-site.xml"
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)