[
https://issues.apache.org/jira/browse/DRILL-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517678#comment-16517678
]
Paul Rogers commented on DRILL-6504:
------------------------------------
Hi [~bbevens], that was a fast edit! Looking good. A few comments:
* "Providing AWS Credentials" Need a space after "###"
* I think we can make the various config options a bit clearer:
"There are three ways to configure S3:
1. Directly in the S3 storage plugin configuration. (Least secure.)
2. A Drill-specific core-site.xml file. (Good if you don't use HDFS.)
3. Using existing S3 configuration for Hadoop."
* Then, for the "Note:" you can simply say to use method 3 with an existing
Hadoop setup. That is, if the user integrates with Hadoop, don't use option 2.
* "###Configuring the S3 Storage Plugin" Need a space after the "###".
* The flow is awkward between "Configuring Access Keys in the S3 Storage
Plugin" and "Configuring the S3 Storage Plugin". The first of these refers to
the S3 storage plugin config which is introduced in the second. Maybe reverse
the two or some such?
* "1- To access" I think you need to use "1." to get Markdown to generate
a numbered list.
Otherwise, the revisions look pretty good.
I wonder if Padma knows if Drill supports directory queries with S3. Any other
S3-specific limitations?
> Corrections to S3 storage doc pages
> -----------------------------------
>
> Key: DRILL-6504
> URL: https://issues.apache.org/jira/browse/DRILL-6504
> Project: Apache Drill
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Major
> Labels: doc-impacting
>
> [The documentation for S3
> storage|http://drill.apache.org/docs/s3-storage-plugin/] contains a number of
> minor errors.
> "using the S3a library."
> Change to "using the HDFS s3a library." (The library is provided via HDFS,
> not Drill.)
> ----
> "Drill's previous S3n interface"
> Change to "the older HDFS s3n library." (Again, S3 support is provided by
> HDFS.)
> ----
> "Starting with version 1.3.0"
> Can probably be removed; 1.3 was quite a long time ago.
> ----
> "To enable Drill's S3a support"
> Change to "To enable HDFS s3a support"
> ----
> Include a link to the HDFS S3 documentation:
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html]
> ----
> Refer to the S3a documentation link above. There are actually multiple ways
> to configure S3a:
> * In the storage plugin config (as is suggested by the shipped s3 example in
> the Drill storage page.)
> * Using {{core-site.xml}} as described in the docs.
> * Using environment variables set before running Drill or in {{drill-env.sh}}.
> * Maybe using the {{~/.aws/credentials}} directory? Have not tested this one.
> ----
> Since Drill does not use HDFS 3.x, Drill does not support AWS temporary
> credentials as described in the S3a documentation.
> ----
> "edit the file conf/core-site.xml in your Drill install directory,"
> Change to "in the $DRILL_HOME/conf or $DRILL_SITE directory, rename
> core-site-example.xml to core-site.xml and ..."
> Note: once the file is renamed, if the user has $HADOOP_HOME on their path,
> Hadoop support will break because Drill will pull in the Drill version of
> core-site.xml rather than the Hadoop one. This will cause tools such as
> Drill-on-YARN to fail.
> In this situation, the user should make the changes in Hadoop's core-site.xml
> and should not create one for Drill. (In fact, if the user is using Hadoop
> and wants to use S3 with Drill, they probably already have S3 support
> configured...)
> In Drill 1.13 (not sure when it was added), the default "s3" storage plugin
> lets the user define the access keys as storage plugin configuration
> properties:
> {code}
> "config": {
> "fs.s3a.access.key": "ID",
> "fs.s3a.secret.key": "SECRET"
> },
> {code}
> This approach is not very secure, but is probably OK when Drill has a single
> user (such as on a laptop.)
> ----
> When using the above approach, it appears that one must specify the endpoint:
> {code}
> "connection": "s3a://<bucket-name>/",
> "config": {
> "fs.s3a.access.key": "<key>",
> "fs.s3a.secret.key": "<key>",
> "fs.s3a.endpoint": "s3.us-west-1.amazonaws.com"
> },
> {code}
> I could not get the above to work using the pattern in the default S3 config:
> {code}
> connection: "s3a://my.bucket.location.com",
> {code}
> Using the endpoint is how all S3a examples I could find described the usage.
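> For reference, a complete plugin configuration along these lines might look
> roughly like the sketch below. The {{type}}, {{enabled}}, and {{workspaces}}
> fields are assumptions based on the general shape of Drill's file-system
> storage plugin configuration, not something tested for this issue. The bucket
> name, keys, and region are placeholders, and the {{formats}} section is left
> out; it would be kept as it appears in the shipped "s3" configuration.
> {code}
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://<bucket-name>/",
>   "config": {
>     "fs.s3a.access.key": "<access-key>",
>     "fs.s3a.secret.key": "<secret-key>",
>     "fs.s3a.endpoint": "s3.<region>.amazonaws.com"
>   },
>   "workspaces": {
>     "root": {
>       "location": "/",
>       "writable": false,
>       "defaultInputFormat": null
>     }
>   }
> }
> {code}
> The point of the sketch is only that the bucket goes in {{connection}} while
> the endpoint is given separately in {{config}}.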
> ----
> A workable, semi-secure combination is:
> * Use the S3 storage plugin config to specify only the bucket.
> {code}
> "connection": "s3a://mybucket/",
> "config": {
> },
> {code}
> * Specify the credentials and endpoint in the {{core-site.xml}} file:
> {code}
> <configuration>
> <property>
> <name>fs.s3a.access.key</name>
> <value>ACCESS-KEY</value>
> </property>
> <property>
> <name>fs.s3a.secret.key</name>
> <value>SECRET-KEY</value>
> </property>
> <property>
> <name>fs.s3a.endpoint</name>
> <value>s3.REGION.amazonaws.com</value>
> </property>
> </configuration>
> {code}
> ----
> "Point your browser to http://:8047"
> Change to "http://<drill-host>:8047, where <drill-host> is a node on which
> Drill is running."
> "Note: on a single machine system, you'll need to run drill-embedded before
> you can access the web console site"
> The general rule is that Drill must be running, whether embedded, in
> server-mode on the local host, or in a cluster.
> ----
> "Duplicate the 'dfs' plugin."
> This is not necessary. If Drill is local (single server) then it is helpful
> to allow both local and S3 access. But, if Drill is deployed in a cluster,
> local file access is problematic. In short, make this section closer to the
> [HDFS storage|http://drill.apache.org/docs/file-system-storage-plugin/] page.
> Note also that in Drill 1.13, Drill ships with an "s3" storage configuration;
> the user need only enable it. No need to copy/paste the dfs plugin.
> ----
> "you can set this parameter in conf/core-site.xml file in your Drill install
> directory"
> Based on the comments above, change this to: "you can set this parameter in
> core-site.xml"