[jira] [Updated] (DRILL-6504) Corrections to S3 storage doc pages

Paul Rogers (JIRA) Sun, 17 Jun 2018 17:17:42 -0700


     [ 
https://issues.apache.org/jira/browse/DRILL-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Paul Rogers updated DRILL-6504:
-------------------------------
       Priority: Major  (was: Minor)
    Description: 
[The documentation for S3 
storage|http://drill.apache.org/docs/s3-storage-plugin/] contains a number of 
minor errors.

"using the S3a library."

Change to "using the HDFS s3a library." (The library is provided via HDFS, not 
Drill.)

----

"Drill's previous S3n interface"

Change to "the older HDFS s3n library." (Again, S3 support is provided by HDFS.)

----

"Starting with version 1.3.0"

Can probably be removed; 1.3 was released quite a long time ago.

----

"To enable Drill's S3a support"

Change to "To enable HDFS s3a support"

----

Include a link to the HDFS S3 documentation: 
[https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html]

----

Refer to the S3a documentation link above. There are actually multiple ways to 
configure S3a:

* In the storage plugin config (as suggested by the shipped s3 example on 
the Drill storage page).
* Using {{core-site.xml}} as described in the docs.
* Using environment variables set before running Drill or in {{drill-env.sh}}.
* Maybe using the {{~/.aws/credentials}} file? Have not tested this one.

----

Since Drill does not use HDFS 3.x, Drill does not support AWS temporary 
credentials as described in the S3a documentation.

----

"edit the file conf/core-site.xml in your Drill install directory,"

Change to "in the $DRILL_HOME/conf or $DRILL_SITE directory, rename 
core-site-example.xml to core-site.xml and ..."

Note: once the file is renamed, if the user has $HADOOP_HOME on their path, 
Hadoop support will break because Drill will pull in the Drill version of 
core-site.xml rather than the Hadoop one. This will cause tools such as 
Drill-on-YARN to fail.

In this situation, the user should make the changes in Hadoop's core-site.xml 
and should not create one for Drill. (In fact, if the user is using Hadoop and 
wants to use S3 with Drill, they probably already have S3 support configured...)

In Drill 1.13 (not sure when it was added), the default "s3" storage plugin 
lets the user define the access keys as storage plugin configuration properties:

{code}
  "config": {
    "fs.s3a.access.key": "ID",
    "fs.s3a.secret.key": "SECRET"
  },
{code}

This approach is not very secure, but is probably OK when Drill has a single 
user (such as on a laptop).

----

When using the above approach, it appears that one must specify the endpoint:

{code}
  "connection": "s3a://<bucket-name>/",
  "config": {
    "fs.s3a.access.key": "<key>",
    "fs.s3a.secret.key": "<key>",
    "fs.s3a.endpoint": "s3.us-west-1.amazonaws.com"
  },
{code}

I could not get a connection to work using the pattern in the default S3 config:

{code}
     connection: "s3a://my.bucket.location.com",
{code}

Using the endpoint is how all S3a examples I could find described the usage.
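
For illustration, a complete storage configuration following the endpoint 
approach might look like the sketch below. The bucket name, key placeholders, 
region, workspace and the single format entry are assumptions chosen for the 
example, not values taken from a tested setup; adjust them to your bucket.

{code}
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://<bucket-name>/",
  "config": {
    "fs.s3a.access.key": "<access-key>",
    "fs.s3a.secret.key": "<secret-key>",
    "fs.s3a.endpoint": "s3.us-west-1.amazonaws.com"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
{code}

The endpoint should name the region that hosts the bucket; the keys are stored 
in plain text in the plugin configuration, so the security caveat above applies.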

----

"Point your browser to http://:8047"

Change to "http://<drill-host>:8047, where <drill-host> is a node on which 
Drill is running."

"Note: on a single machine system, you'll need to run drill-embedded before you 
can access the web console site"

The general rule is that Drill must be running, whether embedded, in 
server-mode on the local host, or in a cluster.

----

"Duplicate the 'dfs' plugin."

This is not necessary. If Drill is local (single server) then it is helpful to 
allow both local and S3 access. But, if Drill is deployed in a cluster, local 
file access is problematic. In short, make this section closer to the [HDFS 
storage|http://drill.apache.org/docs/file-system-storage-plugin/] page.

Note also that in Drill 1.13, Drill ships with an "s3" storage configuration; 
the user need only enable it. No need to copy/paste the dfs plugin.

----

"you can set this parameter in conf/core-site.xml file in your Drill install 
directory"

Based on the comments above, change this to: "you can set this parameter in 
core-site.xml"

  was:
[The documentation for S3 
storage|http://drill.apache.org/docs/s3-storage-plugin/] contains a number of 
minor errors.

"using the S3a library."

Change to "using the HDFS s3a library." (The library is provided via HDFS, not 
Drill.)

----

"Drill's previous S3n interface"

Change to "the older HDFS s3n library." (Again, S3 support is provided by HDFS.)

----

"Starting with version 1.3.0"

Can probably be removed; 1.3 was released quite a long time ago.

----

"To enable Drill's S3a support"

Change to "To enable HDFS s3a support"

----

Include a link to the HDFS S3 documentation: 
[https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html]

----

"edit the file conf/core-site.xml in your Drill install directory,"

Change to "in the $DRILL_HOME/conf or $DRILL_SITE directory, rename 
core-site-example.xml to core-site.xml and ..."

Note: once the file is renamed, if the user has $HADOOP_HOME on their path, 
Hadoop support will break because Drill will pull in the Drill version of 
core-site.xml rather than the Hadoop one. This will cause tools such as 
Drill-on-YARN to fail.

In this situation, the user should make the changes in Hadoop's core-site.xml 
and should not create one for Drill. (In fact, if the user is using Hadoop and 
wants to use S3 with Drill, they probably already have S3 support configured...)

In Drill 1.13 (not sure when it was added), the default "s3" storage plugin 
lets the user define the access keys as storage plugin configuration properties:

{code}
  "config": {
    "fs.s3a.access.key": "ID",
    "fs.s3a.secret.key": "SECRET"
  },
{code}

This approach is not very secure, but is probably OK when Drill has a single 
user (such as on a laptop.)

----

"Point your browser to http://:8047"

Change to "http://<drill-host>:8047, where <drill-host> is a node on which 
Drill is running."

"Note: on a single machine system, you'll need to run drill-embedded before you 
can access the web console site"

The general rule is that Drill must be running, whether embedded, in 
server-mode on the local host, or in a cluster.

----

"Duplicate the 'dfs' plugin."

This is not necessary. If Drill is local (single server) then it is helpful to 
allow both local and S3 access. But, if Drill is deployed in a cluster, local 
file access is problematic. In short, make this section closer to the [HDFS 
storage|http://drill.apache.org/docs/file-system-storage-plugin/] page.

Note also that in Drill 1.13, Drill ships with an "s3" storage configuration; 
the user need only enable it. No need to copy/paste the dfs plugin.

----

"you can set this parameter in conf/core-site.xml file in your Drill install 
directory"

Based on the comments above, change this to: "you can set this parameter in 
core-site.xml"

        Summary: Corrections to S3 storage doc pages  (was: Typos in S3 storage 
doc pages)

> Corrections to S3 storage doc pages
> -----------------------------------
>
>                 Key: DRILL-6504
>                 URL: https://issues.apache.org/jira/browse/DRILL-6504
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Bridget Bevens
>            Priority: Major
>              Labels: doc-impacting



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
