Repository: drill Updated Branches: refs/heads/gh-pages 02af103d9 -> 1b35859d1
1.1 updates Project: http://git-wip-us.apache.org/repos/asf/drill/repo Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/1b35859d Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/1b35859d Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/1b35859d Branch: refs/heads/gh-pages Commit: 1b35859d175b39ae2fb64fc25900b54e0e26c181 Parents: 45ada88 Author: Kristine Hahn <[email protected]> Authored: Tue Jul 7 17:13:35 2015 -0700 Committer: Kristine Hahn <[email protected]> Committed: Tue Jul 7 17:14:22 2015 -0700 ---------------------------------------------------------------------- .../035-plugin-configuration-basics.md | 2 +- .../070-hive-storage-plugin.md | 80 +++++++++++--------- .../080-drill-default-input-format.md | 2 +- .../020-hive-to-drill-data-type-mapping.md | 51 +++++-------- _docs/getting-started/010-drill-introduction.md | 2 +- ...20-installing-drill-on-linux-and-mac-os-x.md | 4 +- _docs/sql-reference/sql-commands/079-select.md | 6 +- .../sql-commands/087-union-set-operator.md | 4 +- .../050-aggregate-and-aggregate-statistical.md | 8 +- 9 files changed, 79 insertions(+), 80 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/connect-a-data-source/035-plugin-configuration-basics.md ---------------------------------------------------------------------- diff --git a/_docs/connect-a-data-source/035-plugin-configuration-basics.md b/_docs/connect-a-data-source/035-plugin-configuration-basics.md index 9f2ba54..b5b6c68 100644 --- a/_docs/connect-a-data-source/035-plugin-configuration-basics.md +++ b/_docs/connect-a-data-source/035-plugin-configuration-basics.md @@ -46,7 +46,7 @@ The following table describes the attributes you configure for storage plugins. 
</tr> <tr> <td>"connection"</td> - <td>"classpath:///"<br>"file:///"<br>"mongodb://localhost:27017/"<br>"hdfs:///"</td> + <td>"classpath:///"<br>"file:///"<br>"mongodb://localhost:27017/"<br>"hdfs://"</td> <td>implementation-dependent</td> <td>Type of distributed file system, such as HDFS, Amazon S3, or files in your file system.</td> </tr> http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/connect-a-data-source/070-hive-storage-plugin.md ---------------------------------------------------------------------- diff --git a/_docs/connect-a-data-source/070-hive-storage-plugin.md b/_docs/connect-a-data-source/070-hive-storage-plugin.md index 2c55b37..753aa0d 100644 --- a/_docs/connect-a-data-source/070-hive-storage-plugin.md +++ b/_docs/connect-a-data-source/070-hive-storage-plugin.md @@ -25,56 +25,68 @@ in the Drill Web UI to configure a connection to Drill. To register a remote Hive metastore with Drill, complete the following steps: - 1. Issue the following command to start the Hive metastore service on the system specified in the `hive.metastore.uris`: +1. Issue the following command to start the Hive metastore service on the system specified in the `hive.metastore.uris`: + `hive --service metastore` +2. Navigate to `http://<host>:8047`, and select the **Storage** tab. +3. In the disabled storage plugins section, click **Update** next to the `hive` instance. - hive --service metastore - 2. Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab. - 3. In the disabled storage plugins section, click **Update** next to the `hive` instance. - 4. In the configuration window, add the `Thrift URI` and port to `hive.metastore.uris`. 
- - **Example** - { "type": "hive", - "enabled": true, + "enabled": false, "configProps": { - "hive.metastore.uris": "thrift://<localhost>:<port>", + "hive.metastore.uris": "", + "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true", + "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh", + "fs.default.name": "file:///", "hive.metastore.sasl.enabled": "false" } - } - - 5. If you are running Drill and Hive in a secure MapR cluster, remove the following line from the configuration: - `"hive.metastore.sasl.enabled" : "false"` - 6. Click **Enable**. - 7. If you are running Drill and Hive in a secure MapR cluster, add the following line to `<DRILL_HOME>/conf/drill-env.sh` on each Drill node and then [restart the Drillbit service]({{site.baseurl}}/docs/starting-drill-in-distributed-mode/): - ` export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dmapr_sec_enabled=true -Dhadoop.login=maprsasl -Dzookeeper.saslprovider=com.mapr.security.maprsasl.MaprSaslProvider -Dmapr.library.flatclass"` + } +4. In the configuration window, add the `Thrift URI` and port to `hive.metastore.uris`. + **Example** + + ... + "configProps": { + "hive.metastore.uris": "thrift://<host>:<port>", + ... +5. Change the default location of files to suit your environment, for example, change `"fs.default.name": "file:///"` to one of these locations: + * `hdfs://` + * `hdfs://<authority>:<port>` +6. If you are running Drill and Hive in a secure MapR cluster, remove the following line from the configuration: + `"hive.metastore.sasl.enabled" : "false"` +7. Click **Enable**. +8. 
If you are running Drill and Hive in a secure MapR cluster, add the following line to `<DRILL_HOME>/conf/drill-env.sh` on each Drill node and then [restart the Drillbit service]({{site.baseurl}}/docs/starting-drill-in-distributed-mode/): + `export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dmapr_sec_enabled=true -Dhadoop.login=maprsasl -Dzookeeper.saslprovider=com.mapr.security.maprsasl.MaprSaslProvider -Dmapr.library.flatclass"` -Once you have configured a storage plugin instance for a Hive data source, you -can [query Hive tables]({{ site.baseurl }}/docs/querying-hive/). +After configuring a Hive storage plugin, you can [query Hive tables]({{ site.baseurl }}/docs/querying-hive/). ## Hive Embedded Metastore -In this configuration, the Hive metastore is embedded within the Drill process. Provide the metastore database configuration settings in the Drill Web UI. Before you register Hive, verify that the driver you use to connect to the Hive metastore is in the Drill classpath located in `/<drill installation directory>/lib/.` If the driver is not there, copy the driver to `/<drill +In this configuration, the Hive metastore is embedded within the Drill process. Configure an embedded metastore only in a cluster that runs a single Drillbit and only for testing purposes. Do not embed the Hive metastore in production systems. + +Provide the metastore database configuration settings in the Drill Web UI. Before you register Hive, verify that the driver you use to connect to the Hive metastore is in the Drill classpath located in `/<drill installation directory>/lib/.` If the driver is not there, copy the driver to `/<drill installation directory>/lib` on the Drill node. For more information about storage types and configurations, refer to ["Hive Metastore Administration"](https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin). To register an embedded Hive metastore with Drill, complete the following steps: - 1. 
Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab - 2. In the disabled storage plugins section, click **Update** next to `hive` instance. - 3. In the configuration window, add the database configuration settings. +1. Navigate to `http://<host>:8047`, and select the **Storage** tab. +2. In the disabled storage plugins section, click **Update** next to the `hive` instance. +3. In the configuration window, add the database configuration settings. - **Example** - - { - "type": "hive", - "enabled": true, - "configProps": { - "javax.jdo.option.ConnectionURL": "jdbc:<database>://<host:port>/<metastore database>;create=true", - "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh", - "fs.default.name": "file:///", + **Example** + + { + "type": "hive", + "enabled": false, + "configProps": { + "hive.metastore.uris": "", + "javax.jdo.option.ConnectionURL": "jdbc:<database>://<host:port>/<metastore database>", + "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh", + "fs.default.name": "file:///", + "hive.metastore.sasl.enabled": "false" + } } - } - 4. Click **Enable**. +4. Change the `"fs.default.name"` attribute to specify the default location of files. The value must be a URI that is available and capable of handling file system requests. For example, change the local file system URI `"file:///"` to the HDFS URI `hdfs://`, or to a path on HDFS with a namenode: `hdfs://<authority>:<port>` +5. Click **Enable**. 
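The embedded-metastore configuration above is plain JSON pasted into the Drill Web UI, so a quick way to catch a malformed hand-edited config is to round-trip it through a JSON parser first. A minimal sketch (the `validate_plugin` helper and the sample connection values are illustrative, not part of Drill):

```python
import json

# Embedded Hive metastore plugin config, mirroring the example above.
hive_plugin = {
    "type": "hive",
    "enabled": False,
    "configProps": {
        "hive.metastore.uris": "",
        "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=drill_hive_db;create=true",
        "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
        "fs.default.name": "file:///",
        "hive.metastore.sasl.enabled": "false",
    },
}

def validate_plugin(config):
    """Check the keys the example above relies on, then emit the JSON to paste."""
    assert config.get("type") == "hive", "storage plugin type must be 'hive'"
    props = config.get("configProps", {})
    missing = {"javax.jdo.option.ConnectionURL", "fs.default.name"} - props.keys()
    assert not missing, "missing configProps: %s" % missing
    return json.dumps(config, indent=2)

print(validate_plugin(hive_plugin))
```

Serializing with `json.dumps` also normalizes quoting, so a stray trailing comma or unquoted key fails loudly before it reaches the Web UI.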
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/connect-a-data-source/080-drill-default-input-format.md ---------------------------------------------------------------------- diff --git a/_docs/connect-a-data-source/080-drill-default-input-format.md b/_docs/connect-a-data-source/080-drill-default-input-format.md index cc25959..f09f1df 100644 --- a/_docs/connect-a-data-source/080-drill-default-input-format.md +++ b/_docs/connect-a-data-source/080-drill-default-input-format.md @@ -46,7 +46,7 @@ steps: { "type": "file", "enabled": true, - "connection": "hdfs:///", + "connection": "hdfs://", "workspaces": { "root": { "location": "/drill/testdata", http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md ---------------------------------------------------------------------- diff --git a/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md b/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md index 76b7640..ffda83e 100644 --- a/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md +++ b/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md @@ -6,26 +6,24 @@ Using Drill you can read tables created in Hive that use data types compatible w <!-- See DRILL-1570 --> -| Supported SQL Type | Hive Type | Description | -|--------------------|-----------|------------------------------------------------------------| -| BIGINT | BIGINT | 8-byte signed integer | -| BOOLEAN | BOOLEAN | TRUE (1) or FALSE (0) | -| BYTE | TINYINT | 1-byte integer | -| CHAR | CHAR | Character string, fixed-length max 255 | -| DATE | DATE | Years months and days in the form in the form YYYY-ÂMM-ÂDD | -| DECIMAL* | DECIMAL | 38-digit precision | -| FLOAT | FLOAT | 4-byte single precision floating point number | -| DOUBLE | DOUBLE | 8-byte double precision floating point number | -| INT or INTEGER | INT 
| 4-byte signed integer | -| INTERVALDAY | N/A | Integer fields representing a day | -| INTERVALYEAR | N/A | Integer fields representing a year | -| SMALLINT | SMALLINT | 2-byte signed integer | -| TIME | N/A | Hours minutes seconds 24-hour basis | -| N/A | TIMESTAMP | Conventional UNIX Epoch timestamp. | -| TIMESTAMP | TIMESTAMP | JDBC timestamp in yyyy-mm-dd hh:mm:ss format | -| None | STRING | Binary string (16) | -| VARCHAR | VARCHAR | Character string variable length | -| VARBINARY | BINARY | Binary string | +| Supported SQL Type | Hive Type | Description | +|--------------------|-------------------------|------------------------------------------------| +| BIGINT | BIGINT | 8-byte signed integer | +| BOOLEAN | BOOLEAN | TRUE (1) or FALSE (0) | +| VARCHAR | CHAR | Character string, fixed-length max 255 | +| DATE | DATE | Years, months, and days in the form YYYY-MM-DD | +| DECIMAL* | DECIMAL | 38-digit precision | +| FLOAT | FLOAT | 4-byte single precision floating point number | +| DOUBLE | DOUBLE | 8-byte double precision floating point number | +| INTEGER | INT, TINYINT, SMALLINT | 1-, 2-, or 4-byte signed integer | +| INTERVALDAY | N/A | Integer fields representing a day | +| INTERVALYEAR | N/A | Integer fields representing a year | +| TIME | N/A | Hours, minutes, and seconds on a 24-hour basis | +| N/A | TIMESTAMP | Conventional UNIX Epoch timestamp | +| TIMESTAMP | TIMESTAMP | JDBC timestamp in yyyy-mm-dd hh:mm:ss format | +| None | STRING | Binary string (16) | +| VARCHAR | VARCHAR | Character string, variable length | +| VARBINARY | BINARY | Binary string | \* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. To enable the DECIMAL type, set the `planner.enable_decimal_data_type` option to `true`. 
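The revised mapping table can be captured as a small lookup for sanity-checking which Drill SQL type a Hive column surfaces as. This dictionary is hand-transcribed from the table above, not a Drill API:

```python
# Hive type -> Drill SQL type, transcribed from the mapping table above.
HIVE_TO_DRILL = {
    "BIGINT": "BIGINT",
    "BOOLEAN": "BOOLEAN",
    "CHAR": "VARCHAR",       # fixed-length CHAR surfaces as VARCHAR
    "DATE": "DATE",
    "DECIMAL": "DECIMAL",    # disabled by default; see planner.enable_decimal_data_type
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "INT": "INTEGER",        # 1-, 2-, and 4-byte integers all map to INTEGER
    "TINYINT": "INTEGER",
    "SMALLINT": "INTEGER",
    "TIMESTAMP": "TIMESTAMP",
    "STRING": None,          # no supported SQL type listed in the table
    "VARCHAR": "VARCHAR",
    "BINARY": "VARBINARY",
}

def drill_type(hive_type):
    """Return the Drill SQL type for a Hive column type, case-insensitively."""
    return HIVE_TO_DRILL[hive_type.upper()]
```

Note the collapsing in the new table: `INT`, `TINYINT`, and `SMALLINT` all map to Drill's `INTEGER`, which is why the separate BYTE and SMALLINT rows were removed.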
@@ -80,18 +78,7 @@ You check that Hive mapped the data from the CSV to the typed values as expected ### Connect Drill to Hive and Query the Data -In Drill, you use the Hive storage plugin that has the following definition. - - { - "type": "hive", - "enabled": true, - "configProps": { - "hive.metastore.uris": "thrift://localhost:9083", - "hive.metastore.sasl.enabled": "false" - } - } - -Using the Hive storage plugin connects Drill to the Hive metastore containing the data. +In Drill, you use the [Hive storage plugin]({{site.baseurl}}/docs/hive-storage-plugin). Using the Hive storage plugin connects Drill to the Hive metastore containing the data. 0: jdbc:drill:> USE hive; +------------+------------+ http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/getting-started/010-drill-introduction.md ---------------------------------------------------------------------- diff --git a/_docs/getting-started/010-drill-introduction.md b/_docs/getting-started/010-drill-introduction.md index fd06770..1ccfb31 100644 --- a/_docs/getting-started/010-drill-introduction.md +++ b/_docs/getting-started/010-drill-introduction.md @@ -16,7 +16,7 @@ Many enhancements in Apache Drill 1.1 include the following key features: * [SQL window functions]({{site.baseurl}}/docs/sql-window-functions) * [Automatic partitioning]({{site.baseurl}}) using the new [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause) clause in the CTAS command * [Delegated Hive impersonation]({{site.baseurl}}/docs/configuring-user-impersonation-with-hive-authorization/) -* Support for UNION ALL and better optimized plans that include UNION. +* Support for UNION and UNION ALL, and better-optimized plans for queries that include UNION. 
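The UNION support added here follows standard SQL semantics: UNION ALL concatenates the two result sets, while UNION also removes duplicate rows. A sketch of the distinction on plain Python row tuples, illustrative only and not Drill's implementation:

```python
def union_all(left, right):
    """UNION ALL: concatenate both result sets, keeping duplicates."""
    return left + right

def union(left, right):
    """UNION: concatenate, then drop duplicate rows, preserving first-seen order."""
    seen, result = set(), []
    for row in left + right:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

# Hypothetical click-activity rows: (customer id, click count)
before = [("cust1", 5), ("cust2", 3)]
after = [("cust2", 3), ("cust3", 7)]

print(union_all(before, after))  # 4 rows; ("cust2", 3) appears twice
print(union(before, after))      # 3 rows; the duplicate is removed
```

The deduplication step is why UNION is the more expensive operator, and why the optimizer work called out above matters for plans that include it.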
## What's New in Apache Drill 1.0 http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md ---------------------------------------------------------------------- diff --git a/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md b/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md index bb3b349..0470a96 100755 --- a/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md +++ b/_docs/install/installing-drill-in-embedded-mode/020-installing-drill-on-linux-and-mac-os-x.md @@ -6,9 +6,9 @@ First, check that you [meet the prerequisites]({{site.baseurl}}/docs/embedded-mo Complete the following steps to install Drill: -1. In a terminal windows, change to the directory where you want to install Drill. +1. In a terminal window, change to the directory where you want to install Drill. -2. To download the latest version of Apache Drill, download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz) or run one of the following commands, depending on which you have installed on your system: +2. 
To get the latest version of Apache Drill, download Drill from the [Drill web site](http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz) or run one of the following commands, depending on which you have installed on your system: * `wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz` * `curl -o apache-drill-1.1.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz` http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/sql-reference/sql-commands/079-select.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/079-select.md b/_docs/sql-reference/sql-commands/079-select.md index 3e20134..ff502aa 100755 --- a/_docs/sql-reference/sql-commands/079-select.md +++ b/_docs/sql-reference/sql-commands/079-select.md @@ -10,9 +10,9 @@ Drill supports the following ANSI standard clauses in the SELECT statement: * WHERE clause * GROUP BY clause * HAVING clause - * UNION ALL set operator - * ORDER BY clause (with an optional LIMIT clause) - * Limit clause + * UNION and UNION ALL set operators + * ORDER BY clause (with an optional LIMIT clause) + * Limit clause * Offset clause You can use the same SELECT syntax in the following commands: http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/sql-reference/sql-commands/087-union-set-operator.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-commands/087-union-set-operator.md b/_docs/sql-reference/sql-commands/087-union-set-operator.md index 4c78c33..b672957 100644 --- a/_docs/sql-reference/sql-commands/087-union-set-operator.md +++ b/_docs/sql-reference/sql-commands/087-union-set-operator.md @@ -21,10 +21,10 @@ Any SELECT query that Drill supports. See [SELECT]({{site.baseurl}}/docs/select/ ## Usage Notes * The two SELECT query expressions that represent the direct operands of the UNION must produce the same number of columns. 
Corresponding columns must contain compatible data types. See [Supported Data Types]({{site.baseurl}}/docs/supported-data-types/). * Multiple UNION operators in the same SELECT statement are evaluated left to right, unless otherwise indicated by parentheses. - * You can only use * on either side of UNION when the data source has a defined schema, such as data in Hive or views. + * You can only use `*` on either side of UNION when the data source has a defined schema, such as data in Hive or views. * You must explicitly specify columns. -## Examples +## Example The following example uses the UNION ALL set operator to combine click activity data before and after a marketing campaign. The data in the example exists in the `dfs.clicks` workspace. 0: jdbc:drill:> SELECT t.trans_id transaction, t.user_info.cust_id customer http://git-wip-us.apache.org/repos/asf/drill/blob/1b35859d/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md b/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md index 4e0566b..cf76b5e 100644 --- a/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md +++ b/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md @@ -19,11 +19,11 @@ SUM(expression)| SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVALDAY, \* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. You can [enable the DECIMAL type]({{site.baseurl}}/docs/supported-data-types/#enabling-the-decimal-type), but this is not recommended. -MIN, MAX, COUNT, AVG, and SUM accept ALL and DISTINCT keywords. The default is ALL. +AVG, COUNT, MIN, MAX, and SUM accept ALL and DISTINCT keywords. The default is ALL. 
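The ALL and DISTINCT keywords mentioned above change which values an aggregate sees: ALL (the default) aggregates every value, while DISTINCT first removes duplicates. A sketch of the two behaviors for SUM and COUNT on made-up data, illustrative rather than Drill internals:

```python
def aggregate(values, func, distinct=False):
    """Apply an aggregate over ALL values (default) or over DISTINCT values."""
    if distinct:
        values = list(dict.fromkeys(values))  # dedupe, preserving order
    return func(values)

sales = [10, 10, 20, 30, 30, 30]

print(aggregate(sales, sum))                 # SUM(ALL ...)      -> 130
print(aggregate(sales, sum, distinct=True))  # SUM(DISTINCT ...) -> 60
print(aggregate(sales, len))                 # COUNT(ALL ...)    -> 6
print(aggregate(sales, len, distinct=True))  # COUNT(DISTINCT)   -> 3
```

For MIN and MAX the two forms always agree, since duplicates cannot change an extremum; the distinction only matters for SUM, COUNT, and AVG.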
## AVG -Returns the average of all records of a column or the average of groups of records. +Averages the values in a column, either over all records in a data source or over one or more groups of records. A condition can determine which records to include in the calculation. ### Syntax @@ -61,7 +61,7 @@ Expressions listed within the AVG function must be included in the GROUP BY +----------------------+---------------------+ 5 rows selected (0.495 seconds) -## MIN, MAX, COUNT, and SUM +## COUNT, MIN, MAX, and SUM ### Examples @@ -152,4 +152,4 @@ Drill provides the following aggregate statistics functions: * var_pop(expression) * var_samp(expression) -These functions take a SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, or DECIMAL expression as the argument. If the expression is FLOAT, the function returns DOUBLE; otherwise, the function returns DECIMAL. \ No newline at end of file +These functions take a SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, or DECIMAL expression as the argument. If the expression is FLOAT, the function returns DOUBLE; otherwise, the function returns DECIMAL.
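The population/sample split in the statistics functions listed above (`stddev_pop` vs `stddev_samp`, `var_pop` vs `var_samp`) is the divisor: population variance divides the summed squared deviations by n, sample variance by n - 1. A sketch using Python's standard library, with names chosen to mirror the Drill functions and made-up data:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean is 5.0

var_pop = statistics.pvariance(data)   # divides by n     -> 4.0
var_samp = statistics.variance(data)   # divides by n - 1 -> 32/7
stddev_pop = statistics.pstdev(data)   # sqrt(var_pop)    -> 2.0
stddev_samp = statistics.stdev(data)   # sqrt(var_samp)

print(var_pop, var_samp, stddev_pop, stddev_samp)
```

Use the `_samp` variants when the rows are a sample of a larger population and the `_pop` variants when they are the whole population; for large n the two converge.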
