decimal data type disabled
Project: http://git-wip-us.apache.org/repos/asf/drill/repo Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/f58d360e Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/f58d360e Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/f58d360e Branch: refs/heads/gh-pages Commit: f58d360e43331825ea7e9885c167078e2a1c4ff1 Parents: c38e6a1 Author: Kristine Hahn <[email protected]> Authored: Thu May 14 14:40:08 2015 -0700 Committer: Kristine Hahn <[email protected]> Committed: Thu May 14 14:40:08 2015 -0700 ---------------------------------------------------------------------- .../020-hive-to-drill-data-type-mapping.md | 20 ++++++++++++++++---- .../040-parquet-format.md | 4 +++- .../050-json-data-model.md | 8 ++++---- _docs/getting-started/020-why-drill.md | 2 +- .../040-tableau-examples.md | 2 +- .../060-querying-the-information-schema.md | 1 + .../005-querying-complex-data-introduction.md | 2 +- .../data-types/010-supported-data-types.md | 15 ++++++++++++++- .../030-handling-different-data-types.md | 9 ++++----- .../050-aggregate-and-aggregate-statistical.md | 2 ++ 10 files changed, 47 insertions(+), 18 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md ---------------------------------------------------------------------- diff --git a/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md b/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md index ed66a7a..20a921f 100644 --- a/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md +++ b/_docs/data-sources-and-file-formats/020-hive-to-drill-data-type-mapping.md @@ -11,8 +11,8 @@ Using Drill you can read tables created in Hive that use data types compatible w | BIGINT | BIGINT | 8-byte signed integer | | BOOLEAN | BOOLEAN | TRUE (1) or FALSE (0) | | CHAR | CHAR | Character 
string, fixed-length max 255 | -| DATE | DATE | Years, months, and days in the form YYYY-MM-DD | -| DECIMAL | DECIMAL | 38-digit precision | +| DATE | DATE | Years, months, and days in the form YYYY-MM-DD | +| DECIMAL* | DECIMAL | 38-digit precision | | FLOAT | FLOAT | 4-byte single precision floating point number | | DOUBLE | DOUBLE | 8-byte double precision floating point number | | INT or INTEGER | INT | 4-byte signed integer | @@ -25,6 +25,8 @@ Using Drill you can read tables created in Hive that use data types compatible w | None | STRING | Binary string (16) | | VARCHAR | VARCHAR | Character string variable length | +\* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. To enable the DECIMAL type, set the `planner.enable_decimal_data_type` option to `true`. + ## Unsupported Types Drill does not support the following Hive types: @@ -41,8 +43,14 @@ This example demonstrates the mapping of Hive data types to Drill data types. Us 8223372036854775807,true,3.5,-1231.4,3.14,42,"SomeText",2015-03-25,2015-03-25 01:23:15 -The example assumes that the CSV resides on the MapR file system (MapRFS) in the Drill sandbox: `/mapr/demo.mapr.com/data/` - +### Example Assumptions +The example makes the following assumptions: + +* The CSV resides on the MapR file system (MapRFS) in the Drill sandbox: `/mapr/demo.mapr.com/data/` +* You [enabled the DECIMAL data type]({{site.baseurl}}/docs/supported-data-types#enabling-the-decimal-type) in Drill.
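The 38-digit precision noted for DECIMAL in the table above can be pictured with Python's standard `decimal` module. This is an illustration only, not Drill code; the sample values come from the CSV row shown in the example.

```python
from decimal import Decimal, getcontext

# Drill's DECIMAL carries up to 38 digits of precision. Python's decimal
# module can mimic that context (illustration only, not Drill code).
getcontext().prec = 38

# Values from the sample CSV row above: exact decimal arithmetic, no round-off.
print(Decimal("-1231.4") + Decimal("3.14"))  # -1228.26

# Results are rounded to 38 significant digits, matching the stated precision.
print(Decimal(1) / Decimal(3))  # 0.33333333333333333333333333333333333333
```
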
+ +### Define an External Table in Hive + In Hive, you define an external table using the following query: hive> CREATE EXTERNAL TABLE types_demo ( @@ -59,6 +67,8 @@ In Hive, you define an external table using the following query: LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/mapr/demo.mapr.com/data/mytypes.csv'; +\* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. To enable the DECIMAL type, set the `planner.enable_decimal_data_type` option to `true`. + You check that Hive mapped the data from the CSV to the typed values as expected: hive> SELECT * FROM types_demo; @@ -66,6 +76,8 @@ You check that Hive mapped the data from the CSV to the typed values as as expec 8223372036854775807 true 3.5 -1231.4 3.14 42 "SomeText" 2015-03-25 2015-03-25 01:23:15 Time taken: 0.524 seconds, Fetched: 1 row(s) +### Connect Drill to Hive and Query the Data + In Drill, you use the Hive storage plugin that has the following definition.
{ http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/data-sources-and-file-formats/040-parquet-format.md ---------------------------------------------------------------------- diff --git a/_docs/data-sources-and-file-formats/040-parquet-format.md b/_docs/data-sources-and-file-formats/040-parquet-format.md index 55c4e68..cd14359 100644 --- a/_docs/data-sources-and-file-formats/040-parquet-format.md +++ b/_docs/data-sources-and-file-formats/040-parquet-format.md @@ -144,11 +144,13 @@ Parquet also supports logical types, fully described on the [Apache Parquet site | None | | UINT_16 | 16 bits, unsigned | | None | | UINT_32 | 32 bits, unsigned | | None | | UINT_64 | 64 bits, unsigned | -| DECIMAL | 38-digit precision | DECIMAL | Arbitrary-precision signed decimal numbers of the form unscaledValue * 10^(-scale) | +| DECIMAL* | 38-digit precision | DECIMAL | Arbitrary-precision signed decimal numbers of the form unscaledValue * 10^(-scale) | | TIME | Hours, minutes, seconds, milliseconds; 24-hour basis | TIME_MILLIS | Logical time, not including the date. Annotates int32. Number of milliseconds after midnight. | | TIMESTAMP | Year, month, day, and seconds | TIMESTAMP_MILLIS | Logical date and time. Annotates an int64 that stores the number of milliseconds from the Unix epoch, 00:00:00.000 on 1 January 1970, UTC. | | INTERVALDAY and INTERVALYEAR | Integer fields representing a period of time depending on the type of interval | INTERVAL | An interval of time. Annotates a fixed_len_byte_array of length 12. Months, days, and ms in unsigned little-endian format. | +\* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. To enable the DECIMAL type, set the `planner.enable_decimal_data_type` option to `true`. 
+ ## Data Description Language Support Parquet supports the following data description languages: http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/data-sources-and-file-formats/050-json-data-model.md ---------------------------------------------------------------------- diff --git a/_docs/data-sources-and-file-formats/050-json-data-model.md b/_docs/data-sources-and-file-formats/050-json-data-model.md index 28ab921..82b8a3a 100644 --- a/_docs/data-sources-and-file-formats/050-json-data-model.md +++ b/_docs/data-sources-and-file-formats/050-json-data-model.md @@ -40,11 +40,11 @@ The following table shows SQL-JSON data type mapping: By default, Drill does not support JSON lists of different types. For example, JSON does not enforce types or distinguish between integers and floating point values. When reading numerical values from a JSON file, Drill distinguishes integers from floating point numbers by the presence or lack of a decimal point. If some numbers in a JSON map or array appear with and without a decimal point, such as 0 and 0.0, Drill throws a schema change error. You use the following options to read JSON lists of different types: * `store.json.read_numbers_as_double` - Reads numbers from JSON files with or without a decimal point as DOUBLE. -* `store.json.all_text_mode` - Reads all data from JSON files as VARCHAR. + Reads numbers from JSON files with or without a decimal point as DOUBLE. You need to cast the numbers from DOUBLE to other numerical types only if you cannot use the numbers as DOUBLE. +* `store.json.all_text_mode` + Reads all data from JSON files as VARCHAR. You need to cast numbers from VARCHAR to numerical data types, such as DOUBLE or INTEGER. -The following session/system options for `store.json.all_text_mode` and `store.json.read_numbers_as_double` options is false. Enable the latter if the JSON contains integers and floating point numbers.
Using either option prevents schema errors, but using `store.json.read_numbers_as_double` has an advantage over `store.json.all_text_mode`: You do not have to cast every number from VARCHAR to DOUBLE or BIGINT when you query the JSON file. +The default setting of the `store.json.all_text_mode` and `store.json.read_numbers_as_double` options is false. Using either option prevents schema errors, but using `store.json.read_numbers_as_double` has an advantage over `store.json.all_text_mode`. Using `store.json.read_numbers_as_double` typically involves less explicit casting than using `store.json.all_text_mode` because you can often use the numerical data as is (DOUBLE). ### Handling Type Differences Set the `store.json.read_numbers_as_double` property to true. http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/getting-started/020-why-drill.md ---------------------------------------------------------------------- diff --git a/_docs/getting-started/020-why-drill.md b/_docs/getting-started/020-why-drill.md index f7f4495..d00d882 100644 --- a/_docs/getting-started/020-why-drill.md +++ b/_docs/getting-started/020-why-drill.md @@ -39,7 +39,7 @@ Drill's schema-free JSON model allows you to query complex, semi-structured data ## 4. Real SQL -- not "SQL-like" -Drill supports the standard SQL:2003 syntax. No need to learn a new "SQL-like" language or struggle with a semi-functional BI tool. Drill supports many data types including DATE, INTERVALDAY/INTERVALYEAR, TIMESTAMP, VARCHAR and DECIMAL, as well as complex query constructs such as correlated sub-queries and joins in WHERE clauses. Here is an example of a TPC-H standard query that runs in Drill "as is": +Drill supports the standard SQL:2003 syntax. No need to learn a new "SQL-like" language or struggle with a semi-functional BI tool.
Drill supports many data types including DATE, INTERVALDAY/INTERVALYEAR, TIMESTAMP, and VARCHAR, as well as complex query constructs such as correlated sub-queries and joins in WHERE clauses. Here is an example of a TPC-H standard query that runs in Drill "as is": ### TPC-H query 4 http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/040-tableau-examples.md ---------------------------------------------------------------------- diff --git a/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/040-tableau-examples.md b/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/040-tableau-examples.md index 85e8c88..9b73096 100644 --- a/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/040-tableau-examples.md +++ b/_docs/odbc-jdbc-interfaces/using-odbc-on-windows/040-tableau-examples.md @@ -14,7 +14,7 @@ This section includes the following examples: The steps and results of these examples assume pre-configured schemas and source data. You configure schemas as storage plugin instances on the Storage -tab of the [Drill Web UI]({{ site.baseurl }}/docs/getting-to-know-the-drill-sandbox#storage-plugin-overview). +tab of the [Drill Web UI]({{ site.baseurl }}/docs/getting-to-know-the-drill-sandbox#storage-plugin-overview). Also, the examples assume you [enabled the DECIMAL data type]({{site.baseurl}}/docs/supported-data-types#enabling-the-decimal-type) in Drill. 
## Example: Connect to a Hive Table in Tableau http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/query-data/060-querying-the-information-schema.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/060-querying-the-information-schema.md b/_docs/query-data/060-querying-the-information-schema.md index 590fad2..fddb194 100644 --- a/_docs/query-data/060-querying-the-information-schema.md +++ b/_docs/query-data/060-querying-the-information-schema.md @@ -107,3 +107,4 @@ of those columns: | OrderTotal | Decimal | +-------------+------------+ +In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. [Enable the DECIMAL data type]({{site.baseurl}}/docs/supported-data-types#enabling-the-decimal-type) if performance is not an issue. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md ---------------------------------------------------------------------- diff --git a/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md b/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md index 2f59219..a6a8c84 100644 --- a/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md +++ b/_docs/query-data/querying-complex-data/005-querying-complex-data-introduction.md @@ -43,7 +43,7 @@ The examples in this section operate on JSON data files.
In order to write your own queries, you need to be aware of the basic data types in these files: * string (all data inside double quotes), such as `"0001"` or `"Cake"` - * number: integers, decimals, and floats, such as `0.55` or `10` + * number: integers and floats, such as `0.55` or `10` * null values * boolean values: true, false http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/sql-reference/data-types/010-supported-data-types.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/data-types/010-supported-data-types.md b/_docs/sql-reference/data-types/010-supported-data-types.md index 5641dc4..c588a8b 100644 --- a/_docs/sql-reference/data-types/010-supported-data-types.md +++ b/_docs/sql-reference/data-types/010-supported-data-types.md @@ -22,10 +22,23 @@ Drill reads from and writes to data sources having a wide variety of types. Dril | CHARACTER VARYING, CHARACTER, CHAR, or VARCHAR*** | UTF8-encoded variable-length string. The default limit is 1 character. The maximum character limit is 2,147,483,647. | CHAR(30) casts data to a 30-character string maximum. | -\* In this release, the NUMERIC data type is an alias for the DECIMAL data type. +\* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. The NUMERIC data type is an alias for the DECIMAL data type. \*\* Not currently supported. \*\*\* Currently, Drill supports only variable-length strings. +## Enabling the DECIMAL Type + +To enable the DECIMAL type, set the `planner.enable_decimal_data_type` option to `true`. Enable the DECIMAL data type if performance is not an issue. + + ALTER SYSTEM SET `planner.enable_decimal_data_type` = true; + + +------------+------------+ + | ok | summary | + +------------+------------+ + | true | planner.enable_decimal_data_type updated. 
| + +------------+------------+ + 1 row selected (1.191 seconds) + ## Casting and Converting Data Types In Drill, you cast or convert data to the required type for moving data from one data source to another or to make the data readable. http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/sql-reference/data-types/030-handling-different-data-types.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/data-types/030-handling-different-data-types.md b/_docs/sql-reference/data-types/030-handling-different-data-types.md index 64121dd..110d32d 100644 --- a/_docs/sql-reference/data-types/030-handling-different-data-types.md +++ b/_docs/sql-reference/data-types/030-handling-different-data-types.md @@ -11,8 +11,7 @@ In a textual file, such as CSV, Drill interprets every field as a VARCHAR, as pr ## Handling JSON and Parquet Data Complex and nested data structures in JSON and Parquet files are of map and array types. -A map is a set of name/value pairs. A value in a map can be a scalar type, such as string or int, or a complex type, such as an array or another map. -An array is a repeated list of values. A value in an array can be a scalar type, such as string or int, or an array can be a complex type, such as a map or another array. +A map is a set of name/value pairs. A value in a map can be a scalar type, such as string or int, or a complex type, such as an array or another map. An array is a repeated list of values. A value in an array can be a scalar type, such as string or int, or an array can be a complex type, such as a map or another array. Drill reads/writes maps and arrays from/to JSON and Parquet files. In Drill, you do not cast a map or array to another type. 
@@ -58,15 +57,15 @@ The following example shows a JSON array having complex type values: ## Reading numbers of different types from JSON -The `store.json.read_numbers_as_double` and `store.json.all_text_mode` system/session options control how Drill implicitly casts JSON data. By default, when reading numerical values from a JSON file, Drill implicitly casts a number to the DOUBLE or BIGINT type depending on the presence or absence a decimal point. If some numbers in a JSON map or array appear with and without a decimal point, such as 0 and 0.0, Drill throws a schema change error. By default, Drill reads numbers without decimal point as BIGINT values by default. The range of BIGINT is -9223372036854775808 to 9223372036854775807. A BIGINT result outside this range produces an error. +The `store.json.read_numbers_as_double` and `store.json.all_text_mode` system/session options control how Drill implicitly casts JSON data. By default, when reading numerical values from a JSON file, Drill implicitly casts a number to the DOUBLE or BIGINT type depending on the presence or absence of a decimal point. If some numbers in a JSON map or array appear with and without a decimal point, such as 0 and 0.0, Drill throws a schema change error. By default, Drill reads numbers without a decimal point as BIGINT values. The range of BIGINT is -9223372036854775808 to 9223372036854775807. A BIGINT result outside this range produces an error. -To prevent Drill from attempting to read such data, set `store.json.read_numbers_as_double` or `store.json.all_text_mode` to true. Using `store.json.all_text_mode` set to true, Drill implicitly casts JSON data to VARCHAR. You need to cast the VARCHAR values to other types you want the returned data to represent. Using `store.json.read_numbers_as_double` set to true, Drill casts numbers in the JSON file to DOUBLE. You need to cast the DOUBLE to any other types of numbers, such as FLOAT and INTEGER, you want the returned data to represent.
Using `store.json.read_numbers_as_double` typically involves less casting on your part than using `store.json.all_text_mode`. +To prevent Drill from attempting to read such data, set `store.json.read_numbers_as_double` or `store.json.all_text_mode` to true. With `store.json.all_text_mode` set to true, Drill implicitly casts JSON data to VARCHAR. You need to cast the VARCHAR values to other types. With `store.json.read_numbers_as_double` set to true, Drill implicitly casts numbers in the JSON file to DOUBLE. You need to cast the DOUBLE type to other types, such as FLOAT and INTEGER. Using `store.json.read_numbers_as_double` typically involves less explicit casting than using `store.json.all_text_mode` because you can often use the numerical data as is (DOUBLE). ## Guidelines for Using Float and Double FLOAT and DOUBLE yield approximate results. These are variable-precision numeric types. Drill does not cast/convert all values precisely to the internal format, but instead stores approximations. Slight differences can occur in the value stored and retrieved. The following guidelines are recommended: -* For conversions involving monetary calculations, for example, that require precise results use the decimal type instead of float or double. +* For conversions that require precise results, such as monetary calculations, use the DECIMAL type instead of FLOAT or DOUBLE. In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. [Enable the DECIMAL data type]({{site.baseurl}}/docs/supported-data-types#enabling-the-decimal-type) if performance is not an issue. * For complex calculations or mission-critical applications, especially those involving infinity and underflow situations, carefully consider the limitations of type casting that involves FLOAT or DOUBLE. * Equality comparisons between floating-point values can produce unexpected results.
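The last guideline, that equality comparisons between floating-point values can surprise you, is easy to demonstrate in any language. A short Python sketch (an analogy, not Drill itself):

```python
from decimal import Decimal

# FLOAT/DOUBLE store binary approximations, so exact equality can fail.
total = 0.1 + 0.2
print(total == 0.3)             # False: total is 0.30000000000000004
print(abs(total - 0.3) < 1e-9)  # True: compare approximate types with a tolerance

# DECIMAL-style exact arithmetic has no such surprise.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```
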
http://git-wip-us.apache.org/repos/asf/drill/blob/f58d360e/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md ---------------------------------------------------------------------- diff --git a/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md b/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md index d6e66de..21e6e40 100644 --- a/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md +++ b/_docs/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md @@ -17,6 +17,8 @@ MAX(expression)| BINARY, DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP| same as arg MIN(expression)| BINARY, DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP| same as argument type SUM(expression)| SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVALDAY, or INTERVALYEAR| BIGINT for SMALLINT or INTEGER arguments, DECIMAL for BIGINT arguments, DOUBLE for floating-point arguments, otherwise the same as the argument data type +\* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. [Enable the DECIMAL data type]({{site.baseurl}}/docs/supported-data-types#enabling-the-decimal-type) if performance is not an issue. + MIN, MAX, COUNT, AVG, and SUM accept ALL and DISTINCT keywords. The default is ALL. ### Examples
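The ALL (default) versus DISTINCT behavior noted above can be sketched in a language-neutral way with Python. This is an analogy for how the keywords treat duplicate values, not Drill SQL:

```python
values = [2, 2, 3, 5]  # sample column values, duplicates included

# SUM(ALL expression) aggregates every row; ALL is the default in Drill.
print(sum(values))       # 12

# SUM(DISTINCT expression) aggregates each distinct value once.
print(sum(set(values)))  # 10

# COUNT behaves the same way: COUNT(x) counts rows, COUNT(DISTINCT x) counts values.
print(len(values), len(set(values)))  # 4 3
```
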
