Repository: kudu
Updated Branches:
  refs/heads/master 4cd6338e6 -> 0a37d1f3b

[doc] Document the new decimal column type

Change-Id: I9489613d35daad708648ea04d49e472d3149b33d
Reviewed-on: http://gerrit.cloudera.org:8080/9432
Reviewed-by: Grant Henke <[email protected]>
Tested-by: Grant Henke <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/0a37d1f3
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/0a37d1f3
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/0a37d1f3

Branch: refs/heads/master
Commit: 0a37d1f3be2ec08a3e295b03e5907a7d878eb753
Parents: 4cd6338
Author: Grant Henke <[email protected]>
Authored: Mon Feb 19 15:50:06 2018 -0600
Committer: Grant Henke <[email protected]>
Committed: Tue Mar 13 02:19:46 2018 +0000

----------------------------------------------------------------------
 docs/developing.adoc              |  4 +--
 docs/known_issues.adoc            |  4 ++-
 docs/kudu_impala_integration.adoc |  3 +--
 docs/release_notes.adoc           | 13 +++++++++
 docs/schema_design.adoc           | 49 +++++++++++++++++++++++++++++++---
 5 files changed, 65 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/0a37d1f3/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index eb8b2c6..09ffb82 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -180,8 +180,8 @@ name and keytab location must be provided through the `--principal` and
 - `<>` and `OR` predicates are not pushed to Kudu, and instead will be evaluated
   by the Spark task. Only `LIKE` predicates with a suffix wildcard are pushed to
   Kudu, meaning that `LIKE "FOO%"` is pushed down but `LIKE "FOO%BAR"` isn't.
-- Kudu does not support all types supported by Spark SQL, such as `Date`,
-  `Decimal` and complex types.
+- Kudu does not support every type supported by Spark SQL. For example,
+  `Date` and complex types are not supported.
 - Kudu tables may only be registered as temporary tables in SparkSQL.
   Kudu tables may not be queried using HiveContext.
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/0a37d1f3/docs/known_issues.adoc
----------------------------------------------------------------------
diff --git a/docs/known_issues.adoc b/docs/known_issues.adoc
index 40e1c77..cd9dd05 100644
--- a/docs/known_issues.adoc
+++ b/docs/known_issues.adoc
@@ -51,10 +51,12 @@
 
 === Columns
 
-* DECIMAL, CHAR, VARCHAR, DATE, and complex types like ARRAY are not supported.
+* CHAR, VARCHAR, DATE, and complex types like ARRAY are not supported.
 
 * Type and nullability of existing columns cannot be changed by altering the table.
 
+* The precision and scale of `DECIMAL` columns cannot be changed by altering the table.
+
 * Tables can have a maximum of 300 columns.
 
 === Tables

http://git-wip-us.apache.org/repos/asf/kudu/blob/0a37d1f3/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 9d2e7b0..a43b7ff 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -735,8 +735,7 @@ The examples above have only explored a fraction of what you can do with Impala
   to work around this issue.
 - When creating a Kudu table, the `CREATE TABLE` statement must include the
   primary key columns before other columns, in primary key order.
-- Impala can not create Kudu tables with `DECIMAL`, `VARCHAR`,
-  or nested-typed columns.
+- Impala can not create Kudu tables with `VARCHAR` or nested-typed columns.
 - Impala cannot update values in primary key columns.
 - `!=` and `LIKE` predicates are not pushed to Kudu, and instead will be evaluated
   by the Impala scan node. This may decrease performance

http://git-wip-us.apache.org/repos/asf/kudu/blob/0a37d1f3/docs/release_notes.adoc
----------------------------------------------------------------------
diff --git a/docs/release_notes.adoc b/docs/release_notes.adoc
index 6e1dc2b..916282c 100644
--- a/docs/release_notes.adoc
+++ b/docs/release_notes.adoc
@@ -50,6 +50,13 @@
 [[rn_1.7.0_new_features]]
 == New features
 
+* Kudu now supports the decimal column type. The decimal type is a numeric data type
+  with fixed scale and precision suitable for financial and other arithmetic
+  calculations where the imprecise representation and rounding behavior of float and
+  double make those types impractical. The decimal type is also useful for integers
+  larger than int64 and cases with fractional values in a primary key.
+  See link:schema_design.html#decimal[Decimal Type] for more details.
+
 * The `kudu fs update_dirs` tool now supports removing directories. Unless the
   `--force` flag is specified, Kudu will not allow the removal of a directory
   across which tablets are configured to spread data. If specified, all tablet
@@ -122,6 +129,12 @@ on wire compatibility between Kudu 1.7 and versions earlier than 1.3:
   written against Kudu 1.6 will continue to run against the Kudu 1.7 client
   and vice-versa.
 
+* Kudu 1.7 clients that attempt to create a table with a decimal column on a
+  target server running Kudu 1.6 or earlier will receive an error response.
+  Similarly, Kudu clients running Kudu 1.6 or earlier will receive an error
+  when attempting to access any table containing a decimal
+  column.
+
 [[rn_1.7.0_known_issues]]
 == Known Issues and Limitations
 

http://git-wip-us.apache.org/repos/asf/kudu/blob/0a37d1f3/docs/schema_design.adoc
----------------------------------------------------------------------
diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 7f0e218..02d05ea 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -73,6 +73,7 @@ column types include:
 * unixtime_micros (64-bit microseconds since the Unix epoch)
 * single-precision (32-bit) IEEE-754 floating-point number
 * double-precision (64-bit) IEEE-754 floating-point number
+* decimal (see <<decimal>> for details)
 * UTF-8 encoded string (up to 64KB uncompressed)
 * binary (up to 64KB uncompressed)
 
@@ -90,6 +91,48 @@ Unlike HBase, Kudu does not provide a version or timestamp column to track chang
 to a row. If version or timestamp information is needed, the schema should include
 an explicit version or timestamp column.
 
+[[decimal]]
+=== Decimal Type
+
+The `decimal` type is a numeric data type with fixed scale and precision suitable for
+financial and other arithmetic calculations where the imprecise representation and
+rounding behavior of `float` and `double` make those types impractical. The `decimal`
+type is also useful for integers larger than int64 and cases with fractional values
+in a primary key.
+
+The `decimal` type is a parameterized type that takes precision and scale type
+attributes.
+
+*Precision* represents the total number of digits that can be represented by the
+column, regardless of the location of the decimal point. This value must be between
+1 and 38 and has no default. For example, a precision of 4 is required to represent
+integer values up to 9999, or to represent values up to 99.99 with two fractional
+digits. You can also represent corresponding negative values, without any
+change in the precision. For example, the range -9999 to 9999 still only requires
+a precision of 4.
+
+*Scale* represents the number of fractional digits. This value must be between 0
+and the precision. A scale of 0 produces integral values, with no fractional part.
+If precision and scale are equal, all of the digits come after the decimal point.
+For example, a decimal with precision and scale equal to 3 can represent values
+between -0.999 and 0.999.
+
+*Performance considerations:*
+
+Kudu stores each value in as few bytes as possible depending on the precision
+specified for the decimal column. For that reason it is not advised to just use
+the highest precision possible for convenience. Doing so could negatively impact
+performance, memory and storage.
+
+Before encoding and compression:
+
+* Decimal values with precision of 9 or less are stored in 4 bytes.
+* Decimal values with precision of 10 through 18 are stored in 8 bytes.
+* Decimal values with precision greater than 18 are stored in 16 bytes.
+
+NOTE: The precision and scale of `decimal` columns cannot be changed by altering
+the table.
+
 [[encoding]]
 === Column Encoding
 
@@ -102,7 +145,7 @@ of the column.
 | Column Type | Encoding | Default
 | int8, int16, int32 | plain, bitshuffle, run length | bitshuffle
 | int64, unixtime_micros | plain, bitshuffle, run length | bitshuffle
-| float, double | plain, bitshuffle | bitshuffle
+| float, double, decimal | plain, bitshuffle | bitshuffle
 | bool | plain, run length | run length
 | string, binary | plain, prefix, dictionary | dictionary
 |===
@@ -160,8 +203,8 @@ Like an RDBMS primary key, the Kudu primary key enforces a uniqueness constraint
 Attempting to insert a row with the same primary key values as an existing row
 will result in a duplicate key error.
 
-Primary key columns must be non-nullable, and may not be a boolean or floating-
-point type.
+Primary key columns must be non-nullable, and may not be a boolean, float
+or double type.
 
 Once set during table creation, the set of columns in the primary key may not
 be altered.
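
As a rough illustration of the precision and scale rules added to `schema_design.adoc` in this commit, the Python sketch below checks whether a value can be stored exactly in a `decimal(precision, scale)` column using only the standard `decimal` module. `fits_decimal_column` is a hypothetical helper written for this example, not part of any Kudu client API. It assumes a value fits when it has at most `scale` fractional digits and at most `precision - scale` integral digits, with the sign not counting toward precision.

```python
from decimal import Decimal


def fits_decimal_column(value: Decimal, precision: int, scale: int) -> bool:
    """Hypothetical helper: True if `value` is exactly representable as
    decimal(precision, scale) under the rules described for Kudu's decimal type."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if not 0 <= scale <= precision:
        raise ValueError("scale must be between 0 and the precision")
    if not value.is_finite():
        return False  # NaN and infinities have no fixed-point representation
    _sign, digits, exponent = value.as_tuple()
    if not any(digits):
        return True  # zero fits in every precision and scale
    digits = list(digits)
    # Trailing zeros are not significant: 1.50 needs only one fractional digit.
    while digits[-1] == 0:
        digits.pop()
        exponent += 1
    fractional = max(0, -exponent)             # digits after the decimal point
    integral = max(0, len(digits) + exponent)  # digits before the decimal point
    # The sign is not a digit, so -9999 needs the same precision as 9999.
    return fractional <= scale and integral <= precision - scale


# Examples from the precision and scale descriptions.
print(fits_decimal_column(Decimal("-9999"), 4, 0))   # True
print(fits_decimal_column(Decimal("99.99"), 4, 2))   # True
print(fits_decimal_column(Decimal("-0.999"), 3, 3))  # True
print(fits_decimal_column(Decimal("1.0"), 3, 3))     # False
```

Rejecting rather than rounding excess fractional digits is a choice made for this sketch; whether a given Kudu client rounds, truncates or raises in that case is not stated in this commit.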
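
The byte sizes listed under *Performance considerations* are consistent with storing the unscaled value (the decimal multiplied by 10^scale) in a signed 32-, 64- or 128-bit integer: 9, 18 and 38 are the largest digit counts whose maximum magnitude, 10^p - 1, fits in each of those widths. The sketch below checks that arithmetic with plain Python integers. The precision-to-bytes mapping comes from the documented list; the signed-integer interpretation and both function names are assumptions made for illustration, not Kudu APIs.

```python
def decimal_storage_bytes(precision: int) -> int:
    """Hypothetical helper: bytes per value before encoding and compression,
    following the precision ranges listed in the documentation."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    return 16


def max_precision_for_width(num_bytes: int) -> int:
    """Largest p such that 10**p - 1 fits in a signed integer of num_bytes bytes
    (assumption: unscaled values are held as signed two's-complement integers)."""
    max_signed = 2 ** (8 * num_bytes - 1) - 1
    p = 0
    while 10 ** (p + 1) - 1 <= max_signed:
        p += 1
    return p


# Every precision maps to a width whose signed range covers 10**p - 1.
for p in range(1, 39):
    assert 10 ** p - 1 <= 2 ** (8 * decimal_storage_bytes(p) - 1) - 1

for width in (4, 8, 16):
    print(width, max_precision_for_width(width))
```

Running it prints `4 9`, `8 18` and `16 38`, matching the 9, 18 and 38 boundaries in the documented list. Under this reading, choosing precision 19 instead of 18 doubles the pre-compression footprint of every value in the column.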
