This is an automated email from the ASF dual-hosted git repository. granthenke pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/kudu.git
commit 386cc74c1a15d10e85989b67e92cfe3d6b134f44
Author: Grant Henke <[email protected]>
AuthorDate: Sun Apr 19 10:39:29 2020 -0500

    [docs] Update schema documentation

    This patch adds more details on the VARCHAR type to the
    schema docs. It also adds the DATE type and includes a small update
    to remove the explicit Hbase call out.

    Change-Id: I681e0af517b08c348420b3b217c393797717d3fc
    Reviewed-on: http://gerrit.cloudera.org:8080/15757
    Tested-by: Kudu Jenkins
    Reviewed-by: Volodymyr Verovkin <[email protected]>
    Reviewed-by: Hao Hao <[email protected]>
---
 docs/schema_design.adoc | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 9b05991..0c1e0a6 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -72,13 +72,14 @@ column types include:
 * 16-bit signed integer
 * 32-bit signed integer
 * 64-bit signed integer
+* date (32-bit days since the Unix epoch)
 * unixtime_micros (64-bit microseconds since the Unix epoch)
 * single-precision (32-bit) IEEE-754 floating-point number
 * double-precision (64-bit) IEEE-754 floating-point number
 * decimal (see <<decimal>> for details)
+* varchar (see <<varchar>> for details)
 * UTF-8 encoded string (up to 64KB uncompressed)
 * binary (up to 64KB uncompressed)
-* VARCHAR type with configurable maximum length (up to 64KB uncompressed)
 
 Kudu takes advantage of strongly-typed columns and a columnar on-disk storage
 format to provide efficient encoding and serialization. To make the most of
@@ -90,9 +91,9 @@ be specified on a per-column basis.
 [[no_version_column]]
 [IMPORTANT]
 .No Version or Timestamp Column
-Unlike HBase, Kudu does not provide a version or timestamp column to track changes
-to a row. If version or timestamp information is needed, the schema should include
-an explicit version or timestamp column.
+Kudu does not provide a version or timestamp column to track changes to a row.
+If version or timestamp information is needed, the schema should include an
+explicit version or timestamp column.
 
 [[decimal]]
 === Decimal Type
@@ -136,6 +137,24 @@ Before encoding and compression:
 NOTE: The precision and scale of `decimal` columns cannot be changed by altering
 the table.
 
+[[varchar]]
+=== Varchar Type
+
+The `varchar` type is a UTF-8 encoded string (up to 64KB uncompressed) with a
+fixed maximum character length. This type is especially useful when migrating
+from or integrating with legacy systems that support the `varchar` type.
+If a maximum character length is not required the `string` type should be
+used instead.
+
+The `varchar` type is a parameterized type that takes a length attribute.
+
+*Length* represents the maximum number of UTF-8 characters allowed. Values
+with characters greater than the limit will be truncated. This value must
+be between 1 and 65535 and has no default. Note that some other systems
+may represent the length limit in bytes instead of characters. That means
+that Kudu may be able to represent longer values in the case of multi-byte
+UTF-8 characters.
+
 [[encoding]]
 === Column Encoding
 
@@ -145,12 +164,12 @@ of the column.
 .Encoding Types
 [options="header"]
 |===
-| Column Type             | Encoding                      | Default
-| int8, int16, int32      | plain, bitshuffle, run length | bitshuffle
-| int64, unixtime_micros  | plain, bitshuffle, run length | bitshuffle
-| float, double, decimal  | plain, bitshuffle             | bitshuffle
-| bool                    | plain, run length             | run length
-| string, binary, varchar | plain, prefix, dictionary     | dictionary
+| Column Type               | Encoding                      | Default
+| int8, int16, int32, int64 | plain, bitshuffle, run length | bitshuffle
+| date, unixtime_micros     | plain, bitshuffle, run length | bitshuffle
+| float, double, decimal    | plain, bitshuffle             | bitshuffle
+| bool                      | plain, run length             | run length
+| string, varchar, binary   | plain, prefix, dictionary     | dictionary
 |===
 
 [[plain]]
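
The two type descriptions added by this patch can be illustrated with a short Python sketch. It is not Kudu code and does not use any Kudu client API. The helper names (`to_days_since_epoch`, `truncate_chars`, `truncate_bytes`) are hypothetical, and the sketch assumes that a "UTF-8 character" in the varchar description means one Unicode code point. It shows how a date maps to days since the Unix epoch, and why a character-based length limit can keep more of a multi-byte string than a byte-based limit of the same number.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)


def to_days_since_epoch(d: date) -> int:
    """Encode a calendar date as whole days since 1970-01-01."""
    return (d - UNIX_EPOCH).days


def truncate_chars(value: str, max_chars: int) -> str:
    """Truncate to at most max_chars code points (character-based limit).

    Assumption: one "character" is one Unicode code point. The 1..65535
    range check mirrors the limits stated in the patch text.
    """
    if not 1 <= max_chars <= 65535:
        raise ValueError("length must be between 1 and 65535")
    return value[:max_chars]


def truncate_bytes(value: str, max_bytes: int) -> str:
    """Truncate to at most max_bytes of UTF-8 (byte-based limit).

    Shown only for contrast with systems that count bytes. A partial
    multi-byte sequence left at the cut point is dropped.
    """
    encoded = value.encode("utf-8")[:max_bytes]
    return encoded.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    text = "héllo wörld"  # 11 code points, 13 bytes in UTF-8
    print(to_days_since_epoch(date(1970, 1, 2)))  # 1
    print(truncate_chars(text, 5))                # héllo
    print(truncate_bytes(text, 5))                # héll
```

With a limit of 5, the character-based cut keeps `héllo`, while the byte-based cut keeps only `héll` because `é` occupies two bytes.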
