Add docs for non-covering range partitioning Change-Id: I3b0fd7500c5399db9dcad617ae67fea247307353 Reviewed-on: http://gerrit.cloudera.org:8080/3796 Reviewed-by: Dan Burkert <[email protected]> Tested-by: Misty Stanley-Jones <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/274dfb05 Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/274dfb05 Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/274dfb05 Branch: refs/heads/master Commit: 274dfb05944edef1fac3be78ea0c699b048c5b31 Parents: 0fb4409 Author: Misty Stanley-Jones <[email protected]> Authored: Wed Jul 27 12:19:52 2016 -0700 Committer: Misty Stanley-Jones <[email protected]> Committed: Mon Aug 15 22:10:56 2016 +0000 ---------------------------------------------------------------------- docs/kudu_impala_integration.adoc | 55 ++++++++++ docs/schema_design.adoc | 193 +++++++++++++++++++-------------- 2 files changed, 168 insertions(+), 80 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/kudu/blob/274dfb05/docs/kudu_impala_integration.adoc ---------------------------------------------------------------------- diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc index 793b402..fb29bd8 100755 --- a/docs/kudu_impala_integration.adoc +++ b/docs/kudu_impala_integration.adoc @@ -795,6 +795,61 @@ The example creates 16 buckets. You could also use `HASH (id, sku) INTO 16 BUCKE However, a scan for `sku` values would almost always impact all 16 buckets, rather than possibly being limited to 4. +// Not ready for 0.10 but don't want to lose the work +//// + +.Non-Covering Range Partitions +Kudu TODO:VERSION and higher supports the use of non-covering range partitions, +which address scenarios like the following: + +- Without non-covering range partitions, in the case of time-series data or other + schemas which need to account for constantly-increasing primary keys, tablets + serving old data will be relatively fixed in size, while tablets receiving new + data will grow without bounds. 
+
+- In cases where you want to partition data based on its category, such as sales
+  region or product type, without non-covering range partitions you must know all
+  of the partitions ahead of time or manually recreate your table if partitions
+  need to be added or removed, such as the introduction or elimination of a product
+  type.
+
+Non-covering range partitions have some caveats. Be sure to read the
+link:/docs/schema_design.html[Schema Design guide].
+
+This example creates a tablet per year (6 tablets total), for storing log data. It uses `RANGE BOUND`
+to ensure that the table only accepts data from 2011 (inclusive) to 2017 (exclusive). Keys outside
+of this range will be rejected.
+
+[source,sql]
+----
+CREATE TABLE sales_by_year (year INT32, sale_id INT32, amount INT32)
+PRIMARY KEY (sale_id, year)
+DISTRIBUTE BY RANGE (year)
+  RANGE BOUND ((2011), (2017))
+  SPLIT ROWS ((2012), (2013), (2014), (2015), (2016));
+----
+
+When records start coming in for 2017, they will be rejected. At that point, the `2017`
+range should be added. An `alter table add range partition` or `alter table drop range
+partition` operation allows you to add or drop a range partition, respectively.
+
+The next example creates a table per sales region. Data for regions other than `North
+America`, `Europe`, or `Asia` will be rejected. This example does not use explicit
+split rows, but the range bounds provide implicit split rows, so three tablets would
+be created. If a new range is added, a new tablet is created.
+ +[source,sql] +---- +CREATE TABLE sales_by_region (region STRING, sale_id INT32, amount INT32) +PRIMARY KEY (sale_id, region) +DISTRIBUTE BY RANGE (region) + RANGE BOUND (("North America"), ("North America\0")), + RANGE BOUND (("Europe"), ("Europe\0")), + RANGE BOUND (("Asia"), ("Asia\0")); +---- + +//// + [[partitioning_rules_of_thumb]] ==== Partitioning Rules of Thumb http://git-wip-us.apache.org/repos/asf/kudu/blob/274dfb05/docs/schema_design.adoc ---------------------------------------------------------------------- diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc index 0851e44..1d3078f 100644 --- a/docs/schema_design.adoc +++ b/docs/schema_design.adoc @@ -41,89 +41,21 @@ be a new concept for those familiar with traditional relational databases. The next sections discuss <<alter-schema,altering the schema>> of an existing table, and <<known-limitations,known limitations>> with regard to schema design. -[[column-design]] -== Column Design - -A Kudu Table consists of one or more columns, each with a predefined type. -Columns that are not part of the primary key may optionally be nullable. -Supported column types include: - -* boolean -* 8-bit signed integer -* 16-bit signed integer -* 32-bit signed integer -* 64-bit signed integer -* timestamp -* single-precision (32-bit) IEEE-754 floating-point number -* double-precision (64-bit) IEEE-754 floating-point number -* UTF-8 encoded string -* binary - -Kudu takes advantage of strongly-typed columns and a columnar on-disk storage -format to provide efficient encoding and serialization. To make the most of these -features, columns must be specified as the appropriate type, rather than -simulating a 'schemaless' table using string or binary columns for data which -may otherwise be structured. In addition to encoding, Kudu optionally allows -compression to be specified on a per-column basis. 
- -[[encoding]] -=== Column Encoding - -Each column in a Kudu table can be created with an encoding, based on the type -of the column. Columns use plain encoding by default. - -.Encoding Types -[options="header"] -|=== -| Column Type | Encoding -| integer, timestamp | plain, bitshuffle, run length -| float | plain, bitshuffle -| bool | plain, dictionary, run length -| string, binary | plain, prefix, dictionary -|=== - -[[plain]] -Plain Encoding:: Data is stored in its natural format. For example, `int32` values -are stored as fixed-size 32-bit little-endian integers. - -[[bitshuffle]] -Bitshuffle Encoding:: Data is rearranged to store the most significant bit of -every value, followed by the second most significant bit of every value, and so -on. Finally, the result is LZ4 compressed. Bitshuffle encoding is a good choice for -columns that have many repeated values, or values that change by small amounts -when sorted by primary key. The -https://github.com/kiyo-masui/bitshuffle[bitshuffle] project has a good -overview of performance and use cases. - -[[run-length]] -Run Length Encoding:: _Runs_ (consecutive repeated values) are compressed in a -column by storing only the value and the count. Run length encoding is effective -for columns with many consecutive repeated values when sorted by primary key. - -[[dictionary]] -Dictionary Encoding:: A dictionary of unique values is built, and each column value -is encoded as its corresponding index in the dictionary. Dictionary encoding -is effective for columns with low cardinality. If the column values of a given row set -are unable to be compressed because the number of unique values is too high, Kudu will -transparently fall back to plain encoding for that row set. This is evaluated during -flush. +== The Perfect Schema -[[prefix]] -Prefix Encoding:: Common prefixes are compressed in consecutive column values. 
Prefix -encoding can be effective for values that share common prefixes, or the first -column of the primary key, since rows are sorted by primary key within tablets. +The perfect schema would accomplish the following: -[[compression]] -=== Column Compression +- Data would be distributed in such a way that reads and writes are spread evenly + across tablet servers. This is impacted by the partition schema. +- Tablets would grow at an even, predictable rate and load across tablets would remain + steady over time. This is most impacted by the partition schema. +- Scans would read the minimum amount of data necessary to fulfill a query. This + is impacted mostly by primary key design, but partition design also plays a + role via partition pruning. -Kudu allows per-column compression using LZ4, `snappy`, or `zlib` compression -codecs. By default, columns are stored uncompressed. Consider using compression -if reducing storage space is more important than raw scan performance. - -Every data set will compress differently, but in general LZ4 has the least effect on -performance, while `zlib` will compress to the smallest data sizes. -Bitshuffle-encoded columns are inherently compressed using LZ4, so it is not -typically beneficial to apply additional compression on top of this encoding. +The perfect schema depends on the characteristics of your data, what you need to do +with it, and the topology of your cluster. Schema design is the single most important +thing within your control to maximize the performance of your Kudu cluster. [[primary-keys]] == Primary Keys @@ -209,6 +141,23 @@ should only include the `last_name` column. In that case, Kudu would guarantee t customers with the same last name would fall into the same tablet, regardless of the provided split rows. 
+[[range-partition-management]]
+==== Range Partition Management
+
+Kudu 0.10 introduces the ability to specify bounded range partitions during
+table creation, and the ability to add and drop range partitions on the fly. This is
+a good strategy for data which is always increasing, such as timestamps, or for
+categorical data, such as geographic regions.
+
+For example, during table creation, bounded range partitions can be
+added for the regions 'US-EAST', 'US-WEST', and 'EUROPE'. If you attempt to insert a
+row with a region that does not match an existing range partition, the insertion will
+fail. Later, when a new region is needed it can be efficiently added as part of an
+`ALTER TABLE` operation. This feature is particularly useful for time-series data,
+since it allows new range partitions for the current period to be added as
+needed, and old partitions covering historical periods to be dropped if
+necessary.
+
 [[hash-bucketing]]
 === Hash Bucketing
 
@@ -272,6 +221,90 @@ You can alter a table's schema in the following ways:
 
 You cannot modify the partition schema after table creation.
 
+[[column-design]]
+== Column Design
+
+A Kudu Table consists of one or more columns, each with a predefined type.
+Columns that are not part of the primary key may optionally be nullable.
+Supported column types include:
+
+* boolean
+* 8-bit signed integer
+* 16-bit signed integer
+* 32-bit signed integer
+* 64-bit signed integer
+* timestamp
+* single-precision (32-bit) IEEE-754 floating-point number
+* double-precision (64-bit) IEEE-754 floating-point number
+* UTF-8 encoded string
+* binary
+
+Kudu takes advantage of strongly-typed columns and a columnar on-disk storage
+format to provide efficient encoding and serialization. To make the most of these
+features, columns must be specified as the appropriate type, rather than
+simulating a 'schemaless' table using string or binary columns for data which
+may otherwise be structured.
In addition to encoding, Kudu optionally allows +compression to be specified on a per-column basis. + +[[encoding]] +=== Column Encoding + +Each column in a Kudu table can be created with an encoding, based on the type +of the column. Columns use plain encoding by default. + +.Encoding Types +[options="header"] +|=== +| Column Type | Encoding +| integer, timestamp | plain, bitshuffle, run length +| float | plain, bitshuffle +| bool | plain, dictionary, run length +| string, binary | plain, prefix, dictionary +|=== + +[[plain]] +Plain Encoding:: Data is stored in its natural format. For example, `int32` values +are stored as fixed-size 32-bit little-endian integers. + +[[bitshuffle]] +Bitshuffle Encoding:: Data is rearranged to store the most significant bit of +every value, followed by the second most significant bit of every value, and so +on. Finally, the result is LZ4 compressed. Bitshuffle encoding is a good choice for +columns that have many repeated values, or values that change by small amounts +when sorted by primary key. The +https://github.com/kiyo-masui/bitshuffle[bitshuffle] project has a good +overview of performance and use cases. + +[[run-length]] +Run Length Encoding:: _Runs_ (consecutive repeated values) are compressed in a +column by storing only the value and the count. Run length encoding is effective +for columns with many consecutive repeated values when sorted by primary key. + +[[dictionary]] +Dictionary Encoding:: A dictionary of unique values is built, and each column value +is encoded as its corresponding index in the dictionary. Dictionary encoding +is effective for columns with low cardinality. If the column values of a given row set +are unable to be compressed because the number of unique values is too high, Kudu will +transparently fall back to plain encoding for that row set. This is evaluated during +flush. + +[[prefix]] +Prefix Encoding:: Common prefixes are compressed in consecutive column values. 
Prefix +encoding can be effective for values that share common prefixes, or the first +column of the primary key, since rows are sorted by primary key within tablets. + +[[compression]] +=== Column Compression + +Kudu allows per-column compression using LZ4, `snappy`, or `zlib` compression +codecs. By default, columns are stored uncompressed. Consider using compression +if reducing storage space is more important than raw scan performance. + +Every data set will compress differently, but in general LZ4 has the least effect on +performance, while `zlib` will compress to the smallest data sizes. +Bitshuffle-encoded columns are inherently compressed using LZ4, so it is not +typically beneficial to apply additional compression on top of this encoding. + [[known-limitations]] == Known Limitations

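The add/drop workflow described in the Range Partition Management section could look roughly like the following for the `sales_by_year` example table. This is a hypothetical sketch only: it assumes an `ALTER TABLE ... RANGE PARTITION` grammar implied by the prose, and the exact SQL syntax had not been finalized at the time of this commit.

[source,sql]
----
-- Hypothetical syntax: once 2017 data starts arriving, add a new
-- range partition so that inserts for 2017 are no longer rejected.
ALTER TABLE sales_by_year ADD RANGE PARTITION ((2017), (2018));

-- Hypothetical syntax: drop the oldest historical range partition.
ALTER TABLE sales_by_year DROP RANGE PARTITION ((2011), (2012));
----

Adding a range partition creates a new tablet covering that range; dropping one removes the corresponding tablet, which suits retention policies for constantly-increasing time-series keys as described above.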