This is an automated email from the ASF dual-hosted git repository.
alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/master by this push:
new cf550d6d7 KUDU-2671: Update upstream docs
cf550d6d7 is described below
commit cf550d6d7cdd61f6c65f9ef75a1706cb91839876
Author: Mahesh Reddy <[email protected]>
AuthorDate: Tue Mar 5 15:20:33 2024 -0800
KUDU-2671: Update upstream docs
This patch updates the upstream docs to include range-specific
hash schemas within the partitioning section. An example
with the proper SQL syntax is also included in the Kudu Impala
integration doc.
Change-Id: I8da554851a124d1d357be65d8bcc2c6c37875dcc
Reviewed-on: http://gerrit.cloudera.org:8080/21108
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
---
docs/kudu_impala_integration.adoc | 42 +++++++++++++++++++++++++++++++++++++++
docs/schema_design.adoc | 16 +++++++++++++++
2 files changed, 58 insertions(+)
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 0def0477c..de01c3d59 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -485,6 +485,48 @@ The example creates 16 partitions. You could also use `HASH (id, sku) PARTITIONS
However, a scan for `sku` values would almost always impact all 16 partitions, rather
than possibly being limited to 4.
+.Range-Specific Hash Schemas
+As of Kudu 1.17, tables support range-specific hash schemas: a range can be
+added with its own hash schema, independent of the table-wide hash schema,
+either when the table is created or when it is altered. The number of hash
+partition levels must be the same across all ranges in a table.
+
+[source, sql]
+----
+CREATE TABLE cust_behavior (
+  id BIGINT,
+  sku STRING,
+  salary STRING,
+  edu_level INT,
+  usergender STRING,
+  `group` STRING,
+  city STRING,
+  postcode STRING,
+  last_purchase_price FLOAT,
+  last_purchase_date BIGINT,
+  category STRING,
+  rating INT,
+  fulfilled_date BIGINT,
+  PRIMARY KEY (id, sku)
+)
+PARTITION BY HASH (id) PARTITIONS 4,
+RANGE (sku)
+(
+  PARTITION VALUES < 'g',
+  PARTITION 'g' <= VALUES < 'o'
+    HASH (id) PARTITIONS 6,
+  PARTITION 'o' <= VALUES < 'u'
+    HASH (id) PARTITIONS 8,
+  PARTITION 'u' <= VALUES
+)
+STORED AS KUDU;
+----
+
+This example gives the two middle ranges their own hash schemas. The
+table-wide hash schema has 4 buckets, while the ranges `'g' <= VALUES < 'o'`
+and `'o' <= VALUES < 'u'` use 6 and 8 buckets respectively. This is useful
+when those ranges are expected to receive a heavier workload.
+
.Non-Covering Range Partitions
Kudu 1.0 and higher supports the use of non-covering range partitions,
which address scenarios like the following:
diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 95d4d251c..906682b86 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -435,6 +435,22 @@ NOTE: see the <<hash-range-partitioning-example>> and the
<<hash-hash-partitioning-example>> for further discussion of multilevel
partitioning.
+[[flexible-partitioning]]
+=== Flexible Partitioning
+
+As of Kudu 1.17, tables support range-specific hash schemas: a range can be
+added with its own hash schema, independent of the table-wide hash schema,
+either when the table is created or when it is altered. This helps mitigate
+hotspotting, because a range that is expected to receive more workload can be
+given more hash buckets than the rest of the table.
+
+[[same-number-of-hash-levels]]
+[IMPORTANT]
+.Same Number of Hash Levels
+The number of hash partition levels must be the same for all ranges in a
+table. See <<multilevel-partitioning>> for more details on hash partition
+levels.
+
[[partition-pruning]]
=== Partition Pruning
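The row placement implied by the `cust_behavior` example in this patch can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not Kudu's client API: `HashLevel`, `RangeSpec`, `effective_schemas`, `locate`, and `tablet_count` are hypothetical names, and CRC32 stands in for Kudu's own hash function, so the bucket numbers will not match a real cluster. It illustrates three points from the docs above: a row is first matched to a range on `sku` and then hashed with that range's own schema (or the table-wide one); each range contributes as many tablets as the product of its bucket counts; and ranges with different numbers of hash levels are rejected.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class HashLevel:
    """One hash partition level: hash these columns into `buckets` buckets."""
    columns: Tuple[str, ...]
    buckets: int


HashSchema = Tuple[HashLevel, ...]


@dataclass(frozen=True)
class RangeSpec:
    """A range partition [lower, upper); None means unbounded on that side."""
    lower: Optional[str]
    upper: Optional[str]
    # None means "use the table-wide hash schema" for this range.
    hash_schema: Optional[HashSchema] = None


def effective_schemas(table_hash: HashSchema, ranges: Sequence[RangeSpec]) -> List[HashSchema]:
    """Resolve each range's hash schema and enforce the same-number-of-levels rule."""
    schemas = [r.hash_schema if r.hash_schema is not None else table_hash for r in ranges]
    if len({len(s) for s in schemas}) > 1:
        raise ValueError("all ranges must have the same number of hash partition levels")
    return schemas


def bucket_of(values: Sequence[object], buckets: int) -> int:
    # Stand-in hash (CRC32 of the joined values); an assumption, not Kudu's hash.
    key = "\x00".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % buckets


def locate(row: Dict[str, object], table_hash: HashSchema,
           ranges: Sequence[RangeSpec], range_col: str) -> Tuple[int, Tuple[int, ...]]:
    """Return (range index, bucket per hash level) for a row."""
    value = row[range_col]
    for i, (r, schema) in enumerate(zip(ranges, effective_schemas(table_hash, ranges))):
        if (r.lower is None or r.lower <= value) and (r.upper is None or value < r.upper):
            return i, tuple(bucket_of([row[c] for c in lvl.columns], lvl.buckets)
                            for lvl in schema)
    raise LookupError(f"no range covers {range_col}={value!r}")


def tablet_count(table_hash: HashSchema, ranges: Sequence[RangeSpec]) -> int:
    """Each range contributes the product of its per-level bucket counts."""
    total = 0
    for schema in effective_schemas(table_hash, ranges):
        n = 1
        for lvl in schema:
            n *= lvl.buckets
        total += n
    return total


# Mirrors the cust_behavior example: HASH (id) PARTITIONS 4 table-wide,
# with 6 and 8 buckets for the two middle ranges on sku.
TABLE_HASH: HashSchema = (HashLevel(("id",), 4),)
RANGES = [
    RangeSpec(None, "g"),
    RangeSpec("g", "o", (HashLevel(("id",), 6),)),
    RangeSpec("o", "u", (HashLevel(("id",), 8),)),
    RangeSpec("u", None),
]

print(tablet_count(TABLE_HASH, RANGES))  # 22
print(locate({"id": 42, "sku": "kettle"}, TABLE_HASH, RANGES, "sku"))
```

With these ranges the table has 4 + 6 + 8 + 4 = 22 tablets, compared with 16 if every range used the table-wide 4-bucket schema; the extra tablets are concentrated in the two ranges expected to be busiest.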