new range partitioning features blog post Change-Id: I53504d849c2aca9ff613b11e67d1533536283931 Reviewed-on: http://gerrit.cloudera.org:8080/4012 Reviewed-by: Todd Lipcon <t...@apache.org> Tested-by: Todd Lipcon <t...@apache.org>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/bb0fb419 Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/bb0fb419 Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/bb0fb419 Branch: refs/heads/gh-pages Commit: bb0fb4190859f174da02d61e7059662bb856f089 Parents: d4a5115 Author: Dan Burkert <d...@cloudera.com> Authored: Tue Aug 16 19:23:07 2016 -0700 Committer: Todd Lipcon <t...@apache.org> Committed: Tue Aug 23 05:57:32 2016 +0000 ---------------------------------------------------------------------- ...016-08-23-new-range-partitioning-features.md | 99 +++++++++++++++++++ .../range-and-hash-partitioning.png | Bin 0 -> 85316 bytes .../range-partitioning-on-time.png | Bin 0 -> 58003 bytes 3 files changed, 99 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/kudu/blob/bb0fb419/_posts/2016-08-23-new-range-partitioning-features.md ---------------------------------------------------------------------- diff --git a/_posts/2016-08-23-new-range-partitioning-features.md b/_posts/2016-08-23-new-range-partitioning-features.md new file mode 100644 index 0000000..fe382b9 --- /dev/null +++ b/_posts/2016-08-23-new-range-partitioning-features.md @@ -0,0 +1,99 @@ +--- +layout: post +title: "New Range Partitioning Features in Kudu 0.10" +author: Dan Burkert +--- + +Kudu 0.10 is shipping with a few important new features for range partitioning. +These features are designed to make Kudu easier to scale for certain workloads, +like time series. This post will introduce these features, and discuss how to use +them to effectively design tables for scalability and performance. + +<!--more--> + +Since Kudu's initial release, tables have had the constraint that once created, +the set of partitions is static. This forces users to plan ahead and create +enough partitions for the expected size of the table, because once the table is +created no further partitions can be added. When using hash partitioning, +creating more partitions is as straightforward as specifying more buckets. For +range partitioning, however, knowing where to put the extra partitions ahead of +time can be difficult or impossible. + +The common solution to this problem in other distributed databases is to allow +range partitions to split into smaller child range partitions. Unfortunately, +range splitting typically has a large performance impact on running tables, +since child partitions need to eventually be recompacted and rebalanced to a +remote server. Range splitting is particularly thorny with Kudu, because rows +are stored in tablets in primary key sorted order, which does not necessarily +match the range partitioning order. If the range partition key is different than +the primary key, then splitting requires inspecting and shuffling each +individual row, instead of splitting the tablet in half. + +## Adding and Dropping Range Partitions + +As an alternative to range partition splitting, Kudu now allows range partitions +to be added and dropped on the fly, without locking the table or otherwise +affecting concurrent operations on other partitions. This solution is not +strictly as powerful as full range partition splitting, but it strikes a good +balance between flexibility, performance, and operational overhead. +Additionally, this feature does not preclude range splitting in the future if +there is a push to implement it. To support adding and dropping range +partitions, Kudu had to remove an even more fundamental restriction when using +range partitions. + +Previously, range partitions could only be created by specifying split points. +Split points divide an implicit partition covering the entire range into +contiguous and disjoint partitions. When using split points, the first and last +partitions are always unbounded below and above, respectively. A consequence of +the final partition being unbounded is that datasets which are range-partitioned +on a column that increases in value over time will eventually have far more rows +in the last partition than in any other. Unbalanced partitions are commonly +referred to as hotspots, and until Kudu 0.10 they have been difficult to avoid +when storing time series data in Kudu. + +{: .img-responsive} + +The figure above shows the tablets created by two different attempts to +partition a table by range on a timestamp column. The first, above in blue, uses +split points. The second, below in green, uses bounded range partitions +specified during table creation. With bounded range partitions, there is no +longer a guarantee that every possible row has a corresponding range partition. +As a result, Kudu will now reject writes which fall in a 'non-covered' range. + +Now that tables are no longer required to have range partitions covering all +possible rows, Kudu can support adding range partitions to cover the otherwise +unoccupied space. Dropping a range partition will result in unoccupied space +where the range partition was previously. In the example above, we may want to +add a range partition covering 2017 at the end of the year, so that we can +continue collecting data in the future. By lazily adding range partitions we +avoid hotspotting, avoid the need to specify range partitions up front for time +periods far in the future, and avoid the downsides of splitting. Additionally, +historical data which is no longer useful can be efficiently deleted by dropping +the entire range partition. + +## What About Hash Partitioning? + +Since Kudu's hash partitioning feature originally shipped in version 0.6, it has +been possible to create tables which combine hash partitioning with range +partitioning. The new range partitioning features continue to work seamlessly +when combined with hash partitioning. Just as before, the number of tablets +which comprise a table will be the product of the number of range partitions and +the number of hash partition buckets. Adding or dropping a range partition will +result in the creation or deletion of one tablet per hash bucket. + +{: .img-responsive} + +The diagram above shows a time series table range-partitioned on the timestamp +and hash-partitioned with two buckets. The hash partitioning could be on the +timestamp column, or it could be on any other column or columns in the primary +key. In this example only two years of historical data is needed, so at the end +of 2016 a new range partition is added for 2017 and the historical 2014 range +partition is dropped. This causes two new tablets to be created for 2017, and +the two existing tablets for 2014 to be deleted. + +## Getting Started + +Beginning with the Kudu 0.10 release, users can add and drop range partitions +through the Java and C++ client APIs. Range partitions on existing tables can be +dropped and replacements added, but it requires the servers and all clients to +be updated to 0.10. http://git-wip-us.apache.org/repos/asf/kudu/blob/bb0fb419/img/2016-08-23-new-range-partitioning-features/range-and-hash-partitioning.png ---------------------------------------------------------------------- diff --git a/img/2016-08-23-new-range-partitioning-features/range-and-hash-partitioning.png b/img/2016-08-23-new-range-partitioning-features/range-and-hash-partitioning.png new file mode 100644 index 0000000..1d61ae4 Binary files /dev/null and b/img/2016-08-23-new-range-partitioning-features/range-and-hash-partitioning.png differ http://git-wip-us.apache.org/repos/asf/kudu/blob/bb0fb419/img/2016-08-23-new-range-partitioning-features/range-partitioning-on-time.png ---------------------------------------------------------------------- diff --git a/img/2016-08-23-new-range-partitioning-features/range-partitioning-on-time.png b/img/2016-08-23-new-range-partitioning-features/range-partitioning-on-time.png new file mode 100644 index 0000000..f8894a1 Binary files /dev/null and b/img/2016-08-23-new-range-partitioning-features/range-partitioning-on-time.png differ