This is an automated email from the ASF dual-hosted git repository.

abukor pushed a commit to branch branch-1.13.x
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 931a4a5f053528a985483677e84fb37d54deeeb4
Author: Attila Bukor <[email protected]>
AuthorDate: Thu Sep 3 14:15:10 2020 +0200

    Add release notes for 1.13.0

    Change-Id: Iac2a0ae740c3dabc5e7d8b7ef53312924c6e3532
    Reviewed-on: http://gerrit.cloudera.org:8080/16410
    Tested-by: Kudu Jenkins
    Reviewed-by: Greg Solovyev <[email protected]>
    Reviewed-by: Alexey Serbin <[email protected]>
    Reviewed-by: Grant Henke <[email protected]>
---
 docs/release_notes.adoc | 131 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 126 insertions(+), 5 deletions(-)

diff --git a/docs/release_notes.adoc b/docs/release_notes.adoc
index f23e070..044d389 100644
--- a/docs/release_notes.adoc
+++ b/docs/release_notes.adoc
@@ -31,27 +31,134 @@

 [[rn_1.13.0_upgrade_notes]]
 == Upgrade Notes
-
-[[rn_1.13.0_obsoletions]]
-== Obsoletions
-
+* The Sentry integration has been removed and the Ranger integration should now
+  be used in its place for fine-grained authorization.

 [[rn_1.13.0_deprecations]]
 == Deprecations

-Support for Python 2.x and Python 3.4 and earlier is deprecated and may be removed in the next minor release.
+* Support for Python 2.x and Python 3.4 and earlier is deprecated and may be
+  removed in the next minor release.
+* The `kudu-mapreduce` integration has been deprecated and may be removed in the
+  next minor release. Similar functionality and capabilities now exist via the
+  Apache Spark, Apache Hive, Apache Impala, and Apache NiFi integrations.

 [[rn_1.13.0_new_features]]
 == New features

+* Added table ownership support. All newly created tables are automatically
+  owned by the user creating them. It is also possible to change the owner by
+  altering the table. You can also assign privileges to table owners via Apache
+  Ranger (see link:https://issues.apache.org/jira/browse/KUDU-3090[KUDU-3090]).
+* An experimental feature is added to Kudu that allows it to automatically
+  rebalance tablet replicas among tablet servers. The background task can be
+  enabled by setting the `--auto_rebalancing_enabled` flag on the Kudu masters.
+  Before starting auto-rebalancing on an existing cluster, the CLI rebalancer
+  tool should be run first (see
+  link:https://issues.apache.org/jira/browse/KUDU-2780[KUDU-2780]).
+* Bloom filter column predicate pushdown has been added to allow optimized
+  execution of filters which match on a set of column values with a
+  false-positive rate. Support for Impala queries utilizing Bloom filter
+  predicate is available yielding performance improvements of 19% to 30% in TPC-H
+  benchmarks and around 41% improvement for distributed joins across large
+  tables. Support for Spark is not yet available. (see
+  link:https://issues.apache.org/jira/browse/KUDU-2483[KUDU-2483]).
+* AArch64-based (ARM) architectures are now supported including published Docker
+  images.
+* The Java client now supports the columnar row format returned from the server
+  transparently. Using this format can reduce the server CPU and size of the
+  request over the network for scans. The columnar format can be enabled via the
+  setRowDataFormat() method on the KuduScanner.
+* An experimental feature that can be enabled by setting the
+  `--enable_workload_score_for_perf_improvement_ops` prioritizes flushing and
+  compacting hot tablets.

 [[rn_1.13.0_improvements]]
 == Optimizations and improvements

+* Hive metastore synchronization now supports Hive 3 and later.
+* The Spark KuduContext accumulator metrics now track operation counts per table
+  instead of cumulatively for all tables.
+* The `kudu local_replica delete` CLI tool now accepts multiple tablet
+  identifiers. Along with the newly added `--ignore_nonexistent` flag, this
+  helps with scripting scenarios when removing multiple tablet replicas from a
+  particular Tablet Server.
+* Both Master’s and Tablet Server’s web UI now displays the name for a service
+  thread pool group at the `/threadz` page
+* Introduced `queue_overflow_rejections_` metrics for both Masters and Tablet
+  Servers: number of RPC requests of a particular type dropped due to RPC
+  service queue overflow.
+* Introduced a CoDel-like queue control mechanism for the apply queue. This
+  helps to avoid accumulating too many write requests and timing them out in
+  case of seek-bound workloads (e.g., uniform random inserts). The newly
+  introduced queue control mechanism is disabled by default. To enable it, set
+  the `--tablet_apply_pool_overload_threshold_ms` Tablet Server’s flag to
+  appropriate value, e.g. 250 (see
+  link:https://issues.apache.org/jira/browse/KUDU-1587[KUDU-1587]).
+* Operation accumulators in Spark KuduContext are now tracked on a per-table
+  basis.
+* Java client’s error collector can be resized (see
+  link:https://issues.apache.org/jira/browse/KUDU-1422[KUDU-1422]).
+* Calls to the Kudu master server are now drastically reduced when using scan
+  tokens. Previously deserializing a scan token would result in a GetTableSchema
+  request and potentially a GetTableLocations request. Now the table schema and
+  location information is serialized into the scan token itself avoiding the
+  need for any requests to the master when processing them.
+* The default size of Master’s RPC queue is now 100 (it was 50 in earlier
+  releases). This is to optimize for use cases where a Kudu cluster has many
+  clients working concurrently.
+* Masters now have an option to cache table location responses. This is
+  targeted for Kudu clusters which have many clients working concurrently. By
+  default, the caching of table location responses is disabled. To enable table
+  location caching, set the proper capacity of the table location cache using
+  Master’s `--table_locations_cache_capacity_mb` flag (setting to 0 disables the
+  caching). Up to 17% of improvement is observed in GetTableLocations request
+  rate when enabling the caching.
+* Removed lock contention on Raft consensus lock in Tablet Servers while
+  processing a write request. This helps to avoid RPC queue overflows when
+  handling concurrent write requests to the same tablet from multiple clients
+  (see link:https://issues.apache.org/jira/browse/KUDU-2727[KUDU-2727]).
+* Master’s performance for handling concurrent GetTableSchema requests has been
+  improved. End-to-end tests indicated up to 15% improvement in sustained
+  request rate for high concurrency scenarios.
+* Kudu servers now use protobuf Arena objects to perform all RPC
+  request/response-related memory allocations. This gives a boost for overall
+  RPC performance, and with further optimization the result request rate
+  was increased significantly for certain methods. For example, the result request
+  rate increased up to 25% for Master’s GetTabletLocations() RPC in case of
+  highly concurrent scenarios (see
+  link:https://issues.apache.org/jira/browse/KUDU-636[KUDU-636]).
+* Tablet Servers now use protobuf Arena for allocating Raft-related runtime
+  structures. This results in substantial reduction of CPU cycles used and
+  increases write throughput (see
+  link:https://issues.apache.org/jira/browse/KUDU-636[KUDU-636]).
+* Tablet Servers now use protobuf Arena for allocating EncodedKeys to reduce
+  allocator contention and improve memory locality (see
+  link:https://issues.apache.org/jira/browse/KUDU-636[KUDU-636]).
+* Bloom filter predicate evaluation for scans can be computationally expensive.
+  A heuristic has been added that verifies rejection rate of the supplied Bloom
+  filter predicate below which the Bloom filter predicate is automatically
+  disabled. This helped reduce regression observed with Bloom filter predicate
+  in TPC-H benchmark query #9 (see
+  link:https://issues.apache.org/jira/browse/KUDU-3140[KUDU-3140]).
+* Improved scan performance of dictionary and plain-encoded string columns by
+  avoiding copying them (see
+  link:https://issues.apache.org/jira/browse/KUDU-2844[KUDU-2844]).
+* Improved maintenance manager's heuristics to prioritize larger memstores
+  (see link:https://issues.apache.org/jira/browse/KUDU-3180[KUDU-3180]).
+* Spark client's KuduReadOptions now supports setting a snapshot timestamp for
+  repeatable reads with READ_AT_SNAPSHOT consistency mode (see
+  link:https://issues.apache.org/jira/browse/KUDU-3177[KUDU-3177]).

 [[rn_1.13.0_fixed_issues]]
 == Fixed Issues

+* Kudu scans now honor location assignments when multiple tablet servers are
+  co-located with the client.
+* Fixed a bug that caused IllegalArgumentException to be thrown when trying to
+  create a predicate for a DATE column in Kudu Java client (see
+  link:https://issues.apache.org/jira/browse/KUDU-3152[KUDU-3152]).
+* Fixed a potential race when multiple RPCs work on the same scanner object.

 [[rn_1.13.0_wire_compatibility]]
 == Wire Protocol compatibility
@@ -104,6 +211,20 @@ documentation.

 [[rn_1.13.0_contributors]]
 == Contributors
+Kudu 1.13.0 includes contributions from 22 people, including 9 first-time
+contributors:
+
+* Jim Apple
+* Kevin J McCarthy
+* Li Zhiming
+* Mahesh Reddy
+* Romain Rigaux
+* RuiChen
+* Shuping Zhou
+* ningw
+* wenjie
+
+

 [[resources_and_next_steps]]
 == Resources
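
Illustrative note, not part of the commit: the KUDU-3140 entry above describes a heuristic that disables a Bloom filter predicate when its rejection rate is too low. The Python sketch below shows that general idea only; it is not Kudu's implementation. The class names, the `sample_size` window, and the `min_rejection_rate` threshold are all hypothetical. Passing every row through once the predicate is disabled is only safe if a later stage (for example, the join itself) re-checks the rows, which this sketch assumes.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, value):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))


class AdaptiveBloomPredicate:
    """Applies a Bloom filter to rows and disables itself if it rejects too few.

    sample_size and min_rejection_rate are hypothetical tuning knobs,
    not Kudu flags.
    """

    def __init__(self, bloom, sample_size=100, min_rejection_rate=0.1):
        self.bloom = bloom
        self.sample_size = sample_size
        self.min_rejection_rate = min_rejection_rate
        self.enabled = True
        self.seen = 0
        self.rejected = 0

    def accept(self, value):
        if not self.enabled:
            return True  # predicate disabled: every row passes through
        keep = self.bloom.might_contain(value)
        self.seen += 1
        if not keep:
            self.rejected += 1
        if self.seen == self.sample_size:
            # End of a sampling window: check whether filtering pays off.
            if self.rejected / self.seen < self.min_rejection_rate:
                self.enabled = False
            self.seen = 0
            self.rejected = 0
        return keep
```

In this sketch the predicate is re-evaluated after every full window while it is enabled, and it never re-enables itself once switched off; a real system might choose differently.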
