jon-wei opened a new issue #10462:
URL: https://github.com/apache/druid/issues/10462


   Apache Druid 0.20.0 contains around 140 new features, bug fixes, performance 
enhancements, documentation improvements, and additional test coverage from 36 
contributors. Refer to the [complete list of 
changes](https://github.com/apache/druid/compare/0.19.0...0.20.0) and 
[everything tagged to the 
milestone](https://github.com/apache/druid/milestone/40) for further details.
   
   # <a name="20-new-features" href="#20-new-features">#</a> New Features
   
   ## <a name="20-hash-segment-pruning" href="#20-hash-segment-pruning">#</a> 
Query segment pruning with hash partitioning
   
   Druid now supports query-time segment pruning (excluding certain segments as 
read candidates for a query) for hash partitioned segments. This optimization 
applies when all of the `partitionDimensions` specified in the hash partition 
spec during ingestion time are present in the filter set of a query, and the 
filters in the query filter on discrete values of the `partitionDimensions` 
(e.g., selector filters). Segment pruning with hash partitioning is not 
supported with non-discrete filters such as bound filters.
   
   Existing users will need to reingest their existing segments to take 
advantage of this feature, as segment pruning requires a `partitionFunction` 
to be stored together with the segments, which does not exist in segments 
created by older versions of Druid. It is not necessary to specify the 
`partitionFunction` explicitly, as the default is the same partition function 
that was used in prior versions of Druid.
   
   Note that segments created with the default `partitionDimensions` value 
(partition by all dimensions + the time column) cannot be pruned in this 
manner; the segments need to be created with an explicit `partitionDimensions`.
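   
   The pruning idea described above can be sketched roughly as follows. This is 
an illustration only, not Druid's implementation: the `HashPruningSketch` and 
`Segment` classes are hypothetical, and `bucketFor` uses a stand-in hash rather 
than the `partitionFunction` that Druid actually stores with the segments. It 
assumes a query whose selector filters supply one value for each of the 
`partitionDimensions`.

```java
import java.util.ArrayList;
import java.util.List;

class HashPruningSketch {
    // Hypothetical stand-in for a hash-partitioned segment (not a Druid class).
    static final class Segment {
        final String id;
        final int bucketId;
        final int numBuckets;

        Segment(String id, int bucketId, int numBuckets) {
            this.id = id;
            this.bucketId = bucketId;
            this.numBuckets = numBuckets;
        }
    }

    // Stand-in hash for illustration; Druid's real partition function differs.
    static int bucketFor(List<String> partitionValues, int numBuckets) {
        return Math.floorMod(partitionValues.hashCode(), numBuckets);
    }

    // Keep only the segments whose bucket matches the hash of the filtered values.
    static List<Segment> prune(List<Segment> segments, List<String> filterValues) {
        List<Segment> kept = new ArrayList<>();
        for (Segment s : segments) {
            if (s.bucketId == bucketFor(filterValues, s.numBuckets)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        for (int b = 0; b < 4; b++) {
            segments.add(new Segment("seg-" + b, b, 4));
        }
        // A selector filter on the single partition dimension with value "us-east".
        for (Segment s : prune(segments, List.of("us-east"))) {
            System.out.println(s.id);
        }
    }
}
```

   A bound filter does not pin the partition dimensions to discrete values, so 
no single bucket can be computed for it; this is consistent with pruning being 
limited to discrete filters.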
   
   https://github.com/apache/druid/pull/9810
   https://github.com/apache/druid/pull/10288
   
   ## <a name="20-cluster-wide-default-query-context" 
href="#20-cluster-wide-default-query-context">#</a> Cluster-wide default query 
context settings
   
   It is now possible to set cluster-wide default query context properties by 
adding a configuration of the form `druid.query.override.default.context.*`, 
with `*` replaced by the property name.
   
   https://github.com/apache/druid/pull/10208
   
   ## <a name="20-improved-retention-rules-ui" 
href="#20-improved-retention-rules-ui">#</a> Improved retention rules UI
   
   The retention rules UI in the web console has been improved. It now provides 
suggestions and basic validation in the period dropdown, shows the cluster 
default rules, and makes editing the default rules more accessible.
   
   https://github.com/apache/druid/pull/10226
   
   ## <a name="20-groupby-offset" href="#20-groupby-offset">#</a> `offset` 
parameter for GroupBy and Scan queries
   
   It is now possible to set an `offset` parameter for GroupBy and Scan queries, 
which tells Druid to skip a number of rows when returning results. Please see 
https://druid.apache.org/docs/latest/querying/limitspec.html and 
https://druid.apache.org/docs/latest/querying/scan-query.html for details.
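   
   As a rough illustration of the semantics, the sketch below skips `offset` 
rows and then returns at most `limit` rows from an already-ordered result list. 
The class and method names are hypothetical, and the code models only the 
result shape, under the assumption that the offset is applied before the limit; 
it is not how Druid executes the query.

```java
import java.util.List;

class OffsetLimitSketch {
    // Skip `offset` rows, then return at most `limit` rows.
    static <T> List<T> applyOffsetAndLimit(List<T> rows, int offset, int limit) {
        int from = Math.min(Math.max(offset, 0), rows.size());
        int to = (int) Math.min((long) from + Math.max(limit, 0), rows.size());
        return rows.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3, 4, 5);
        System.out.println(applyOffsetAndLimit(rows, 2, 2));
    }
}
```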
   
   https://github.com/apache/druid/pull/10235
   https://github.com/apache/druid/pull/10233
   
   ## <a name="20-sql-offset" href="#20-sql-offset">#</a> `OFFSET` clause for 
SQL queries
   
   Druid SQL queries now support an `OFFSET` clause. Please see 
https://druid.apache.org/docs/latest/querying/sql.html#offset for details.
   
   https://github.com/apache/druid/pull/10279 
   
   ## <a name="20-sql-contains" href="#20-sql-contains">#</a> Substring search 
operators
   
   Druid has added new substring search operators in its expression language 
and for SQL queries.
   
   Please see documentation for `CONTAINS_STRING` and `ICONTAINS_STRING` string 
functions for Druid SQL 
(https://druid.apache.org/docs/latest/querying/sql.html#string-functions) and 
documentation for `contains_string` and `icontains_string` for the Druid 
expression language 
(https://druid.apache.org/docs/latest/misc/math-expr.html#string-functions).
   
   https://github.com/apache/druid/pull/10350
   
   ## <a name="20-sql-union-all" href="#20-sql-union-all">#</a> UNION ALL 
operator for SQL queries
   
   Druid SQL queries now support the `UNION ALL` operator, which fuses the 
results of multiple queries together. Please see 
https://druid.apache.org/docs/latest/querying/sql.html#union-all for details on 
what query shapes are supported by this operator.
   
   https://github.com/apache/druid/pull/10324
   
   ## <a name="20-vectorized-min-max" href="#20-vectorized-min-max">#</a> 
Vectorization support for long, double, float min & max aggregators
   
   Vectorization support has been added for several aggregation types: numeric 
min/max aggregators, variance aggregators, ANY aggregators, and aggregators 
from the `druid-histogram` extension.
   
   https://github.com/apache/druid/pull/10260 - numeric min/max
   https://github.com/apache/druid/pull/10304 - histogram
   https://github.com/apache/druid/pull/10338 - ANY
   https://github.com/apache/druid/pull/10390 - variance
   
   ## <a name="20-vectorized-virtual-columns" 
href="#20-vectorized-virtual-columns">#</a> Vectorization support for 
expression virtual columns
   
   Expression virtual columns now have vectorization support (depending on the 
expressions being used), which can result in a 3-5x performance improvement in 
some cases. 
   
   Please see 
https://druid.apache.org/docs/latest/misc/math-expr.html#vectorization-support 
for details on the specific expressions that support vectorization, and 
https://druid.apache.org/docs/latest/querying/query-context.html#vectorization-parameters
 for more information on query context parameters that control vectorization.
   
   https://github.com/apache/druid/pull/10388
   https://github.com/apache/druid/pull/10401
   https://github.com/apache/druid/pull/10432
   
   ## <a name="20-split-hint-max-files" href="#20-split-hint-max-files">#</a> 
Subtask file count limits for parallel batch ingestion
   
   The size-based `splitHintSpec` now supports a new `maxNumFiles` parameter, 
which limits how many files can be assigned to individual subtasks in parallel 
batch ingestion. 
   
   The segment-based `splitHintSpec` used for reingesting data from existing 
Druid segments also has a new `maxNumSegments` parameter which functions 
similarly.
   
   Please see 
https://druid.apache.org/docs/latest/ingestion/native-batch.html#split-hint-spec
 for more details.
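   
   To make the effect of a file count limit concrete, the following sketch 
groups input files into subtask splits that are bounded by both a byte budget 
and a file count. It is a simplified model rather than Druid's code: the 
`SplitHintSketch` class is hypothetical, the `maxSplitSize` argument is assumed 
here to be the byte budget of the size-based spec, and the greedy in-order 
grouping is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SplitHintSketch {
    // Greedily group files (given by size in bytes) into splits, closing a split
    // when adding the next file would exceed the byte budget or the file limit.
    static List<List<Long>> split(List<Long> fileSizes, long maxSplitSize, int maxNumFiles) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            boolean full = !current.isEmpty()
                && (currentBytes + size > maxSplitSize || current.size() >= maxNumFiles);
            if (full) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A file larger than the byte budget still gets a split of its own.
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(10L, 10L, 10L, 10L, 10L);
        // Many small files: the file count limit, not the byte budget, closes each split.
        System.out.println(split(sizes, 1000L, 2).size());
    }
}
```

   Without the file count bound, a large number of very small files could all 
land in one subtask, since they would never exhaust the byte budget.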
   
   https://github.com/apache/druid/pull/10243
   
   ## <a name="20-redis-extension" href="#20-redis-extension">#</a> Redis cache 
extension enhancements
   
   The Redis cache extension now supports Redis Cluster, selecting which 
database is used, connecting to password-protected servers, and period-style 
configurations for the `expiration` and `timeout` properties.
   
   https://github.com/apache/druid/pull/10240
   
   ## <a name="20-auto-compaction-partition" 
href="#20-auto-compaction-partition">#</a> Support for all partitioning schemes 
for auto-compaction
   
   A partitioning spec can now be defined for auto-compaction, allowing users 
to repartition their data at compaction time. Please see the documentation for 
the new `partitionsSpec` property in the compaction `tuningConfig` for more 
details: 
https://druid.apache.org/docs/latest/configuration/index.html#compaction-tuningconfig
   
   
   https://github.com/apache/druid/pull/10307
   
   ## <a name="20-combining-input-source" 
href="#20-combining-input-source">#</a> Combining InputSource
   
   A new combining InputSource has been added, allowing the user to combine 
multiple input sources during ingestion. Please see 
https://druid.apache.org/docs/latest/ingestion/native-batch.html#combining-input-source
 for more details.
   
   https://github.com/apache/druid/pull/10387
   
   ## <a name="20-autocompaction-status-api" 
href="#20-autocompaction-status-api">#</a> Auto-compaction status API
   
   A new coordinator API which shows the status of auto-compaction for a 
datasource has been added. The new API shows whether auto-compaction is enabled 
for a datasource, and a summary of how far compaction has progressed. 
   
   The web console has also been updated to show this information:
   
   
https://user-images.githubusercontent.com/177816/94326243-9d07e780-ff57-11ea-9f80-256fa08580f0.png
   
   TBD: pending docs for this feature, will link when available
   
   https://github.com/apache/druid/pull/10371
   https://github.com/apache/druid/pull/10438
   
   ## <a name="20-auto-num-shards" href="#20-auto-num-shards">#</a> 
Automatically determine numShards for parallel ingestion hash partitioning
   
   When hash partitioning is used in parallel batch ingestion, it is no longer 
necessary to specify `numShards` in the partition spec. Druid can now 
automatically determine a number of shards by scanning the data in a new 
ingestion phase that determines the cardinalities of the partitioning key.
   
   https://github.com/apache/druid/pull/10419
   
   ## <a name="20-task-slot-metrics" href="#20-task-slot-metrics">#</a> Task 
slot usage metrics
   
   New task slot usage metrics have been added. Please see the entries for the 
`taskSlot` metrics at 
https://druid.apache.org/docs/latest/operations/metrics.html#indexing-service 
for more details.
   
   https://github.com/apache/druid/pull/10379
   
   ## <a name="20-disable-server-version" 
href="#20-disable-server-version">#</a> Disable sending server version in 
response headers
   
   It is now possible to disable sending of server version information in 
Druid's response headers.
   
   This is controlled by a new property `druid.server.http.sendServerVersion`, 
which defaults to `true`.
   
   https://github.com/apache/druid/pull/9832
   
   # <a name="20-bugs" href="#20-bugs">#</a> Bug fixes
   
   ## <a name="20-auto-num-shards" href="#20-auto-num-shards">#</a> Fix query 
correctness issue when historical has no segment timeline
   
   Druid 0.20.0 fixes a query correctness issue that occurs when a broker 
issues a query expecting a historical to have certain segments for a 
datasource, but the historical does not actually have any segments for that 
datasource when it processes the query (e.g., they were all unloaded first). 
Prior to 0.20.0, the query would return successfully but without the results 
from the missing segments. In 0.20.0, queries now fail in such situations.
   
   https://github.com/apache/druid/pull/10199
   
   ## <a name="20-result-caching" href="#20-result-caching">#</a> Fix issue 
preventing result-level cache from being populated
   
   Druid 0.20.0 fixes an issue introduced in 0.19.0 
(https://github.com/apache/druid/issues/10337) which can prevent query caches 
from being populated when result-level caching is enabled.
   
   https://github.com/apache/druid/pull/10341
   
   ## <a name="20-variance-comparator" href="#20-variance-comparator">#</a> Fix 
for variance aggregator ordering
   
   The variance aggregator previously used an incorrect comparator that 
compared using an aggregator's internal `count` variable instead of the 
variance.
   
   https://github.com/apache/druid/pull/10340
   
   ## <a name="20-limitspec-cache" href="#20-limitspec-cache">#</a> Fix 
incorrect caching for groupBy queries with limit specs
   
   Druid 0.20.0 fixes an issue with groupBy queries and caching, where the 
limitSpec of the query was not considered in the cache key, leading to 
potentially incorrect results if queries that are identical except for the 
limitSpec are issued.
   
   https://github.com/apache/druid/pull/10093
   
   # <a name="20-upgrading-from-previous" 
href="#20-upgrading-from-previous">#</a> Upgrading to Druid 0.20.0
   
   Please be aware of the following considerations when upgrading from 0.19.0 
to 0.20.0. If you're updating from an earlier version than 0.19.0, please see 
the release notes of the relevant intermediate versions.
   
   ## <a name="20-default-max-size" href="#20-default-max-size">#</a> Default 
`maxSize`
   
   `druid.server.maxSize` will now default to the sum of `maxSize` values 
defined within the `druid.segmentCache.locations`. The user can still provide a 
custom value for `druid.server.maxSize` which will take precedence over the 
default value.
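   
   The new default can be illustrated with a small worked example. The class 
and method names below are hypothetical, and a `null` argument is used here 
only to model an unset `druid.server.maxSize`; the sketch shows the arithmetic 
described above and is not Druid's configuration code.

```java
import java.util.List;

class DefaultMaxSizeSketch {
    // An explicit druid.server.maxSize wins; otherwise sum the per-location maxSize values.
    static long effectiveMaxSize(Long configuredServerMaxSize, List<Long> locationMaxSizes) {
        if (configuredServerMaxSize != null) {
            return configuredServerMaxSize;
        }
        long sum = 0;
        for (long size : locationMaxSizes) {
            sum += size;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two segment cache locations with maxSize values given in bytes.
        List<Long> locations = List.of(100_000_000_000L, 200_000_000_000L);
        System.out.println(effectiveMaxSize(null, locations));
        System.out.println(effectiveMaxSize(250_000_000_000L, locations));
    }
}
```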
   
   https://github.com/apache/druid/pull/10255
   
   ## <a name="20-id-name-change" href="#20-id-name-change">#</a> Compaction 
and kill task ID changes
   
   Compaction and kill tasks issued by the coordinator will now have their task 
IDs prefixed by `coordinator-issued`, while user-issued kill tasks will be 
prefixed by `api-issued`.
   
   https://github.com/apache/druid/pull/10278
   
   ## <a name="20-new-size-limit-split" href="#20-new-size-limit-split">#</a> 
New size limits for parallel ingestion split hint specs
   
   The size-based and segment-based `splitHintSpec` for parallel batch 
ingestion now apply a default file/segment limit of 1000 per subtask, 
controlled by the `maxNumFiles` and `maxNumSegments` parameters, respectively. 
   
   https://github.com/apache/druid/pull/10243
   
   ## <a name="20-new-agg-methods" href="#20-new-agg-methods">#</a> New 
`PostAggregator` and `AggregatorFactory` methods
   
   Users who have developed an extension with custom `PostAggregator` or 
`AggregatorFactory` implementations will need to update their extensions, as 
these two interfaces have new methods defined in 0.20.0. 
   
   `PostAggregator` now has a new method:
   
   ```
     ValueType getType();
   ```
   
   To support type information on `PostAggregator`, `AggregatorFactory` also 
has 2 new methods:
   
   ```
     public abstract ValueType getType();
   
     public abstract ValueType getFinalizedType();
   ```
   
   ## <a name="20-new-expr-methods" href="#20-new-expr-methods">#</a> New 
`Expr`-related methods
   
   Users who have developed an extension with custom `Expr` implementations will 
need to update their extensions, as `Expr` and related interfaces have changed 
in 0.20.0. Please see the PR below for details:
   
   https://github.com/apache/druid/pull/10401
   
   Please see https://github.com/apache/druid/pull/9638 for more details on the 
interface changes.
   
   ## <a name="20-sequence-time" href="#20-sequence-time">#</a> More accurate 
`query/cpu/time` metric
   
   In 0.20.0, the accuracy of the `query/cpu/time` metric has been improved. 
Previously, it did not account for certain portions of work during query 
processing, described in more detail in the following PR: 
   
   https://github.com/apache/druid/pull/10377
   
   ## <a name="20-audit-log-cols" href="#20-audit-log-cols">#</a> New audit log 
service metric columns
   
   If you are using audit logging, please be aware that new columns have been 
added to the audit log service metric (`comment`, `remote_address`, and 
`created_date`). An optional `payload` column has also been added, which can be 
enabled by setting `druid.audit.manager.includePayloadAsDimensionInMetric` to 
`true`.
   
   https://github.com/apache/druid/pull/10373
   
   ## <a name="20-request-log-sql-context" 
href="#20-request-log-sql-context">#</a> `sqlQueryContext` in request logs
   
   If you are using query request logging, the request log events will now 
include the `sqlQueryContext` for SQL queries.
   
   https://github.com/apache/druid/pull/10368
   
   ## <a name="20-last-compaction-state" href="#20-last-compaction-state">#</a> 
Additional per-segment state in metadata store
   
   Hash-partitioned segments created by Druid 0.20.0 will now have additional 
`partitionFunction` data in the metadata store.
   
   Additionally, compaction tasks will now store additional per-segment 
information in the metadata store, used for tracking compaction history.
   
   https://github.com/apache/druid/pull/10288
   https://github.com/apache/druid/pull/10413
   
   # <a name="20-credits" href="#20-credits">#</a> Credits
   
   Thanks to everyone who contributed to this release!
   
   @a2l007
   @abhishekagarwal87
   @abhishekrb19
   @ArvinZheng
   @belugabehr
   @capistrant
   @ccaominh
   @clintropolis
   @code-crusher
   @dylwylie
   @fermelone
   @FrankChen021
   @gianm
   @himanshug
   @jihoonson
   @jon-wei
   @joykent99
   @kroeders
   @lightghli
   @mans2singh
   @maytasm
   @medb
   @mghosh4
   @nishantmonu51
   @pan3793
   @richardstartin
   @sthetland
   @suneet-s
   @tarunparackal
   @tdt17
   @tourvi
   @vogievetsky
   @wjhypo
   @xiangqiao123
   @xvrl
   
   
   ---
   
   TBD: are there any breaking changes from
   https://github.com/apache/druid/pull/10203
   https://github.com/apache/druid/pull/9810
   https://github.com/apache/druid/pull/10307

