This is an automated email from the ASF dual-hosted git repository.
brile pushed a commit to branch 29.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/29.0.0 by this push:
new b5577dfff84 [Docs] Druid 29.0.0 release notes (#15805)
b5577dfff84 is described below
commit b5577dfff84ceefe7c6df621f43e0143ba32b663
Author: Katya Macedo <[email protected]>
AuthorDate: Wed Feb 21 11:09:30 2024 -0600
[Docs] Druid 29.0.0 release notes (#15805)
Co-authored-by: 317brian <[email protected]>
---
docs/release-info/assets/image01.png | Bin 0 -> 74828 bytes
docs/release-info/assets/image02.png | Bin 0 -> 69176 bytes
docs/release-info/assets/image03.png | Bin 0 -> 115715 bytes
docs/release-info/release-notes.md | 602 ++++++++++++++++++++++++++++++++++-
docs/release-info/upgrade-notes.md | 63 ++++
website/.spelling | 21 ++
6 files changed, 672 insertions(+), 14 deletions(-)
diff --git a/docs/release-info/assets/image01.png
b/docs/release-info/assets/image01.png
new file mode 100644
index 00000000000..810aba99368
Binary files /dev/null and b/docs/release-info/assets/image01.png differ
diff --git a/docs/release-info/assets/image02.png
b/docs/release-info/assets/image02.png
new file mode 100644
index 00000000000..2fba10dfe73
Binary files /dev/null and b/docs/release-info/assets/image02.png differ
diff --git a/docs/release-info/assets/image03.png
b/docs/release-info/assets/image03.png
new file mode 100644
index 00000000000..c1787d1959b
Binary files /dev/null and b/docs/release-info/assets/image03.png differ
diff --git a/docs/release-info/release-notes.md
b/docs/release-info/release-notes.md
index 768ceef697b..778d878b737 100644
--- a/docs/release-info/release-notes.md
+++ b/docs/release-info/release-notes.md
@@ -24,15 +24,15 @@ title: "Release notes"
<!--Replace {{DRUIDVERSION}} with the correct Druid version.-->
-Apache Druid {{DRUIDVERSION}} contains over $NUMBER_FEATURES new features, bug
fixes, performance enhancements, documentation improvements, and additional
test coverage from $NUMBER_OF_CONTRIBUTORS contributors.
+Apache Druid 29.0.0 contains over 350 new features, bug fixes, performance
enhancements, documentation improvements, and additional test coverage from 67
contributors.
<!--
Replace {{MILESTONE}} with the correct milestone number. For example:
https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A28.0+sort%3Aupdated-desc+
-->
-See the [complete set of
changes](https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A{{MILESTONE}}+sort%3Aupdated-desc+)
for additional details, including bug fixes.
+See the [complete set of
changes](https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A29.0.0+sort%3Aupdated-desc+)
for additional details, including bug fixes.
-Review the [upgrade notes](#upgrade-notes) and [incompatible
changes](#incompatible-changes) before you upgrade to Druid {{DRUIDVERSION}}.
+Review the [upgrade notes](#upgrade-notes) before you upgrade to Druid 29.0.0.
If you are upgrading across multiple versions, see the [Upgrade
notes](upgrade-notes.md) page, which lists upgrade notes for the most recent
Druid versions.
<!--
@@ -57,50 +57,624 @@ For tips about how to write a good release note, see
[Release notes](https://git
This section contains important information about new and existing features.
+### MSQ export statements (experimental)
+
+Druid 29.0.0 adds experimental support for export statements to the MSQ task
engine. This allows query tasks to write data to an external destination
through the [`EXTERN`
function](https://druid.apache.org/docs/latest/multi-stage-query/reference#extern-function).
+
+[#15689](https://github.com/apache/druid/pull/15689)
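+
+The following is a hypothetical sketch of an export query. The `S3()` destination function, bucket, prefix, and column names here are illustrative; check the `EXTERN` function reference linked above for the exact syntax your deployment supports:
+
+```sql
+INSERT INTO
+  EXTERN(S3(bucket => 'your_bucket', prefix => 'druid-exports/wikipedia'))
+AS CSV
+SELECT "channel", "page", "added"
+FROM "wikipedia"
+```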
+
+### SQL PIVOT and UNPIVOT (experimental)
+
+Druid 29.0.0 adds experimental support for the SQL PIVOT and UNPIVOT operators.
+
+The PIVOT operator carries out an aggregation and transforms rows into columns
in the output. The following is the general syntax for the PIVOT operator:
+
+```sql
+PIVOT (aggregation_function(column_to_aggregate)
+ FOR column_with_values_to_pivot
+ IN (pivoted_column1 [, pivoted_column2 ...])
+)
+```
+
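+As a hypothetical example following the template above (the `orders` datasource, its columns, and the pivoted values are all illustrative):
+
+```sql
+SELECT *
+FROM "orders"
+PIVOT (SUM("amount")
+  FOR "status"
+  IN ('open', 'closed')
+)
+```
+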
+The UNPIVOT operator transforms existing column values into rows. The
following is the general syntax for the UNPIVOT operator:
+
+```sql
+UNPIVOT (values_column
+ FOR names_column
+ IN (unpivoted_column1 [, unpivoted_column2 ... ])
+)
+```
+
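+As a hypothetical example following the template above (the `quarterly_sales` datasource and its columns are illustrative):
+
+```sql
+SELECT *
+FROM "quarterly_sales"
+UNPIVOT ("sales"
+  FOR "quarter"
+  IN ("q1", "q2")
+)
+```
+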
+### Range support in window functions (experimental)
+
+Window functions (experimental) now support ranges where both endpoints are unbounded or are the current row. Ranges work in strict mode, which means that Druid fails any query whose range isn't supported. To turn off strict mode for ranges, set the context parameter `windowingStrictValidation` to `false`.
+
+The following examples show window expressions with RANGE frame specifications:
+
+```sql
+(ORDER BY c)
+(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
+(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED PRECEDING)
+```
+
+[#15703](https://github.com/apache/druid/pull/15703)
[#15746](https://github.com/apache/druid/pull/15746)
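+
+For instance, a running total per channel could be sketched as follows (the `wikipedia` datasource and column names are illustrative):
+
+```sql
+SELECT
+  "channel",
+  "__time",
+  SUM("added") OVER (
+    PARTITION BY "channel"
+    ORDER BY "__time"
+    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+  ) AS "running_added"
+FROM "wikipedia"
+```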
+
+### Improved INNER joins
+
+Druid now supports arbitrary join conditions for INNER join. Any
sub-conditions that can't be evaluated as part of the join are converted to a
post-join filter. Improved join capabilities allow Druid to more effectively
support applications like Tableau.
+
+[#15302](https://github.com/apache/druid/pull/15302)
+
+### Improved concurrent append and replace (experimental)
+
+You no longer have to manually determine the task lock type for concurrent
append and replace (experimental) with the `taskLockType` task context.
Instead, Druid can now determine it automatically for you. You can use the
context parameter `"useConcurrentLocks": true` for individual tasks and
datasources or enable concurrent append and replace at a cluster level using
`druid.indexer.task.default.context`.
+
+[#15684](https://github.com/apache/druid/pull/15684)
+
+### First and last aggregators for double, float, and long data types
+
+Druid now supports first and last aggregators for the double, float, and long types in native ingestion specs and MSQ queries. Previously, these aggregators were only supported in native queries. For more information, see [First and last aggregators](https://druid.apache.org/docs/latest/querying/aggregations/#first-and-last-aggregators).
+
+[#14462](https://github.com/apache/druid/pull/14462)
+
+Additionally, the following functions can now return numeric values:
+
+* EARLIEST and EARLIEST_BY
+* LATEST and LATEST_BY
+
+You can use these functions as aggregators at ingestion time.
+
+[#15607](https://github.com/apache/druid/pull/15607)
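+
+For example, a query along these lines can now aggregate numeric columns directly (the datasource and column names are illustrative):
+
+```sql
+SELECT
+  "channel",
+  EARLIEST("added") AS "first_added",
+  LATEST("added") AS "last_added"
+FROM "wikipedia"
+GROUP BY 1
+```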
+
+### Support for logging audit events
+
+Added support for logging audit events and improved coverage of audited REST API endpoints.
+To enable logging audit events, set the config `druid.audit.manager.type` to `log` in both the Coordinator and Overlord or in `common.runtime.properties`. When you set `druid.audit.manager.type` to `sql`, audit events are persisted to the metadata store instead.
+
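+For example, a minimal sketch of the logging setup in `common.runtime.properties` might look like this (assuming the other audit manager settings keep their defaults):
+
+```
+druid.audit.manager.type=log
+```
+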
+In both cases, Druid audits the following events:
+
+* Coordinator
+ * Update load rules
+ * Update lookups
+ * Update coordinator dynamic config
+ * Update auto-compaction config
+* Overlord
+ * Submit a task
+ * Create/update a supervisor
+ * Update worker config
+* Basic security extension
+ * Create user
+ * Delete user
+ * Update user credentials
+ * Create role
+ * Delete role
+ * Assign role to user
+ * Set role permissions
+
+
+[#15480](https://github.com/apache/druid/pull/15480)
[#15653](https://github.com/apache/druid/pull/15653)
+
+Also fixed an issue with the basic auth integration test by not persisting
logs to the database.
+
+[#15561](https://github.com/apache/druid/pull/15561)
+
+### Enabled empty ingest queries
+
+The MSQ task engine now allows empty ingest queries by default. Previously,
ingest queries that produced no data would fail with the `InsertCannotBeEmpty`
MSQ fault.
+For more information, see [Empty ingest queries in the upgrade
notes](#enabled-empty-ingest-queries).
+
+[#15674](https://github.com/apache/druid/pull/15674)
[#15495](https://github.com/apache/druid/pull/15495)
+
+In the web console, you can use a toggle to control whether an ingestion fails
if the ingestion query produces no data.
+
+[#15627](https://github.com/apache/druid/pull/15627)
+
+### MSQ support for Google Cloud Storage
+
+The MSQ task engine now supports Google Cloud Storage (GCS). You can use
durable storage with GCS. See [Durable storage
configurations](https://druid.apache.org/docs/latest/multi-stage-query/reference#durable-storage-configurations)
for more information.
+
+[#15398](https://github.com/apache/druid/pull/15398)
+
+### Experimental extensions
+
+Druid 29.0.0 adds the following extensions.
+
+#### DDSketch
+
+A new DDSketch extension is available as a community contribution. The
DDSketch extension (`druid-ddsketch`) provides support for approximate quantile
queries using the [DDSketch](https://github.com/datadog/sketches-java) library.
+
+[#15049](https://github.com/apache/druid/pull/15049)
+
+#### Spectator histogram
+
+A new histogram extension is available as a community contribution. The
Spectator-based histogram extension (`druid-spectator-histogram`) provides
approximate histogram aggregators and percentile post-aggregators based on
[Spectator](https://netflix.github.io/atlas-docs/spectator/) fixed-bucket
histograms.
+
+[#15340](https://github.com/apache/druid/pull/15340)
+
+#### Delta Lake
+
+A new Delta Lake extension is available as a community contribution. The Delta
Lake extension (`druid-deltalake-extensions`) lets you use the [Delta Lake
input
source](https://druid.apache.org/docs/latest/development/extensions-contrib/delta-lake)
to ingest data stored in a Delta Lake table into Apache Druid.
+
+[#15755](https://github.com/apache/druid/pull/15755)
+
## Functional area and related changes
This section contains detailed release notes separated by areas.
### Web console
+#### Support for array types
+
+Added support for array types for all the ingestion wizards.
+
+
+
+When loading multi-value dimensions or arrays using the Druid **Query**
console, note the value of the `arrayIngestMode` parameter. Druid now
configures the `arrayIngestMode` parameter in the data loading flow, and its
value can persist across the SQL tab, even if you execute unrelated Data
Manipulation Language (DML) operations within the same tab.
+
+[#15588](https://github.com/apache/druid/pull/15588)
+
+#### File inputs for query detail archive
+
+The **Load query detail archive** now supports loading queries by selecting a
JSON file directly or dragging the file into the dialog.
+
+
+
+[#15632](https://github.com/apache/druid/pull/15632)
+
+#### Improved lookup dialog
+
+The lookup dialog in the web console now includes the following optional fields. See [JDBC lookup](https://druid.apache.org/docs/latest/development/extensions-core/lookups-cached-global#jdbc-lookup) for more information.
+
+* Jitter seconds
+* Load timeout seconds
+* Max heap percentage
+
+
+
+[#15472](https://github.com/apache/druid/pull/15472/)
+
+#### Improved time chart brush and added auto-granularity
+
+Improved the web console **Explore** view as follows:
+
+* Added timezone support to the **Explore** view.
+* The time chart now automatically picks a granularity when "auto" is selected (the default), based on the current time filter extent.
+* The brush is now automatically enabled in the time chart.
+* The brush interval snaps to the selected time granularity.
+* Added a highlight bubble to all visualizations except the table, which has its own.
+
+[#14990](https://github.com/apache/druid/pull/14990)
+
#### Other web console improvements
-### Ingestion
+* Added the ability to detect multiple `EXPLAIN PLAN` queries in the workbench
and run them individually [#15570](https://github.com/apache/druid/pull/15570)
+* Added the ability to sort a segment table on start and end when grouping by
interval [#15720](https://github.com/apache/druid/pull/15720)
+* Improved the time shift for compare logic in the web console to include
literals [#15433](https://github.com/apache/druid/pull/15433)
+* Improved robustness of time shifting in tables in Explore view
[#15359](https://github.com/apache/druid/pull/15359)
+* Improved ingesting data using the web console
[#15339](https://github.com/apache/druid/pull/15339)
+* Improved management proxy detection
[#15453](https://github.com/apache/druid/pull/15453)
+* Fixed rendering on a disabled worker
[#15712](https://github.com/apache/druid/pull/15712)
+* Fixed an issue where `waitUntilSegmentLoad` would always be set to `true` even if explicitly set to `false` [#15781](https://github.com/apache/druid/pull/15781)
+* Enabled table driven query modification actions to work with slices
[#15779](https://github.com/apache/druid/pull/15779)
+
+### General ingestion
+
+#### Added system fields to input sources
+
+Added the option to return system fields when defining an input source. This
allows for ingestion of metadata, such as an S3 object's URI.
+
+[#15276](https://github.com/apache/druid/pull/15276)
+
+#### Changed how Druid allocates weekly segments
+
+When the requested granularity is a month or larger but a segment can't be allocated, Druid falls back to day partitioning.
+Unless explicitly specified, Druid skips week-granularity segments for data partitioning because these segments don't align with the end of the month or coarser-grained intervals.
+
+Previously, if Druid couldn't allocate segments by month, it tried allocating them by week next.
+Now, Druid skips partitioning by week and goes directly to day. Week segments can only be allocated if the chosen partitioning in the append task is WEEK.
+
+[#15589](https://github.com/apache/druid/pull/15589)
+
+#### Changed how empty or null array columns are stored
+
+Columns ingested with the `auto` column indexer that contain only empty arrays or arrays of null values are now stored as `ARRAY<LONG>` instead of `COMPLEX<json>`.
+
+[#15505](https://github.com/apache/druid/pull/15505)
+
+#### Enabled skipping compaction for datasources with partial-eternity segments
+
+Druid now skips compaction for datasources with segments whose interval start or end coincides with the endpoints of the Eternity interval.
+
+[#15542](https://github.com/apache/druid/pull/15542)
+
+#### Kill task improvements
+
+Improved kill tasks as follows:
+
+* Resolved an issue where the auto-kill feature failed to honor the specified
buffer period. This occurred when multiple unused segments within an interval
were marked as unused at different times.
+* You can submit kill tasks with an optional parameter `maxUsedStatusLastUpdatedTime`. When set to a date-time value, the kill task only considers segments in the specified interval that were marked as unused no later than this time. The default behavior is to kill all unused segments in the interval regardless of when they were marked as unused.
+
+[#15710](https://github.com/apache/druid/pull/15710)
+
+#### Segment allocation improvements
+
+Improved segment allocation as follows:
+
+* Changed how Druid allocates weekly segments
[#15589](https://github.com/apache/druid/pull/15589)
+* Enhanced polling in segment allocation queue
[#15590](https://github.com/apache/druid/pull/15590)
+* Fixed an issue in segment allocation that could cause loss of appended data
when running interleaved append and replace tasks
[#15459](https://github.com/apache/druid/pull/15459)
+
+#### Other ingestion improvements
+
+* Added a default implementation for the `evalDimension` method in the
`RowFunction` interface [#15452](https://github.com/apache/druid/pull/15452)
+* Added a configurable delay to the Peon service that determines how long a
Peon should wait before dropping a segment
[#15373](https://github.com/apache/druid/pull/15373)
+* Improved metadata store updates by attempting to retry updates rather than
failing [#15141](https://github.com/apache/druid/pull/15141)
+* Improved the error message you get when `taskQueue` reaches `maxSize`
[#15409](https://github.com/apache/druid/pull/15409)
+* Fixed an issue with columnar frames always writing multi-valued columns
where the input column had `hasMultipleValues = UNKNOWN`
[#15300](https://github.com/apache/druid/pull/15300)
+* Fixed a race condition where there were multiple attempts to publish
segments for the same sequence
[#14995](https://github.com/apache/druid/pull/14995)
+* Fixed a race condition that can occur at high streaming concurrency
[#15174](https://github.com/apache/druid/pull/15174)
+* Fixed an issue where complex types that are also numbers were assumed to
also be double [#15272](https://github.com/apache/druid/pull/15272)
+* Fixed an issue with unnecessary retries triggered when exceptions like
`IOException` obfuscated S3 exceptions
[#15238](https://github.com/apache/druid/pull/15238)
+* Fixed segment retrieval when the input interval does not lie within the
years `[1000, 9999]` [#15608](https://github.com/apache/druid/pull/15608)
+* Fixed empty strings being incorrectly converted to null values
[#15525](https://github.com/apache/druid/pull/15525)
+* Simplified `IncrementalIndex` and `OnHeapIncrementalIndex` by removing some
parameters [#15448](https://github.com/apache/druid/pull/15448)
+* Updated active task payloads to be read from memory before falling back to the metadata store [#15377](https://github.com/apache/druid/pull/15377)
+* Updated `OnheapIncrementalIndex` to no longer try to offer a thread-safe
"add" method [#15697](https://github.com/apache/druid/pull/15697)
+
+### SQL-based ingestion
+
+#### Added `castToType` parameter
+
+Added optional `castToType` parameter to `auto` column schema.
+
+[#15417](https://github.com/apache/druid/pull/15417)
+
+#### Improved the EXTEND operator
+
+The EXTEND operator now supports the following array types: `VARCHAR ARRAY`,
`BIGINT ARRAY`, `FLOAT ARRAY`, and `DOUBLE ARRAY`.
+
+The following example shows an extern input with the Druid native input types `ARRAY<STRING>`, `ARRAY<LONG>`, and `STRING`:
+
+```sql
+EXTEND (a VARCHAR ARRAY, b BIGINT ARRAY, c VARCHAR)
+```
+
+[#15458](https://github.com/apache/druid/pull/15458)
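+
+For instance, EXTEND with array types can appear in an EXTERN input along these lines (the input source URI and column names are illustrative):
+
+```sql
+SELECT *
+FROM TABLE(
+  EXTERN(
+    '{"type": "http", "uris": ["https://example.com/data.json"]}',
+    '{"type": "json"}'
+  )
+) EXTEND (a VARCHAR ARRAY, b BIGINT ARRAY, c VARCHAR)
+```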
+
+#### Improved tombstone generation to honor granularity specified in a
`REPLACE` query
+
+MSQ `REPLACE` queries now generate tombstone segments honoring the segment
granularity specified in the query rather than generating irregular tombstones.
If a query generates more than 5000 tombstones, Druid returns an MSQ
`TooManyBucketsFault` error, similar to the behavior with data segments.
-#### SQL-based ingestion
+[#15243](https://github.com/apache/druid/pull/15243)
-##### Other SQL-based ingestion improvements
+#### Improved hash joins using filters
-#### Streaming ingestion
+Improved consistency of JOIN behavior for queries using either the native or
MSQ task engine to prune based on base (left-hand side) columns only.
-##### Other streaming ingestion improvements
+[#15299](https://github.com/apache/druid/pull/15299)
+
+#### Configurable page size limit
+
+You can now limit the page size for results of SELECT queries run using the MSQ task engine. See `rowsPerPage` in the [SQL-based ingestion reference](https://druid.apache.org/docs/latest/multi-stage-query/reference) for more information.
+
+### Streaming ingestion
+
+#### Improved Amazon Kinesis automatic reset
+
+Changed Amazon Kinesis automatic reset behavior to only reset the checkpoints
for partitions where sequence numbers are unavailable.
+
+[#15338](https://github.com/apache/druid/pull/15338)
### Querying
-#### Other querying improvements
+#### Added IPv6_MATCH SQL function
+
+Added IPv6_MATCH SQL function for matching IPv6 addresses in a subnet:
+
+```sql
+IPV6_MATCH(address, subnet)
+```
+
+[#15212](https://github.com/apache/druid/pull/15212/)
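+
+For example, the following hypothetical query keeps only rows whose `client_ip` falls within a subnet (the datasource and column names are illustrative):
+
+```sql
+SELECT *
+FROM "logs"
+WHERE IPV6_MATCH("client_ip", '75e9:efa4:29c6:85f6::/64')
+```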
+
+#### Added JSON_QUERY_ARRAY function
+
+Added the JSON_QUERY_ARRAY function, which is similar to JSON_QUERY except that its return type is always `ARRAY<COMPLEX<json>>` instead of `COMPLEX<json>`. Essentially, this function lets you extract arrays of objects from nested data and perform operations such as UNNEST, ARRAY_LENGTH, ARRAY_SLICE, or any other available ARRAY operations.
+
+[#15521](https://github.com/apache/druid/pull/15521)
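+
+For example, a nested array of objects could be unnested along these lines (the `events` datasource, `payload` column, and JSON paths are illustrative):
+
+```sql
+SELECT JSON_VALUE(item, '$.name') AS "name"
+FROM "events", UNNEST(JSON_QUERY_ARRAY("payload", '$.items')) AS t(item)
+```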
+
+#### Added support for `aggregateMultipleValues`
+
+Improved the `ANY_VALUE(expr)` function to support the boolean option `aggregateMultipleValues`, which is enabled by default. When you run ANY_VALUE on a multi-value dimension (MVD), the function returns the stringified array. If `aggregateMultipleValues` is set to `false`, ANY_VALUE returns the first value instead.
+
+[#15434](https://github.com/apache/druid/pull/15434)
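+
+Assuming a three-argument form where the second argument is the per-value byte limit (an assumption; check the PR above for the exact signature), a sketch might look like:
+
+```sql
+SELECT
+  "channel",
+  ANY_VALUE("tags", 1024, false) AS "one_tag"
+FROM "events"
+GROUP BY 1
+```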
+
+#### Added native `arrayContainsElement` filter
+
+Added native `arrayContainsElement` filter to improve performance when using
ARRAY_CONTAINS on array columns.
+
+[#15366](https://github.com/apache/druid/pull/15366)
[#15455](https://github.com/apache/druid/pull/15455)
+
+ARRAY_OVERLAP now also uses the `arrayContainsElement` filter when filtering ARRAY-typed columns, so it can use indexes the way ARRAY_CONTAINS does.
+
+[#15451](https://github.com/apache/druid/pull/15451)
+
+#### Added index support
+
+Improved nested JSON columns as follows:
+
+* Added `ValueIndexes` and `ArrayElementIndexes` for nested arrays.
+* Added `ValueIndexes` for nested long and double columns.
+
+[#15752](https://github.com/apache/druid/pull/15752)
+
+#### Improved `timestamp_extract` function
+
+The `timestamp_extract(expr, unit, [timezone])` Druid native query function
now supports dynamic values.
-### Cluster management
+[#15586](https://github.com/apache/druid/pull/15586)
-#### Other cluster management improvements
+#### Improved JSON_VALUE and JSON_QUERY
+
+Added support for using expressions to compute the JSON path argument for
JSON_VALUE and JSON_QUERY functions dynamically. The JSON path argument doesn't
have to be a constant anymore.
+
+[#15320](https://github.com/apache/druid/pull/15320)
+
+#### Improved filtering performance for lookups
+
+Enhanced filtering performance for lookups as follows:
+
+* Added `sqlReverseLookupThreshold` SQL query context parameter.
`sqlReverseLookupThreshold` represents the maximum size of an IN filter that
will be created as part of lookup reversal
[#15832](https://github.com/apache/druid/pull/15832)
+* Improved loading and dropping of containers for lookups to reduce
inconsistencies during updates
[#14806](https://github.com/apache/druid/pull/14806)
+* Changed behavior for initialization of lookups to load the first lookup as
is, regardless of cache status
[#15598](https://github.com/apache/druid/pull/15598)
+
+#### Enabled query request queuing by default when total laning is turned on
+
+When the number of query scheduler threads is lower than the number of server HTTP threads, total laning turns on.
+This reserves some HTTP threads for non-query requests such as health checks.
+Previously, total laning rejected any query request that exceeded the lane capacity.
+Now, excess requests are instead queued with a timeout equal to `MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout)`.
+
+[#15440](https://github.com/apache/druid/pull/15440)
+
+#### Other querying improvements
+
+* Added a supplier that can return `NullValueIndex` to be used by
`NullFilter`. This improvement should speed up `is null` and `is not null`
filters on JSON columns [#15687](https://github.com/apache/druid/pull/15687)
+* Added an option to compare results with relative error tolerance
[#15429](https://github.com/apache/druid/pull/15429)
+* Added capability for the Broker to access datasource schemas defined in the
catalog when processing SQL queries
[#15469](https://github.com/apache/druid/pull/15469)
+* Added CONCAT flattening and filter decomposition
[#15634](https://github.com/apache/druid/pull/15634)
+* Enabled ARRAY_TO_MV to support expression inputs
[#15528](https://github.com/apache/druid/pull/15528)
+* Improved `ExpressionPostAggregator` to handle ARRAY types output by the
grouping engine [#15543](https://github.com/apache/druid/pull/15543)
+* Improved the error message you get when there's an error in the specified
interval [#15454](https://github.com/apache/druid/pull/15454)
+* Improved how three-valued logic is handled
[#15629](https://github.com/apache/druid/pull/15629)
+* Improved error reporting for math functions
[#14987](https://github.com/apache/druid/pull/14987)
+* Improved handling of COALESCE, SEARCH, and filter optimization
[#15609](https://github.com/apache/druid/pull/15609)
+* Increased memory available for subqueries when the query scheduler is
configured to limit queries below the number of server threads
[#15295](https://github.com/apache/druid/pull/15295)
+* Optimized SQL planner for filter expressions by introducing column indexes
for expression virtual columns
[#15585](https://github.com/apache/druid/pull/15585)
+* Optimized queries involving large NOT IN operations
[#15625](https://github.com/apache/druid/pull/15625)
+* Fixed an issue with nested empty array fields
[#15532](https://github.com/apache/druid/pull/15532)
+* Fixed NPE with virtual expression with unnest
[#15513](https://github.com/apache/druid/pull/15513)
+* Fixed an issue with AND and OR operators and numeric `nvl` not clearing out
stale null vectors for vector expression processing
[#15587](https://github.com/apache/druid/pull/15587)
+* Fixed an issue with filtering columns when using partial paths such as in
`JSON_QUERY` [#15643](https://github.com/apache/druid/pull/15643)
+* Fixed queries that raise an exception when sketches are stored in cache
[#15654](https://github.com/apache/druid/pull/15654)
+* Fixed queries involving JSON functions that failed when using negative
indexes [#15650](https://github.com/apache/druid/pull/15650)
+* Fixed an issue where queries involving filters on TIME_FLOOR could encounter
`ClassCastException` when comparing `RangeValue` in `CombineAndSimplifyBounds`
[#15778](https://github.com/apache/druid/pull/15778)
### Data management
+#### Changed `numCorePartitions` to 0 for tombstones
+
+Tombstone segments now have 0 core partitions. This means they can be dropped or removed independently without affecting the availability of other appended segments in the same co-partition space. Previously, removing a tombstone with 1 core partition whose partition space contained appended segments could make those appended segments unavailable.
+
+[#15379](https://github.com/apache/druid/pull/15379)
+
+#### Clean up duty for non-overlapping eternity tombstones
+
+Added `MarkEternityTombstonesAsUnused` to clean up non-overlapping eternity
tombstones—tombstone segments that either start at `-INF` or end at `INF`
and don't overlap with any overshadowed used segments in the datasource.
+
+Also added a new metric `segment/unneededEternityTombstone/count` to count the
number of dropped non-overshadowed eternity tombstones per datasource.
+
+[#15281](https://github.com/apache/druid/pull/15281)
+
+#### Enhanced the JSON parser unexpected token logging
+
+The JSON parser unexpected token error now includes the context of the expected `VALUE_STRING` token. This makes it easier to track mesh/proxy network error messages and avoids unnecessary investigation into Druid server REST endpoint responses.
+
+[#15176](https://github.com/apache/druid/pull/15176)
+
#### Other data management improvements
+* Fixed an issue where the Broker would return an HTTP `400` status code
instead of `503` when a Coordinator was temporarily unavailable, such as during
a rolling upgrade [#15756](https://github.com/apache/druid/pull/15756)
+* Added user identity to Router query request logs
[#15126](https://github.com/apache/druid/pull/15126)
+* Improved process to retrieve segments from metadata store by retrieving
segments in batches [#15305](https://github.com/apache/druid/pull/15305)
+* Improved logging messages when skipping auto-compaction for a data source
[#15460](https://github.com/apache/druid/pull/15460)
+* Improved compaction by modifying the segment iterator to skip intervals
without data [#15676](https://github.com/apache/druid/pull/15676)
+* Increased `_acceptQueueSize` based on value of `net.core.somaxconn`
[#15596](https://github.com/apache/druid/pull/15596)
+* Optimized the process to mark segments as unused
[#15352](https://github.com/apache/druid/pull/15352)
+* Updated auto-compaction to preserve spatial dimensions rather than rewrite
them into regular string dimensions
[#15321](https://github.com/apache/druid/pull/15321)
+
### Metrics and monitoring
+* Added worker status and duration metrics in live and task reports
[#15180](https://github.com/apache/druid/pull/15180)
+* Updated `serviceName` for `segment/count` metric to match the configured
metric name within the StatsD emitter
[#15347](https://github.com/apache/druid/pull/15347)
+
### Extensions
-### Documentation improvements
+#### Basic security improvements
+
+The computed hash values of passwords are now cached for the
`druid-basic-security` extension to boost authentication validator performance.
+
+[#15648](https://github.com/apache/druid/pull/15648)
+
+#### DataSketches improvements
+
+* Improved performance of HLL sketch merge aggregators
[#15162](https://github.com/apache/druid/pull/15162)
+* Updated histogram post-aggregators for Quantiles and KLL sketches for the case where all values in the sketch are equal. Previously, these queries failed; now they return `[N, 0, 0, ...]`, where N is the number of values in the sketch and the length of the list is equal to the value assigned to `numBins` [#15381](https://github.com/apache/druid/pull/15381)
+
+#### Microsoft Azure improvements
+
+* Added support for Azure Storage Accounts authentication options
[#15287](https://github.com/apache/druid/pull/15287)
+* Added support for Azure Government when using Microsoft Azure Storage for
deep storage [#15523](https://github.com/apache/druid/pull/15523)
+* Fixed the `batchDeleteFiles` method in Azure Storage
[#15730](https://github.com/apache/druid/pull/15730)
+
+#### Kubernetes improvements
+
+* Added cleanup lifecycle management for MiddleManager-less task scheduling
[#15133](https://github.com/apache/druid/pull/15133)
+* Fixed an issue where the Overlord does not start when a cluster does not use
a MiddleManager or ZooKeeper
[#15445](https://github.com/apache/druid/pull/15445)
+* Improved logs and status messages for MiddleManager-less ingestion
[#15527](https://github.com/apache/druid/pull/15527)
+
+#### Kafka emitter improvements
+
+* Added a config option to the Kafka emitter that lets you mask sensitive
values for the Kafka producer. This feature is optional and will not affect
prior configs for the emitter
[#15485](https://github.com/apache/druid/pull/15485)
+* Resolved `InterruptedException` logging in ingestion task logs
[#15519](https://github.com/apache/druid/pull/15519)
+
+#### Prometheus emitter improvements
+
+You can configure the `pushgateway` strategy to delete metrics from Prometheus
push gateway on task shutdown using the following Prometheus emitter
configurations:
+
+* `druid.emitter.prometheus.deletePushGatewayMetricsOnShutdown`: When set to `true`, peon tasks delete metrics from the Prometheus push gateway on task shutdown. The default value is `false`.
+* `druid.emitter.prometheus.waitForShutdownDelay`: Time in milliseconds to wait for peon tasks to delete metrics from `pushgateway` on shutdown. Applicable only when `druid.emitter.prometheus.deletePushGatewayMetricsOnShutdown` is set to `true`. The default value is none, meaning that there is no delay between peon task shutdown and metrics deletion from the push gateway.
+
+[#14935](https://github.com/apache/druid/pull/14935)
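
A sketch of these settings in a service's `runtime.properties` follows. The gateway address is illustrative, and the `strategy` and `pushGatewayAddress` property names are assumptions based on the existing Prometheus emitter configuration:

```properties
# Peons push metrics to a Prometheus push gateway
druid.emitter.prometheus.strategy=pushgateway
druid.emitter.prometheus.pushGatewayAddress=http://prometheus-gateway:9091
# New: delete a task's metrics from the push gateway when the task shuts down
druid.emitter.prometheus.deletePushGatewayMetricsOnShutdown=true
# New: wait 10 seconds before deleting so a final scrape can complete
druid.emitter.prometheus.waitForShutdownDelay=10000
```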
+
+#### Iceberg improvements
+
+Improved the Iceberg extension as follows:
+
+* Added the `snapshotTime` parameter to the Iceberg input source spec. It lets you ingest the data files associated with the most recent snapshot as of a given time, so you can ingest data based on older snapshots [#15348](https://github.com/apache/druid/pull/15348)
+* Added a new Iceberg ingestion filter of type `range` to filter on ranges of
column values [#15782](https://github.com/apache/druid/pull/15782)
+* Fixed a typo in the Iceberg warehouse path for S3 [#15823](https://github.com/apache/druid/pull/15823)
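
As a hedged sketch, an Iceberg input source spec that combines the new `snapshotTime` parameter with a `range` filter might look like the following. The table, namespace, and column names are hypothetical, and the exact filter field names may differ from this sketch; consult the Iceberg extension docs for the authoritative spec:

```json
{
  "type": "iceberg",
  "tableName": "orders",
  "namespace": "sales_db",
  "snapshotTime": "2024-01-15T00:00:00.000Z",
  "icebergFilter": {
    "type": "range",
    "filterColumn": "order_total",
    "lower": 100,
    "upper": 500
  }
}
```
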
## Upgrade notes and incompatible changes
### Upgrade notes
-### Incompatible changes
+#### Changed `equals` filter for native queries
+
+The [equality filter](https://druid.apache.org/docs/latest/querying/filters#equality-filter) on mixed type `auto` columns that contain arrays must now filter by the presenting type. This means that if any rows are arrays (for example, segment metadata and `information_schema` report the type as an array type), then native queries must also filter on an array type.
+
+This change impacts mixed type `auto` columns that contain both scalars and
arrays. It doesn't impact SQL, which already has this limitation due to how the
type presents itself.
+
+[#15503](https://github.com/apache/druid/pull/15503)
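
For example, a native equality filter against a mixed `auto` column whose reported type is `ARRAY<STRING>` would now need to match on the array type. The column name and values below are hypothetical:

```json
{
  "type": "equality",
  "column": "tags",
  "matchValueType": "ARRAY<STRING>",
  "matchValue": ["a", "b"]
}
```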
+
+#### Console automatically sets `arrayIngestMode` for MSQ queries
+
+Druid console now configures the `arrayIngestMode` parameter in the data
loading flow, and its value can persist across the SQL tab unless manually
updated. When loading multi-value dimensions or arrays in the Druid console,
note the value of the `arrayIngestMode` parameter to prevent mixing multi-value
dimensions and arrays in the same column of a data source.
+
+[#15588](https://github.com/apache/druid/pull/15588)
+
+#### Improved concurrent append and replace (experimental)
+
+You no longer have to manually determine the task lock type for concurrent
append and replace (experimental) with the `taskLockType` task context.
Instead, Druid can now determine it automatically for you. You can use the
context parameter `"useConcurrentLocks": true` for individual tasks and
datasources or enable concurrent append and replace at a cluster level using
`druid.indexer.task.default.context`.
+
+[#15684](https://github.com/apache/druid/pull/15684)
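
For example, an individual ingestion task could opt in through its context. The payload below is an abbreviated, hypothetical sketch:

```json
{
  "type": "index_parallel",
  "spec": {},
  "context": {
    "useConcurrentLocks": true
  }
}
```

At the cluster level, the equivalent would be setting `druid.indexer.task.default.context` to `{"useConcurrentLocks": true}`.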
+
+#### Enabled empty ingest queries
+
+The MSQ task engine now allows empty ingest queries by default. For queries
that don't generate any output rows, the MSQ task engine reports zero values
for `numTotalRows` and `totalSizeInBytes` instead of null. Previously, ingest
queries that produced no data would fail with the `InsertCannotBeEmpty` MSQ
fault.
+
+To revert to the original behavior, set the MSQ query parameter
`failOnEmptyInsert` to `true`.
+
+[#15495](https://github.com/apache/druid/pull/15495)
[#15674](https://github.com/apache/druid/pull/15674)
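
For example, to restore the previous failure behavior for a single SQL-based ingestion, you could set the parameter in the query context of the task payload. The query below is hypothetical:

```json
{
  "query": "INSERT INTO example_table SELECT * FROM existing_table WHERE 1 = 0 PARTITIONED BY DAY",
  "context": {
    "failOnEmptyInsert": true
  }
}
```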
+
+#### Enabled query request queuing by default when total laning is turned on
+
+When there are fewer query scheduler threads than server HTTP threads, total laning turns on.
+This reserves some HTTP threads for non-query requests such as health checks.
+Previously, total laning rejected any query request that exceeded the lane capacity.
+Now, excess requests are instead queued with a timeout equal to `MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout)`.
+
+[#15440](https://github.com/apache/druid/pull/15440)
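
A sketch of a configuration where total laning turns on (the property names come from the existing query scheduler and HTTP server configs; the values are illustrative):

```properties
# Server HTTP threads available for all requests
druid.server.http.numThreads=50
# Fewer scheduler threads than HTTP threads enables total laning
druid.query.scheduler.numThreads=40
# Upper bound used for the queue timeout of excess query requests
druid.server.http.maxQueryTimeout=300000
```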
+
+#### Changed how empty or null array columns are stored
+
+Columns ingested with the auto column indexer that contain only empty or null arrays are now stored as `ARRAY<LONG>` instead of `COMPLEX<json>`.
+
+[#15505](https://github.com/apache/druid/pull/15505)
+
+#### Changed how Druid allocates weekly segments
+
+When the requested granularity is a month or larger but a segment can't be
allocated, Druid resorts to day partitioning.
+Unless explicitly specified, Druid skips week-granularity segments for data
partitioning because these segments don't align with the end of the month or
more coarse-grained intervals.
+
+Previously, if Druid couldn't allocate segments by month, it tried allocating
them by week next.
+In the new behavior, Druid skips partitioning by week and goes directly to
day. Week segments can only be allocated if the chosen partitioning in the
append task is WEEK.
+
+[#15589](https://github.com/apache/druid/pull/15589)
+
+#### Removed the `auto` search strategy
+
+Removed the `auto` search strategy from the native search query. Setting
`searchStrategy` to `auto` is now equivalent to `useIndexes`.
+
+[#15550](https://github.com/apache/druid/pull/15550)
### Developer notes
+#### Improved `InDimFilter` reverse-lookup optimization
+
+This improvement includes the following changes:
+
+* Added the `mayIncludeUnknown` parameter to `DimFilter#optimize`.
+* Enabled `InDimFilter#optimizeLookup` to handle `mayIncludeUnknown` and
perform reverse lookups in a wider range of cases.
+* Made `unapply` method in `LookupExtractor` protected and relocated callers
to `unapplyAll`.
+
+If your extensions provide a `DimFilter`, you may need to rebuild them to
ensure compatibility with this release.
+
+[#15611](https://github.com/apache/druid/pull/15611)
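
If you maintain an extension with a custom `DimFilter`, the shape of the change is roughly as follows. This is a simplified sketch, not the exact Druid interface:

```java
// Sketch only: optimize(...) now receives whether unknown (null) values may match.
interface DimFilter
{
  DimFilter optimize(boolean mayIncludeUnknown);
}

class ExampleFilter implements DimFilter
{
  @Override
  public DimFilter optimize(boolean mayIncludeUnknown)
  {
    // A real filter might skip reverse-lookup rewrites when unknown values may match.
    return this;
  }
}
```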
+
+#### Other developer improvements
+
+* Fixed an issue with the Druid Docker image
[#15264](https://github.com/apache/druid/pull/15264)
+
+#### Web console logging
+
+The web console now logs request errors in end-to-end tests to help with
debugging.
+
+[#15483](https://github.com/apache/druid/pull/15483)
+
#### Dependency updates
-The following dependencies have had their versions bumped:
\ No newline at end of file
+The following dependencies have been updated:
+
+* Added `chronoshift` as a dependency
[#14990](https://github.com/apache/druid/pull/14990)
+* Added `gson` to `pom.xml`
[#15488](https://github.com/apache/druid/pull/15488/)
+* Updated Confluent's dependencies to 6.2.12
[#15441](https://github.com/apache/druid/pull/15441)
+* Excluded `jackson-jaxrs` from `ranger-plugin-common`, which isn't required,
to address CVEs [#15481](https://github.com/apache/druid/pull/15481)
+* Updated AWS SDK version to `1.12.638`
[#15814](https://github.com/apache/druid/pull/15814)
+* Updated Avro to 1.11.3 [#15419](https://github.com/apache/druid/pull/15419)
+* Updated Ranger libraries to the newest available version
[#15363](https://github.com/apache/druid/pull/15363)
+* Updated the Iceberg core version to 1.4.1 [#15348](https://github.com/apache/druid/pull/15348)
+* Reduced the dependency footprint of the Iceberg extension [#15280](https://github.com/apache/druid/pull/15280)
+* Updated `com.github.eirslett` version to 1.15.0
[#15556](https://github.com/apache/druid/pull/15556)
+* Updated multiple webpack dependencies:
+ * `webpack` to 5.89.0
+ * `webpack-bundle-analyzer` to 4.10.1
+ * `webpack-cli` to 5.1.4
+ * `webpack-dev-server` to 4.15.1
+
+ [#15555](https://github.com/apache/druid/pull/15555)
+* Updated the `pac4j-oidc` Java security library to version 4.5.7 [#15522](https://github.com/apache/druid/pull/15522)
+* Updated `io.kubernetes.client-java` version to 19.0.0 and `docker-java-bom`
to 3.3.4 [#15449](https://github.com/apache/druid/pull/15449)
+* Updated core Apache Kafka dependencies to 3.6.1
[#15539](https://github.com/apache/druid/pull/15539)
+* Updated and pruned multiple dependencies for the web console, including
dropping Babel. As a result, Internet Explorer 11 is no longer supported with
the web console [#15487](https://github.com/apache/druid/pull/15487)
+* Updated Apache ZooKeeper to 3.8.3 from 3.5.10 [#15477](https://github.com/apache/druid/pull/15477)
+* Updated Guava to 32.0.1 from 31.1 [#15482](https://github.com/apache/druid/pull/15482)
+* Updated multiple dependencies to address CVEs:
+ * `dropwizard-metrics` to 4.2.22 to address GHSA-mm8h-8587-p46h in
`com.rabbitmq:amqp-client`
+ * `ant` to 1.10.14 to resolve GHSA-f62v-xpxf-3v68, GHSA-4p6w-m9wc-c9c9,
GHSA-q5r4-cfpx-h6fh, and GHSA-5v34-g2px-j4fw
+ * `commons-compress` to 1.24.0 to resolve GHSA-cgwf-w82q-5jrr
+ * `jose4j` to 0.9.3 to resolve GHSA-7g24-qg88-p43q and GHSA-jgvc-jfgh-rjvv
+ * `kotlin-stdlib` to 1.6.0 to resolve GHSA-cqj8-47ch-rvvq and CVE-2022-24329
+
+ [#15464](https://github.com/apache/druid/pull/15464)
+* Updated Jackson to version 2.12.7.1 to address CVE-2022-42003 and CVE-2022-42004, which affect `jackson-databind` [#15461](https://github.com/apache/druid/pull/15461)
+* Updated `com.google.code.gson:gson` from 2.2.4 to 2.10.1 since 2.2.4 is
affected by CVE-2022-25647 [#15461](https://github.com/apache/druid/pull/15461)
+* Updated Jedis to version 5.0.2
[#15344](https://github.com/apache/druid/pull/15344)
+* Updated `commons-codec:commons-codec` from 1.13 to 1.16.0
[#14819](https://github.com/apache/druid/pull/14819)
+* Updated Nimbus version to `8.22.1`
[#15753](https://github.com/apache/druid/pull/15753)
\ No newline at end of file
diff --git a/docs/release-info/upgrade-notes.md
b/docs/release-info/upgrade-notes.md
index 46e5ed6fc1a..d564224ee01 100644
--- a/docs/release-info/upgrade-notes.md
+++ b/docs/release-info/upgrade-notes.md
@@ -26,6 +26,69 @@ The upgrade notes assume that you are upgrading from the
Druid version that imme
For the full release notes for a specific version, see the [releases
page](https://github.com/apache/druid/releases).
+## 29.0.0
+
+### Upgrade notes
+
+#### Changed `equals` filter for native queries
+
+The [equality filter](https://druid.apache.org/docs/latest/querying/filters#equality-filter) on mixed type `auto` columns that contain arrays must now filter by the presenting type. This means that if any rows are arrays (for example, segment metadata and `information_schema` report the type as an array type), then native queries must also filter on an array type.
+
+This change impacts mixed type `auto` columns that contain both scalars and
arrays. It doesn't impact SQL, which already has this limitation due to how the
type presents itself.
+
+[#15503](https://github.com/apache/druid/pull/15503)
+
+#### Console automatically sets `arrayIngestMode` for MSQ queries
+
+Druid console now configures the `arrayIngestMode` parameter in the data
loading flow, and its value can persist across the SQL tab unless manually
updated. When loading multi-value dimensions or arrays in the Druid console,
note the value of the `arrayIngestMode` parameter to prevent mixing multi-value
dimensions and arrays in the same column of a data source.
+
+[#15588](https://github.com/apache/druid/pull/15588)
+
+#### Improved concurrent append and replace (experimental)
+
+You no longer have to manually determine the task lock type for concurrent
append and replace (experimental) with the `taskLockType` task context.
Instead, Druid can now determine it automatically for you. You can use the
context parameter `"useConcurrentLocks": true` for individual tasks and
datasources or enable concurrent append and replace at a cluster level using
`druid.indexer.task.default.context`.
+
+[#15684](https://github.com/apache/druid/pull/15684)
+
+#### Enabled empty ingest queries
+
+The MSQ task engine now allows empty ingest queries by default. For queries
that don't generate any output rows, the MSQ task engine reports zero values
for `numTotalRows` and `totalSizeInBytes` instead of null. Previously, ingest
queries that produced no data would fail with the `InsertCannotBeEmpty` MSQ
fault.
+
+To revert to the original behavior, set the MSQ query parameter
`failOnEmptyInsert` to `true`.
+
+[#15495](https://github.com/apache/druid/pull/15495)
[#15674](https://github.com/apache/druid/pull/15674)
+
+#### Enabled query request queuing by default when total laning is turned on
+
+When there are fewer query scheduler threads than server HTTP threads, total laning turns on.
+This reserves some HTTP threads for non-query requests such as health checks.
+Previously, total laning rejected any query request that exceeded the lane capacity.
+Now, excess requests are instead queued with a timeout equal to `MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout)`.
+
+[#15440](https://github.com/apache/druid/pull/15440)
+
+#### Changed how empty or null array columns are stored
+
+Columns ingested with the auto column indexer that contain only empty or null arrays are now stored as `ARRAY<LONG>` instead of `COMPLEX<json>`.
+
+[#15505](https://github.com/apache/druid/pull/15505)
+
+#### Changed how Druid allocates weekly segments
+
+When the requested granularity is a month or larger but a segment can't be
allocated, Druid resorts to day partitioning.
+Unless explicitly specified, Druid skips week-granularity segments for data
partitioning because these segments don't align with the end of the month or
more coarse-grained intervals.
+
+Previously, if Druid couldn't allocate segments by month, it tried allocating
them by week next.
+In the new behavior, Druid skips partitioning by week and goes directly to
day. Week segments can only be allocated if the chosen partitioning in the
append task is WEEK.
+
+[#15589](https://github.com/apache/druid/pull/15589)
+
+#### Removed the `auto` search strategy
+
+Removed the `auto` search strategy from the native search query. Setting
`searchStrategy` to `auto` is now equivalent to `useIndexes`.
+
+[#15550](https://github.com/apache/druid/pull/15550)
+
## 28.0.0
### Upgrade notes
diff --git a/website/.spelling b/website/.spelling
index 422d8a69a9b..30f389cc536 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -56,6 +56,8 @@ CloudWatch
ColumnDescriptor
Corretto
CLI
+CVE
+CVEs
DDL
DML
DNS
@@ -104,6 +106,7 @@ GPG
GSSAPI
GUIs
GroupBy
+Gauva
Guice
HDFS
HDFSFirehose
@@ -141,6 +144,7 @@ JDBC
JDK
JDK7
JDK8
+Jedis
JKS
jks
JMX
@@ -185,6 +189,7 @@ Murmur3
MVCC
MV_TO_ARRAY
NFS
+NPE
OCF
OIDC
OLAP
@@ -580,6 +585,7 @@ versioning
virtualColumns
w.r.t.
walkthrough
+webpack
whitelist
whitelisted
whitespace
@@ -2174,6 +2180,21 @@ UserGroupInformation
CVE-2019-17571
CVE-2019-12399
CVE-2018-17196
+GHSA-mm8h-8587-p46h
+GHSA-f62v-xpxf-3v68
+GHSA-4p6w-m9wc-c9c9
+GHSA-q5r4-cfpx-h6fh
+GHSA-5v34-g2px-j4fw
+GHSA-cgwf-w82q-5jrr
+GHSA-7g24-qg88-p43q
+GHSA-jgvc-jfgh-rjvv
+GHSA-cqj8-47ch-rvvq
+CVE-2022-24329
+CVE-2022-42003
+CVE-2022-42004
+CVE-2022-25647
bin.tar.gz
0s
1T
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]