ektravel commented on code in PR #15805:
URL: https://github.com/apache/druid/pull/15805#discussion_r1480056632
##########
docs/release-info/release-notes.md:
##########
@@ -57,50 +57,557 @@ For tips about how to write a good release note, see [Release notes](https://git

 This section contains important information about new and existing features.

+### SQL PIVOT and UNPIVOT (experimental)
+
+Druid 29.0.0 adds experimental support for the SQL PIVOT and UNPIVOT operators.
+
+The PIVOT operator carries out an aggregation and transforms rows into columns in the output. The following is the general syntax for the PIVOT operator:
+
+```sql
+PIVOT (aggregation_function(column_to_aggregate)
+  FOR column_with_values_to_pivot
+  IN (pivoted_column1 [, pivoted_column2 ...])
+)
+```
+
+The UNPIVOT operator transforms existing column values into rows. The following is the general syntax for the UNPIVOT operator:
+
+```sql
+UNPIVOT (values_column
+  FOR names_column
+  IN (unpivoted_column1 [, unpivoted_column2 ... ])
+)
+```
+
+### Range support in window functions
+
+Window functions now support ranges where both endpoints are unbounded or are the current row. Ranges work in strict mode, which means that Druid will fail queries that aren't supported. You can turn off strict mode for ranges by setting the context parameter `windowingStrictValidation` to `false`.
+
+The following example shows a window expression with RANGE frame specifications:
+
+```sql
+(ORDER BY c)
+(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
+(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED PRECEDING)
+```
+
+[#15703](https://github.com/apache/druid/pull/15703) [#15746](https://github.com/apache/druid/pull/15746)
+
+### Improved INNER joins
+
+Druid now supports arbitrary join conditions for INNER join. Any sub-conditions that can't be evaluated as part of the join are converted to a post-join filter. Improved join capabilities allow Druid to more effectively support applications like Tableau.
+
+[#15302](https://github.com/apache/druid/pull/15302)
+
+### First and last aggregators for double, float, and long data types
+
+Druid 29.0.0 adds support for first and last aggregators for the double, float, and long types in an ingestion spec and MSQ queries. Previously, they were only supported for native queries. For more information, see [First and last aggregators](https://druid.apache.org/docs/latest/querying/aggregations/).
+
+[#14462](https://github.com/apache/druid/pull/14462)
+
+### Support for logging audit events
+
+Added support for logging audit events and improved coverage of audited REST API endpoints. To log audit events, set config `druid.audit.manager.type` to `log`.
+
+[#15480](https://github.com/apache/druid/pull/15480) [#15653](https://github.com/apache/druid/pull/15653)
+
+### Enabled empty ingest queries
+
+The MSQ task engine now allows empty ingest queries by default. Previously, ingest queries that produced no data would fail with the `InsertCannotBeEmpty` MSQ fault.
+For more information, see [Empty ingest queries in the upgrade notes](#enabled-empty-ingest-queries).
+
+[#15674](https://github.com/apache/druid/pull/15674) [#15495](https://github.com/apache/druid/pull/15495)
+
+### Support for Google Cloud Storage
+
+Added support for Google Cloud Storage (GCS). You can now use durable storage with GCS. See [Durable storage configurations](https://druid.apache.org/docs/latest/multi-stage-query/reference#durable-storage-configurations) for more information.
+
+[#15398](https://github.com/apache/druid/pull/15398)
+
+### Experimental extensions
+
+Druid 29.0.0 adds the following extensions.
+
+#### DDSketch
+
+A new DDSketch extension is available as a community contribution. The DDSketch extension (`druid-ddsketch`) provides support for approximate quantile queries using the [DDSketch](https://github.com/datadog/sketches-java) library.
+
+[#15049](https://github.com/apache/druid/pull/15049)
+
+#### Spectator histogram
+
+A new histogram extension is available as a community contribution. The Spectator-based histogram extension (`druid-spectator-histogram`) provides approximate histogram aggregators and percentile post-aggregators based on [Spectator](https://netflix.github.io/atlas-docs/spectator/) fixed-bucket histograms.
+
+[#15340](https://github.com/apache/druid/pull/15340)
+
+#### Delta Lake
+
+A new Delta Lake extension is available as a community contribution. The Delta Lake extension (`druid-deltalake-extensions`) lets you use the [Delta Lake input source](https://druid.apache.org/docs/latest/development/extensions-contrib/delta-lake) to ingest data stored in a Delta Lake table into Apache Druid.
+
+[#15755](https://github.com/apache/druid/pull/15755)
+
+### Removed the `auto` search strategy
+
+Removed the `auto` search strategy from the native search query. Setting `searchStrategy` to `auto` is now equivalent to `useIndexes`. Improvements to how and when indexes are computed have allowed the `useIndexes` strategy to be more adaptive, skipping computing expensive indexes when possible.
+
+[#15550](https://github.com/apache/druid/pull/15550)
+
 ## Functional area and related changes

 This section contains detailed release notes separated by areas.

 ### Web console

+#### Improved lookup dialog
+
+The lookup dialog in the web console now includes following optional fields:
+
+* Jitter seconds
+* Load timeout seconds
+* Max heap percentage
+
+
+
+[#15472](https://github.com/apache/druid/pull/15472/)
+
+#### File inputs for query detail archive
+
+The **Load query detail archive** now supports loading queries by selecting a JSON file directly or dragging the file into the dialog.
+
+
+
+[#15632](https://github.com/apache/druid/pull/15632)
+
+#### Improved time chart brush and added auto-granularity
+
+* Added the notion of timezone in the explore view.
+* Added `chronoshift` as a dependency.
+* Time chart is now able to automatically pick a granularity if "auto" is selected (which is the default) based on the current time filter extent.
+* Brush is now automatically enabled in the time chart.
+* Brush interval snaps to the selected time granularity.
+* Added a highlight bubble to all visualizations (except table because it has its own).
+
+[#14990](https://github.com/apache/druid/pull/14990)
+
+#### Toggle to fail on empty inserts
+
+Added a new toggle to fail when an ingestion query produces no data.
+
+[#15627](https://github.com/apache/druid/pull/15627)
+
 #### Other web console improvements

-### Ingestion
+* Added the ability to detect multiple `EXPLAIN PLAN` queries in the workbench and run them individually [#15570](https://github.com/apache/druid/pull/15570)
+* Added the ability to sort a segment table on start and end when grouping by interval [#15720](https://github.com/apache/druid/pull/15720)
+* Improved the time shift for compare logic in the web console to include literals [#15433](https://github.com/apache/druid/pull/15433)
+* Improved robustness of time shifting in tables in Explore view [#15359](https://github.com/apache/druid/pull/15359)
+* Improved ingesting data using the web console [#15339](https://github.com/apache/druid/pull/15339)
+* Fixed rendering on a disabled worker [#15712](https://github.com/apache/druid/pull/15712)
+* Enabled table driven query modification actions to work with slices [#15779](https://github.com/apache/druid/pull/15779)
+
+### General ingestion
+
+#### Added system fields to input sources
+
+Added the option to return system fields when defining an input source. This allows for ingestion of metadata such as an S3 object's URI.
+
+[#15276](https://github.com/apache/druid/pull/15276)
+
+#### Changed how Druid allocates weekly segments
+
+When the requested granularity is a month or larger but a segment can't be allocated, Druid resorts to day partitioning.
+Unless explicitly specified, Druid skips week-granularity segments for data partitioning because these segments don't align with the end of the month or more coarse-grained intervals.
+
+[#15589](https://github.com/apache/druid/pull/15589)
+
+#### Changed how empty or null array columns are stored
+
+Columns ingested with the auto column indexer that contain only empty or null containing arrays are now stored as `ARRAY<LONG\>` instead of `COMPLEX<json\>`.
+
+[#15505](https://github.com/apache/druid/pull/15505)
+
+#### Enabled skipping compaction for datasources with partial-eternity segments
+
+Druid now skips compaction for datasources with segments that have their interval start or end coinciding with Eternity interval end-points.
+
+[#15542](https://github.com/apache/druid/pull/15542)
+
+#### Segment allocation improvements
+
+Improved segment allocation as follows:
+
+* Enhanced polling in segment allocation queue [#15590](https://github.com/apache/druid/pull/15590)
+* Fixed an issue in segment allocation that could cause loss of appended data when running interleaved append and replace tasks [#15459](https://github.com/apache/druid/pull/15459)
+
+#### Other ingestion improvements
+
+* Added a context parameter `useConcurrentLocks` for concurrent locks. You can set it for an individual task or at a cluster level using `druid.indexer.task.default.context` [#15684](https://github.com/apache/druid/pull/15684)
+* Added a default implementation for the `evalDimension` method in the RowFunction interface [#15452](https://github.com/apache/druid/pull/15452)
+* Added a configurable delay to the Peon service that determines how long a Peon should wait before dropping a segment [#15373](https://github.com/apache/druid/pull/15373)
+* Improved metadata store updates by attempting to retry updates rather than failing [#15141](https://github.com/apache/druid/pull/15141)
+* Fixed an issue where `systemField` values weren't properly decorated in the sampling response [#15536](https://github.com/apache/druid/pull/15536)
+* Fixed an issue with columnar frames always writing multi-valued columns where the input column had `hasMultipleValues = UNKNOWN` [#15300](https://github.com/apache/druid/pull/15300)
+* Fixed a race condition where there were multiple attempts to publish segments for the same sequence [#14995](https://github.com/apache/druid/pull/14995)
+* Fixed a race condition that can occur at high streaming concurrency [#15174](https://github.com/apache/druid/pull/15174)
+* Fixed an issue where complex types that are also numbers were assumed to also be double [#15272](https://github.com/apache/druid/pull/15272)
+* Fixed an issue with unnecessary retries triggered when exceptions like IOException obfuscated S3 exceptions [#15238](https://github.com/apache/druid/pull/15238)
+* Fixed segment retrieval when the input interval does not lie within the years `[1000, 9999]` [#15608](https://github.com/apache/druid/pull/15608)
+* Fixed empty strings being incorrectly converted to null values [#15525](https://github.com/apache/druid/pull/15525)
+* Simplified `IncrementalIndex` and `OnHeapIncrementalIndex` by removing some parameters [#15448](https://github.com/apache/druid/pull/15448)
+* Updated active task payloads being accessed from memory before reverting to the metadata store [#15377](https://github.com/apache/druid/pull/15377)
+
+### SQL-based ingestion

-#### SQL-based ingestion
+#### Added `castToType` parameter

-##### Other SQL-based ingestion improvements
+Added optional `castToType` parameter to `auto` column schema.

-#### Streaming ingestion
+[#15417](https://github.com/apache/druid/pull/15417)

-##### Other streaming ingestion improvements
+#### Improved the EXTEND operator
+
+You can now use types like `VARCHAR ARRAY` and `BIGINT ARRAY` with the EXTEND operator.
+
+For example:
+
+```sql
+EXTEND (a VARCHAR ARRAY, b BIGINT ARRAY, c VARCHAR)
+```
+
+specifies an extern input with native druid input types `ARRAY<STRING>`, `ARRAY<LONG>` and `STRING`.
+
+[#15458](https://github.com/apache/druid/pull/15458)
+
+#### Improved tombstone generation to honor granularity specified in a `REPLACE` query
+
+MSQ `REPLACE` queries now generate tombstone segments honoring the segment granularity specified in the query, rather than generating irregular tombstones. If a query generates more than 5000 tombstones, Druid returns an MSQ `TooManyBucketsFault` error, similar to the behavior with data segments.
+
+[#15243](https://github.com/apache/druid/pull/15243)
+
+#### Improved hash joins using filters
+
+Improved consistency of JOIN behavior for queries using either the native or MSQ engine to prune based on base (left-hand side) columns only.
+
+[#15299](https://github.com/apache/druid/pull/15299)
+
+#### Configurable page size limit
+
+You can now limit the pages size for results of SELECT queries run using the MSQ engine. See `rowsPerPage` in the [SQL-based ingestion reference](https://druid.apache.org/docs/latest/multi-stage-query/reference).
+
+### Streaming ingestion
+
+#### Improved Amazon Kinesis automatic reset
+
+Changed Amazon Kinesis automatic reset behavior to only reset the checkpoints for partitions where sequence numbers are unavailable.
+
+[#15338](https://github.com/apache/druid/pull/15338)

 ### Querying

-#### Other querying improvements
+#### Added IPv6_MATCH SQL function
+
+Added IPv6_MATCH SQL function for matching IPv6 addresses in a subnet:
+
+```sql
+IPV6_MATCH(address, subnet)
+```
+
+[#15212](https://github.com/apache/druid/pull/15212/)
+
+#### Added JSON_QUERY_ARRAY function
+
+Added JSON_QUERY_ARRAY which is similar to JSON_QUERY except the return type is always `ARRAY<COMPLEX<json>>` instead of `COMPLEX<json>`. Essentially, this function allows extracting arrays of objects from nested data and performing operations such as UNNEST, ARRAY_LENGTH, ARRAY_SLICE, or any other available ARRAY operations.
+
+[#15521](https://github.com/apache/druid/pull/15521)

-### Cluster management
+#### Added support for numeric support for EARLIEST and LATEST functions

-#### Other cluster management improvements
+In addition to string support, the following functions can now return numeric values:
+
+* EARLIEST and EARLIEST_BY
+* LATEST and LATEST_BY
+
+You can also use these functions as aggregations at ingestion time.
+
+[#15607](https://github.com/apache/druid/pull/15607)
+
+#### Added support for `aggregateMultipleValues`
+
+Improved the `ANY_VALUE(expr)` function to support the boolean option `aggregateMultipleValues`. The `aggregateMultipleValues` option is enabled by default. When you run ANY_VALUE on an MVD, the function returns the stringified array. If `aggregateMultipleValues` is set to `false`, ANY_VALUE returns the first value instead.
+
+[#15434](https://github.com/apache/druid/pull/15434)
+
+#### Added native `array contains element` filter
+
+Added native `array contains element` filter to improve performance when using ARRAY_CONTAINS on array columns.
+
+[#15366](https://github.com/apache/druid/pull/15366) [#15455](https://github.com/apache/druid/pull/15455)
+
+#### Changed `equals` filter for native queries
+
+The [equality filter](https://druid.apache.org/docs/latest/querying/filters#equality-filter) on mixed type `auto` columns that contain arrays must now be filtered as their presenting type. This means that if any rows are arrays (for example, the segment metadata and `information_schema` reports the type as some array type), then the native queries must also filter as if they are some array type.
+
+This change impacts mixed type `auto` columns that contain both scalars and arrays. It doesn't impact SQL, which already has this limitation due to how the type presents itself.
+
+[#15503](https://github.com/apache/druid/pull/15503)
+
+#### Improved `timestamp_extract` function
+
+The `timestamp_extract(expr, unit, [timezone])` Druid native query function now supports dynamic values.
+
+[#15586](https://github.com/apache/druid/pull/15586)
+
+#### Improved JSON_VALUE and JSON_QUERY
+
+Added support for using expressions to compute the JSON path argument for JSON_VALUE and JSON_QUERY functions.
+
+[#15320](https://github.com/apache/druid/pull/15320)
+
+#### Improved `ExpressionPostAggregator` array handling
+
+Improved the use of `ExpressionPostAggregator` to handle ARRAY types output by the grouping engine. The native expression system now recognizes `ComparableStringArray` and `ComparableList` array wrapper types and treats them as ARRAY types.
+
+Updated `FunctionalExpr` to streamline the handling of class cast exceptions as user errors, so that users are provided with clear exception messages.
+
+[#15543](https://github.com/apache/druid/pull/15543)
+
+#### Improved lookups
+
+Enhanced lookups as follows:
+
+* Improved loading and dropping of containers for lookups to reduce inconsistencies during updates [#14806](https://github.com/apache/druid/pull/14806)
+* Changed behavior for initialization of lookups to load the first lookup as is, regardless of cache status [#15598](https://github.com/apache/druid/pull/15598)
+
+#### Enabled query request queuing by default when total laning is turned on

Review Comment:
Yes, I think it should. Added to "Upgrade notes" and upgrade-notes.md

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
