abhishekrb19 commented on code in PR #14600: URL: https://github.com/apache/druid/pull/14600#discussion_r1266235885
########## docs/do-not-merge.md: ########## @@ -0,0 +1,366 @@ +<!--Intentionally, there's no Apache license so that the GHA fails it. This file is not meant to be merged. + +- https://github.com/apache/druid/pull/14266 - we removed input source security from 26 (https://github.com/apache/druid/pull/14003). Should we not include this in 27 release notes? + +--> + +Apache Druid 27.0.0 contains over $NUMBER_FEATURES new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from $NUMBER_OF_CONTRIBUTORS contributors. + +[See the complete set of changes for additional details]($LINK_TO_RELEASE_MILESTONE). + +Review the upgrade notes and incompatible changes before you upgrade to Druid 27.0.0. + +# Highlights + +<!-- HIGHLIGHTS H2. FOR EACH MAJOR FEATURE FOR THE RELEASE --> + +## Query from deep storage + +TBD + +### New statements API + +Added a new API /druid/v2/sql/statements/ which allows users to fetch results in an asynchronous manner. + +[14416](https://github.com/apache/druid/pull/14416) + +### New and updated fields in API to get results + +The API response to get results now returns a `pages` field containing information on each page of the results status. +The `numRows` and `sizeInBytes` fields are renamed to `numTotalRows` and `totalSizeInBytes`, respectively. + +[14512](https://github.com/apache/druid/pull/14512) + + +### Durable storage for results + +MSQ can now write SELECT query results to durable storage. To do so, set the context flag `selectDestination:DURABLE_STORAGE` while issuing SELECT queries to MSQ. + +[14527](https://github.com/apache/druid/pull/14527) + +## Java 17 support + +Druid now fully supports Java 17. 
+[14384](https://github.com/apache/druid/pull/14384) + +## Array column types + +GA + TBD + +## Schema auto-discovery + +GA + TBD + +## Smart segment loading + +[13197](https://github.com/apache/druid/pull/13197) + +## Hadoop 2 support dropped + +# Additional features and improvements + +## MSQ task engine + +### `maxInputBytesPerWorker` context parameter + +The context parameter now denotes the estimated weighted size (in bytes) of the input to split on. The MSQ task engine now takes into account the input format and compression format instead of the actual file size reported by the file system. + +The default value for the context parameter has also been changed. It is now 512 MiB (previously 10 GiB). + +[14307](https://github.com/apache/druid/pull/14307) + +### Improved query planning behavior + +Druid now fails query planning if a CLUSTERED BY column contains descending order. +Previously, queries would successfully plan if any CLUSTERED BY columns contained descending order. + +The MSQ fault, `InsertCannotOrderByDescending`, is deprecated. An INSERT or REPLACE query containing a CLUSTERED BY expression cannot be in descending order. Druid's segment generation code only supports ascending order. Instead of the fault, Druid now throws a query `ValidationException`. + +[14436](https://github.com/apache/druid/pull/14436) + +### SELECT query results + +SELECT queries executed using MSQ now generate only a subset of the results in the query reports. +To fetch the complete result set, run the query using the native engine. 
+ +[14370](https://github.com/apache/druid/pull/14370) + + +### Other MSQ improvements + +- The same aggregator can now have two output names [14367](https://github.com/apache/druid/pull/14367) +- Changed the default `clusterStatisticsMergeMode` to SEQUENTIAL (#14310)[https://github.com/apache/druid/pull/14310] +- Added a query context parameter `MultiStageQueryContext` to determine whether the result of an MSQ select query is limited (#14476)[https://github.com/apache/druid/pull/14476] +- Enabled using functions as inputs for `index` and `length` parameters (#14480)[https://github.com/apache/druid/pull/14480] + +## Ingestion + +### New property for task completion updates + +The new property `druid.indexer.queue.taskCompleteHandlerNumThreads` controls the number of threads used by the Overlord `TaskQueue` to handle task completion updates received from the workers. + +The following metrics have been added: +* `task/status/queue/count`: Monitors the number of queued items +* `task/status/updated/count`: Monitors the number of processed items + +[14533](https://github.com/apache/druid/pull/14533) + +### Improved response to max_allowed_packet limit + +If the Overlord fails to insert a task into the metadata because of a payload that exceeds the `max_allowed_packet` limit, the response now returns `400 Bad request` instead of `500 Internal server error`. This prevents an `index_parallel` task from retrying the insertion of a bad sub-task indefinitely and causes it to fail immediately. + +[14271](https://github.com/apache/druid/pull/14271) + +### Improved handling of mixed type arrays + +Druid now handles mixed type arrays such as `[["a", "b", "c"], {"x": 123}]` as `ARRAY<COMPLEX<json>` rather than throwing an incompatible type exception. + +[14438](https://github.com/apache/druid/pull/14438) + +### Other ingestion improvements + +* A negative streaming ingestion lag is no longer emitted as a result of stale offsets. 
[14292](https://github.com/apache/druid/pull/14292) +* Removed double synchronization on simple map operations in Kubernetes task runner. [14435](https://github.com/apache/druid/pull/14435) +* Kubernetes overlord extension now cleans up the job if the task pod fails to come up in time. [14425](https://github.com/apache/druid/pull/14425) + +## Querying + +### New function for regular expression replacement + +The new function `REGEXP_REPLACE` allows you to replace all instances of a pattern with a replacement string. + +[14460](https://github.com/apache/druid/pull/14460) + +### Query results directory + +Druid now supports a `query-results` directory in durable storage to store query results after the task finishes. The auto cleaner does not remove this directory unless the task ID is not known to the Overlord. + +[14446](https://github.com/apache/druid/pull/14446) + +### Limit for subquery results by memory usage + +Users can now add a guardrail to prevent subquery's results from exceeding the set number of bytes by setting `druid.server.http.maxSubqueryRows` in the Broker's config or `maxSubqueryRows` in the query context. This feature is experimental for now and would default back to row-based limiting in case it fails to get the accurate size of the results consumed by the query. + +[13952](https://github.com/apache/druid/pull/13952) + +### HLL and Theta sketch estimates + +You can now use HLL_SKETCH_ESTIMATE and THETA_SKETCH_ESTIMATE as expressions. These estimates work on sketch columns and have the same behavior as `postAggs`. Review Comment: ```suggestion You can now use `HLL_SKETCH_ESTIMATE` and `THETA_SKETCH_ESTIMATE` as expressions. These estimates work on sketch columns and have the same behavior as `postAggs`. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
