ektravel commented on code in PR #15173: URL: https://github.com/apache/druid/pull/15173#discussion_r1373356814
########## docs/do-not-merge.md: ########## @@ -0,0 +1,827 @@ +<!--Intentionally, there's no Apache license so that the GHA fails it. This file is not meant to be merged. +--> + +Apache Druid 28.0.0 contains over $NUMBER_FEATURES new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from $NUMBER_OF_CONTRIBUTORS contributors. + +See the [complete set of changes](https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A28.0+sort%3Aupdated-desc+) for additional details, including bug fixes. + +Review the [upgrade notes](#upgrade-notes) and [incompatible changes](#incompatible-changes) before you upgrade to Druid 28.0.0. + +# Highlights + +<!-- HIGHLIGHTS H2. FOR EACH MAJOR FEATURE FOR THE RELEASE --> + +## Window functions (experimental) + +You can use [window functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) in Apache Druid to produce values based upon the relationship of one row within a window of rows to the other rows within the same window. A window is a group of related rows within a result set. For example, rows with the same value for a specific dimension. + +Enable window functions in your query with the `enableWindowing: true` context parameter. + +[#15184](https://github.com/apache/druid/pull/15184) + +## Concurrent append and replace (experimental) + +Druid 28.0.0 adds experimental support for concurrent append and replace. +This feature allows you to safely replace the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this is appending new data to an interval while compaction of that interval is already in progress. +For more information, see [Concurrent append and replace](https://druid.apache.org/docs/latest/data-management/automatic-compaction#concurrent-append-and-replace). 
+ +Segment locking will be deprecated and removed in favor of concurrent append and replace, which is much simpler in design. With concurrent append and replace, Druid doesn't lock compaction jobs out because of active realtime ingestion. + +### Task locks for append and replace batch ingestion jobs + +Append batch ingestion jobs can now share locks. This allows you to run multiple append batch ingestion jobs against the same time interval. Replace batch ingestion jobs still require an exclusive lock. This means you can run multiple append batch ingestion jobs and one replace batch ingestion job for a given interval. + +[#14407](https://github.com/apache/druid/pull/14407) + +### Streaming ingestion with concurrent replace + +Streaming jobs reading from Kafka and Kinesis with `APPEND` locks can now ingest concurrently with compaction running with `REPLACE` locks. The segment granularity of the streaming job must be equal to or finer than that of the concurrent replace job. + +[#15039](https://github.com/apache/druid/pull/15039) + +## Query from deep storage + +[Query from deep storage](https://druid.apache.org/docs/latest/querying/query-deep-storage/) is no longer an experimental feature. When you query from deep storage, more data is available for queries without having to scale your Historical processes to accommodate more data. To benefit from the space saving that query from deep storage offers, configure your load rules to unload data from your Historical processes. + +### Support for multiple result formats + +Query from deep storage now supports multiple result formats. +Previously, the `/druid/v2/sql/statements/` endpoint only supported results in the `object` format. Now, results can be written in any format specified in the `resultFormat` parameter. +For more information on result parameters supported by the Druid SQL API, see [Responses](https://druid.apache.org/docs/latest/api-reference/sql-api#responses). 
+ +[#14571](https://github.com/apache/druid/pull/14571) + +### Broadened access for queries for deep storage + +Users with the `STATE` permission can interact with status APIs for queries from deep storage. Previously, only the user who submitted the query could use those APIs. This enables the web console to monitor the running status of the queries. Users with the `STATE` permission can access the query results. + +[#14944](https://github.com/apache/druid/pull/14944) + +### Unused segments + +Druid now stops loading and moving segments as soon as they are marked as unused. This prevents Historical processes from spending time on superfluous loads of segments that will be unloaded later. You can mark segments as unused by a drop rule, overshadowing, or by calling [the Data management API](https://druid.apache.org/docs/latest/api-reference/data-management-api). + +[#14644](https://github.com/apache/druid/pull/14644) + +## SQL planner improvements + +Druid uses Apache Calcite for SQL planning and optimization. Starting in Druid 28.0.0, the Calcite version has been upgraded from 1.21 to 1.35. This upgrade brings in many bug fixes in SQL planning from Calcite. As part of the upgrade, the [behavior of type inference for dynamic parameters](#dynamic-parameters) and the recommended [syntax for UNNEST](#new-syntax-for-sql-unnest) have changed. + +### Dynamic parameters + +The behavior of type inference for dynamic parameters has changed. To avoid any type inference issues, explicitly `CAST` all dynamic parameters as a specific data type in SQL queries. For example: + +```sql +SELECT (1 * CAST (? as DOUBLE))/2 as tmp +``` + +For more information, see [Dynamic parameters in the upgrade notes](#dynamic-parameters-1). + +### New syntax for SQL UNNEST + +The recommended syntax for SQL UNNEST has changed. We recommend using CROSS JOIN instead of commas for most queries to prevent issues with precedence. 
For example: + +```sql +SELECT column_alias_name1 FROM datasource CROSS JOIN UNNEST(source_expression1) AS table_alias_name1(column_alias_name1) CROSS JOIN UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ... +``` + +For more information, see [UNNEST syntax in the upgrade notes](#unnest-syntax). + +## Ingest from multiple Kafka topics to a single datasource + +You can now ingest streaming data from multiple Kafka topics to a datasource using a single supervisor. +You configure the topics for the supervisor spec using a regex pattern as the value for `topicPattern` in the IO config. If you add new topics to Kafka that match the regex, Druid automatically starts ingesting from those new topics. + +If you enable multi-topic ingestion for a datasource, downgrading will cause the supervisor to fail. +For more information, see [Stop supervisors that ingest from multiple Kafka topics before downgrading](#stop-supervisors-that-ingest-from-multiple-kafka-topics-before-downgrading). + +[#14424](https://github.com/apache/druid/pull/14424) +[#14865](https://github.com/apache/druid/pull/14865) + +## Hadoop 2 removed + +Support for Hadoop 2 has been removed. +Migrate to SQL-based ingestion or native ingestion if you are using Hadoop 2.x for ingestion today. If migrating to Druid's built-in ingestion is not possible, you must upgrade your Hadoop infrastructure to 3.x+ before upgrading to Druid 28.0.0. + +[#14763](https://github.com/apache/druid/pull/14763) + +## JSON and auto column indexer + +The `json` type is now equivalent to using `auto` in native JSON-based ingestion dimension specs. 
Upgrade your ingestion specs to `json` to take advantage of the features and functionality of `auto`, including the following: + +- Type specializations including ARRAY typed columns +- Better support for nested arrays of strings, longs, and doubles +- Smarter index utilization + +`json` type columns created with Druid 28.0.0 are not backwards compatible with Druid versions older than 26.0.0. +If you upgrade from one of these versions, you can continue to write nested columns in a backwards compatible format (version 4). + +For more information, see [Nested column format in the upgrade notes](#nested-column-format). + +[#14955](https://github.com/apache/druid/pull/14955) +[#14456](https://github.com/apache/druid/pull/14456) + +# Additional features and improvements + +## SQL compatibility + +Druid continues to make SQL query execution more consistent with how standard SQL behaves. However, there are feature flags available to restore the old behavior if needed. + +### Three-valued logic + +Druid native filters now correctly observe SQL [three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic#SQL), `true`, `false`, `unknown`, instead of Druid's classic two-state logic when you set the following configuration values: + +* `druid.generic.useThreeValueLogicForNativeFilters = true` +* `druid.expressions.useStrictBooleans = true` +* `druid.generic.useDefaultValueForNull = false` + +[#15058](https://github.com/apache/druid/pull/15058) + +### Strict booleans + +`druid.expressions.useStrictBooleans` is now enabled by default. +Druid now handles booleans strictly using `1` (true) or `0` (false). +Previously, true and false could be represented either as `true` and `false` or as `1` and `0`. + +If you don't explicitly configure this property in `runtime.properties`, clusters now use LONG types for any ingested boolean values, and in the output of boolean functions for transformations and query time operations. 
+ +This change may impact your query results. For more information, see [SQL compatibility in the upgrade notes](#sql-compatibility-1). + +[#14734](https://github.com/apache/druid/pull/14734) + +### NULL handling + +`druid.generic.useDefaultValueForNull` is now disabled by default. +Druid now differentiates between empty records, such as `' '`, and null records. +Previously, Druid might treat empty records as empty or null. + +This change may impact your query results. For more information, see [SQL compatibility in the upgrade notes](#sql-compatibility-1). + +[#14792](https://github.com/apache/druid/pull/14792) + +## SQL-based ingestion + +### Ability to ingest ARRAY types + +SQL-based ingestion now supports storing ARRAY typed values in [ARRAY typed columns](https://druid.apache.org/docs/latest/querying/arrays) as well as storing both VARCHAR and numeric typed arrays. +Previously, the MSQ task engine incorrectly stored ARRAY typed values as [multi-value dimensions](https://druid.apache.org/docs/latest/querying/multi-value-dimensions) instead of ARRAY typed columns. + +The MSQ task engine now includes the `arrayIngestMode` query context parameter, which controls how +`ARRAY` types are stored in Druid segments. +Set the `arrayIngestMode` query context parameter to `array` to ingest ARRAY types. + +In Druid 28.0.0, the default mode for `arrayIngestMode` is `mvd` for backwards compatibility, which only supports VARCHAR typed arrays and stores them as multi-value dimensions. This default is subject to removal in future releases. Review Comment: Good point. Updated the sentence to read "This default is subject to change in future releases." -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
