ektravel commented on code in PR #15173: URL: https://github.com/apache/druid/pull/15173#discussion_r1362739603
########## docs/do-not-merge.md: ##########
@@ -0,0 +1,737 @@
+<!--Intentionally, there's no Apache license so that the GHA fails it. This file is not meant to be merged.
+-->
+
+Apache Druid 28.0.0 contains over $NUMBER_FEATURES new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from $NUMBER_OF_CONTRIBUTORS contributors.
+
+See the [complete set of changes](https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A28.0+sort%3Aupdated-desc+) for additional details, including bug fixes.
+
+Review the [upgrade notes and incompatible changes](#upgrade-notes-and-incompatible-changes) before you upgrade to Druid 28.0.0.
+
+# Highlights
+
+<!-- HIGHLIGHTS H2. FOR EACH MAJOR FEATURE FOR THE RELEASE -->
+
+## Query from deep storage
+
+Query from deep storage is now generally available. When you query from deep storage, more data is available for queries without requiring you to scale your Historical processes to accommodate it. To take advantage of the potential storage savings, make sure you configure your load rules so that they don't load all your segments onto Historical processes.
+
+- [Query from deep storage](https://druid.apache.org/docs/latest/querying/query-deep-storage/)
+- [Query from deep storage tutorial](https://druid.apache.org/docs/latest/tutorials/tutorial-query-deep-storage/)
+
+### SQL three-valued logic
+
+Druid native filters now correctly observe SQL [three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic#SQL) (`true`, `false`, `unknown`) instead of Druid's classic two-state logic when the following configuration values are set:
+
+* `druid.generic.useThreeValueLogicForNativeFilters = true`
+* `druid.expressions.useStrictBooleans = true`
+* `druid.generic.useDefaultValueForNull = false`
+
+[#15058](https://github.com/apache/druid/pull/15058)
+
+### Ingest from multiple Kafka topics to a single datasource
+
+You can now ingest streaming data from multiple Kafka topics to a datasource using a single supervisor.
+You configure the topics for the supervisor spec using a regex pattern as the value for `topicPattern` in the IO config. If you add new topics to Kafka that match the regex, Druid automatically starts ingesting from those new topics.
+
+If you enable multi-topic ingestion for a datasource, downgrading will cause the supervisor to fail.
+For more information, see [Stop supervisors that ingest from multiple Kafka topics before downgrading](#stop-supervisors-that-ingest-from-multiple-kafka-topics-before-downgrading).
+
+[#14424](https://github.com/apache/druid/pull/14424)
+[#14865](https://github.com/apache/druid/pull/14865)
+
+## Strict booleans
+
+`druid.expressions.useStrictBooleans` is now enabled by default.
+Druid now handles booleans strictly using `1` (true) or `0` (false).
+Previously, true and false could be represented either as `true` and `false` or as `1` and `0`.
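As a rough illustration (this is not Druid code), the three-valued semantics described above can be sketched in Python over Druid's strict `1`/`0` booleans, with `None` standing in for `unknown`:

```python
# Sketch of SQL three-valued (Kleene) logic, using 1 (true), 0 (false),
# and None (unknown). Purely illustrative; Druid's native filters follow
# these truth tables when useThreeValueLogicForNativeFilters is enabled.

def tvl_and(a, b):
    # AND is false whenever either side is false, even if the other is unknown.
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None
    return 1

def tvl_or(a, b):
    # OR is true whenever either side is true, even if the other is unknown.
    if a == 1 or b == 1:
        return 1
    if a is None or b is None:
        return None
    return 0

def tvl_not(a):
    # NOT unknown stays unknown.
    return None if a is None else 1 - a
```

For example, a filter comparing a NULL value yields `unknown`, so `tvl_and(1, None)` is `None` rather than collapsing to false as under two-state logic.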
+
+If you don't explicitly configure this property in `runtime.properties`, clusters now use LONG types for any ingested boolean values and in the output of boolean functions for transformations and query-time operations.
+
+This change may impact your query results. For more information, see [SQL compatibility in the upgrade notes](#sql-compatibility).
+
+[#14734](https://github.com/apache/druid/pull/14734)
+
+## SQL compatible NULL handling
+
+`druid.generic.useDefaultValueForNull` is now disabled by default.
+Druid now differentiates between empty records, such as `' '`, and null records.
+Previously, Druid could treat empty records as null records.
+
+This change may impact your query results. For more information, see [SQL compatibility in the upgrade notes](#sql-compatibility).
+
+[#14792](https://github.com/apache/druid/pull/14792)
+
+### JSON and auto column indexer
+
+The `json` type is now equivalent to using `auto` in native JSON-based ingestion dimension specs. Upgrade your ingestion specs to `json` to take advantage of the features and functionality of `auto`, including the following:
+
+- Type specializations, including ARRAY typed columns
+- Better support for nested arrays of strings, longs, and doubles
+- Smarter index utilization
+
+`json` type columns created with Druid 28.0.0 are not backwards compatible with Druid versions older than 26.0.0.
+If you upgrade from one of these versions, you can continue to write nested columns in a backwards compatible format (version 4).
+
+For more information, see [Nested column format in the upgrade notes](#nested-column-format).
+
+[#14955](https://github.com/apache/druid/pull/14955)
+[#14456](https://github.com/apache/druid/pull/14456)
+
+## Reset offsets for a supervisor
+
+Added a new API endpoint `/druid/indexer/v1/supervisor/:supervisorId/resetOffsets` to reset specific partition offsets for a supervisor without resetting the entire set.
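For illustration, a client might build the endpoint path like this (a sketch only; the exact POST body that names the partitions to reset is not shown here and should be taken from the Druid API reference):

```python
# Sketch: construct the reset-offsets endpoint path described above.
# The supervisor ID is URL-encoded in case it contains special characters.
# The request itself is a POST whose body specifies the offsets to clear;
# that payload shape is omitted here.
from urllib.parse import quote

def reset_offsets_path(supervisor_id: str) -> str:
    return f"/druid/indexer/v1/supervisor/{quote(supervisor_id, safe='')}/resetOffsets"
```

For example, `reset_offsets_path("social_media")` (a hypothetical supervisor ID) returns `/druid/indexer/v1/supervisor/social_media/resetOffsets`.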
+This endpoint clears only the specified offsets in Kafka or sequence numbers in Kinesis, prompting the supervisor to resume data reading.
+
+[#14772](https://github.com/apache/druid/pull/14772)
+
+## Removed Hadoop 2
+
+Support for Hadoop 2 has been removed.
+Migrate to SQL-based ingestion or native ingestion if you are using Hadoop 2.x for ingestion today. If migrating to Druid's built-in ingestion is not possible, you must upgrade your Hadoop infrastructure to 3.x+ before upgrading to Druid 28.0.0.
+
+[#14763](https://github.com/apache/druid/pull/14763)
+
+# Additional features and improvements
+
+## SQL-based ingestion
+
+### New ingestion mode for arrays
+
+The MSQ task engine now includes an `arrayIngestMode` parameter that determines how arrays are treated during ingestion. Previously, arrays were always ingested as multi-value dimensions (MVDs). They can now be ingested as `ARRAY<STRING>` or numeric arrays instead when the mode is set to `array`.
+
+The default mode still ingests arrays as MVDs, unchanged from previous releases.
+
+[#15093](https://github.com/apache/druid/pull/15093)
+
+### UNNEST functionality
+
+You can now use UNNEST with MSQ queries. This lets you flatten and explode data during batch ingestion. For more information, see the [UNNEST tutorial](https://druid.apache.org/docs/latest/tutorials/tutorial-unnest-arrays/) and [UNNEST documentation](https://druid.apache.org/docs/latest/querying/sql/#unnest).

Review Comment:
   Updated

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
