Re: [PR] DO NOT MERGE - 28.0.0 WIP release notes (druid)

via GitHub Tue, 24 Oct 2023 13:54:12 -0700


ektravel commented on code in PR #15173:
URL: https://github.com/apache/druid/pull/15173#discussion_r1370798395



##########
docs/do-not-merge.md:
##########
@@ -0,0 +1,809 @@
+<!--Intentionally, there's no Apache license so that the GHA fails it. This 
file is not meant to be merged.
+-->
+
+Apache Druid 28.0.0 contains over $NUMBER_FEATURES new features, bug fixes, 
performance enhancements, documentation improvements, and additional test 
coverage from $NUMBER_OF_CONTRIBUTORS contributors.
+
+See the [complete set of 
changes](https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A28.0+sort%3Aupdated-desc+)
 for additional details, including bug fixes.
+
+Review the [upgrade notes](#upgrade-notes) and [incompatible 
changes](#incompatible-changes) before you upgrade to Druid 28.0.0.
+
+# Highlights
+
+<!-- HIGHLIGHTS H2. FOR EACH MAJOR FEATURE FOR THE RELEASE -->
+
+## Window functions (experimental)
+
+You can use [window 
functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) 
in Apache Druid to produce values based upon the relationship of one row within 
a window of rows to the other rows within the same window. A window is a group 
of related rows within a result set, for example, rows that share the same 
value for a specific dimension.
+
+Enable window functions in your query with the `enableWindowing: true` context 
parameter.
+
+[#15184](https://github.com/apache/druid/pull/15184)
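
As a rough illustration of the context parameter above, the following sketch builds a JSON request body for the Druid SQL API with `enableWindowing` set in the query context. The `wikipedia` datasource, its columns, and the helper name `build_sql_request` are assumptions made for this example. The query itself is only illustrative and may need adjusting to the limitations of the experimental feature.

```python
import json


def build_sql_request(query: str, context: dict) -> str:
    """Serialize a Druid SQL API request body that carries query context parameters."""
    return json.dumps({"query": query, "context": context})


# Hypothetical datasource and columns; only enableWindowing comes from the notes above.
WINDOW_QUERY = """
SELECT channel,
       SUM(delta) AS total_delta,
       RANK() OVER (ORDER BY SUM(delta) DESC) AS channel_rank
FROM wikipedia
GROUP BY channel
"""

# Window functions are off unless the context parameter is set to true.
body = build_sql_request(WINDOW_QUERY, {"enableWindowing": True})
print(body)
```

You would send a body like this in a POST to the SQL endpoint of your cluster. The host and port depend on your deployment.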
+
+## Concurrent compaction (experimental)
+
+When you have automatic compaction enabled, you can use concurrent compaction 
to compact data as you ingest it. Concurrent compaction supports streaming 
ingestion and JSON-based batch ingestion.
+
+For more information, see [Concurrent 
compaction](https://druid.apache.org/docs/latest/data-management/automatic-compaction#concurrent-compaction).
+
+### Concurrent batch APPEND and REPLACE
+
+`APPEND` batch ingestion jobs can now share locks. This allows you to run 
multiple `APPEND` batch ingestion jobs against the same time interval. 
`REPLACE` batch ingestion jobs still require an exclusive lock. This means you 
can run multiple `APPEND` batch ingestion jobs and one `REPLACE` batch 
ingestion job for a given interval.
+
+[#14407](https://github.com/apache/druid/pull/14407)
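
The locking rule described above can be pictured with a toy compatibility check. This is a simplified model written for this note, not Druid's actual lock implementation: the `LockType` enum and the `can_add_job` function are hypothetical names, and the model only encodes the stated rule that any number of `APPEND` jobs and at most one `REPLACE` job can share an interval.

```python
from enum import Enum
from typing import Iterable


class LockType(Enum):
    APPEND = "APPEND"
    REPLACE = "REPLACE"


def can_add_job(active: Iterable[LockType], new: LockType) -> bool:
    """Return True if a new job may run next to the active jobs on one interval."""
    if new is LockType.APPEND:
        # APPEND locks are shared, so another APPEND job is always allowed.
        return True
    # A REPLACE lock is exclusive with respect to other REPLACE jobs.
    return LockType.REPLACE not in list(active)


running = [LockType.APPEND, LockType.APPEND]
print(can_add_job(running, LockType.REPLACE))                       # first REPLACE job
print(can_add_job(running + [LockType.REPLACE], LockType.REPLACE))  # second REPLACE job
```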
+
+### Streaming ingestion with concurrent REPLACE
+
+Streaming jobs reading from Kafka and Kinesis with `APPEND` locks can now 
ingest concurrently with compaction running with `REPLACE` locks. The segment 
granularity of the streaming job must be equal to or finer than that of the 
concurrent `REPLACE` job.
+
+[#15039](https://github.com/apache/druid/pull/15039)
+
+## Query from deep storage
+
+[Query from deep 
storage](https://druid.apache.org/docs/latest/querying/query-deep-storage/) is 
no longer an experimental feature. When you query from deep storage, more data 
is available for queries without having to scale your Historical processes to 
accommodate more data. To benefit from the space saving that query from deep 
storage offers, configure your load rules to unload data from your Historical 
processes.
+
+### Result formats
+
+Query from deep storage now supports multiple result formats.
+Previously, the `/druid/v2/sql/statements/` endpoint only supported results in 
the `object` format. Now, results can be written in any format specified in the 
`resultFormat` parameter.
+For more information on result parameters supported by the Druid SQL API, see 
[Responses](https://druid.apache.org/docs/latest/api-reference/sql-api#responses).
+
+[#14571](https://github.com/apache/druid/pull/14571)
+
+### Broadened access for queries from deep storage
+
+Users with the `STATE` permission can interact with status APIs for queries 
from deep storage. Previously, only the user who submitted the query could use 
those APIs. This enables the web console to monitor the running status of the 
queries. Users with the `STATE` permission can access the query results.
+
+[#14944](https://github.com/apache/druid/pull/14944)
+
+### Unused segments
+
+Druid now stops loading and moving segments as soon as they are marked as 
unused. This prevents Historical processes from spending time on superfluous 
loads of segments that will be unloaded later. You can mark segments as unused 
by a drop rule, overshadowing, or by calling [the Data management 
API](https://druid.apache.org/docs/latest/api-reference/data-management-api).
+
+[#14644](https://github.com/apache/druid/pull/14644)
+
+## SQL planner upgrade
+
+Druid uses Apache Calcite for SQL planning and optimization. Starting in Druid 
28.0.0, the Calcite version has been upgraded from 1.21 to 1.35. As part of the 
upgrade, the [behavior of type inference for dynamic 
parameters](#dynamic-parameters) and the recommended [syntax for 
UNNEST](#new-syntax-for-sql-unnest) have changed.
+
+### Dynamic parameters
+
+The behavior of type inference for dynamic parameters has changed. To avoid 
any type inference issues, explicitly `CAST` all dynamic parameters as a 
specific data type in SQL queries. For example:
+
+```sql
+SELECT (1 * CAST(? AS DOUBLE)) / 2 AS tmp
+```
+
+For more information, see [Dynamic parameters in the upgrade 
notes](#dynamic-parameters-1).
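
To show how a query with an explicit `CAST` might be paired with its parameter value, here is a minimal sketch that builds a SQL API request body with a `parameters` list. The helper name `build_parameterized_request` and the sample value `3.0` are assumptions made for this example.

```python
import json
from typing import Any, List, Tuple


def build_parameterized_request(query: str, params: List[Tuple[str, Any]]) -> str:
    """Serialize a SQL API request body whose '?' placeholders are bound in order."""
    return json.dumps({
        "query": query,
        # Each dynamic parameter is sent as an object with a SQL type and a value.
        "parameters": [{"type": sql_type, "value": value} for sql_type, value in params],
    })


# The CAST makes the parameter type explicit instead of relying on type inference.
body = build_parameterized_request(
    "SELECT (1 * CAST(? AS DOUBLE)) / 2 AS tmp",
    [("DOUBLE", 3.0)],  # sample value chosen for illustration
)
print(body)
```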
+
+### New syntax for SQL UNNEST
+
+The recommended syntax for SQL UNNEST has changed. We recommend using CROSS 
JOIN instead of commas for most queries to prevent issues with precedence. For 
example:
+
+```sql
+SELECT column_alias_name1 FROM datasource CROSS JOIN 
UNNEST(source_expression1) AS table_alias_name1(column_alias_name1) CROSS JOIN 
UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...
+```
+
+For more information, see [UNNEST syntax in the upgrade notes](#unnest-syntax).
+
+## Ingest from multiple Kafka topics to a single datasource
+
+You can now ingest streaming data from multiple Kafka topics to a datasource 
using a single supervisor.
+You configure the topics for the supervisor spec using a regex pattern as the 
value for `topicPattern` in the IO config. If you add new topics to Kafka that 
match the regex, Druid automatically starts ingesting from those new topics.
+
+If you enable multi-topic ingestion for a datasource, downgrading will cause 
the supervisor to fail.
+For more information, see [Stop supervisors that ingest from multiple Kafka 
topics before 
downgrading](#stop-supervisors-that-ingest-from-multiple-kafka-topics-before-downgrading).
+
+[#14424](https://github.com/apache/druid/pull/14424)
+[#14865](https://github.com/apache/druid/pull/14865)
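
The following sketch illustrates how a regex used for `topicPattern` selects topics, including topics created later. It rests on two assumptions: that the pattern must match the whole topic name, and that Python's `re` module behaves like Druid's Java regex engine for a pattern this simple. The topic names and the `matching_topics` helper are hypothetical.

```python
import re
from typing import List


def matching_topics(topic_pattern: str, topics: List[str]) -> List[str]:
    """Return the topics whose full name matches the given pattern."""
    pattern = re.compile(topic_pattern)
    # Assumption: the whole topic name has to match, not just a substring.
    return [topic for topic in topics if pattern.fullmatch(topic)]


topics = ["clicks-eu", "clicks-us", "orders-eu"]
selected = matching_topics(r"clicks-.*", topics)

# A topic added later that matches the pattern is selected as well.
selected_later = matching_topics(r"clicks-.*", topics + ["clicks-apac"])
print(selected, selected_later)
```

Testing a pattern against your topic list like this before you submit the supervisor spec can help avoid ingesting from topics you did not intend to include.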
+
+## Hadoop 2 removed
+
+Support for Hadoop 2 has been removed.
+Migrate to SQL-based ingestion or native ingestion if you are using Hadoop 2.x 
for ingestion today. If migrating to Druid's built-in ingestion is not 
possible, you must upgrade your Hadoop infrastructure to 3.x+ before upgrading 
to Druid 28.0.0.
+
+[#14763](https://github.com/apache/druid/pull/14763)
+
+## Legacy groupBy v1 removed
+
+The groupBy v1 engine has been removed. Use the groupBy v2 engine instead.  
+If you are using groupBy v1 in native queries, you must change your queries 
before upgrading. Otherwise, your queries will fail.
+For more information, see [GroupBy 
queries](https://druid.apache.org/docs/latest/querying/groupbyquery).
+
+[#14866](https://github.com/apache/druid/pull/14866)
+
+## JSON and auto column indexer
+
+The `json` type is now equivalent to using `auto` in native JSON-based 
ingestion dimension specs. Upgrade your ingestion specs to `json` to take 
advantage of the features and functionality of `auto`, including the following:
+
+- Type specializations including ARRAY typed columns
+- Better support for nested arrays of strings, longs, and doubles
+- Smarter index utilization
+
+`json` type columns created with Druid 28.0.0 are not backwards compatible 
with Druid versions older than 26.0.0.
+If you upgrade from one of these versions, you can continue to write nested 
columns in a backwards compatible format (version 4).
+
+For more information, see [Nested column format in the upgrade 
notes](#nested-column-format).
+
+[#14955](https://github.com/apache/druid/pull/14955)
+[#14456](https://github.com/apache/druid/pull/14456)
+
+# Additional features and improvements
+
+## SQL compatibility
+
+### Three-valued logic
+
+Druid native filters now correctly observe SQL [three-valued 
logic](https://en.wikipedia.org/wiki/Three-valued_logic#SQL), `true`, `false`, 
`unknown`, instead of Druid's classic two-state logic when you set the 
following configuration values:
+
+* `druid.generic.useThreeValueLogicForNativeFilters = true`
+* `druid.expressions.useStrictBooleans = true`
+* `druid.generic.useDefaultValueForNull = false`
+
+[#15058](https://github.com/apache/druid/pull/15058)
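
For readers unfamiliar with three-valued logic, this short sketch models the SQL truth tables with `None` standing in for `unknown`. It is a generic illustration of the logic, not Druid's filter code, and the function names are made up for this example.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for SQL "unknown"


def sql_not(a: Tri) -> Tri:
    """NOT of unknown stays unknown."""
    return None if a is None else not a


def sql_and(a: Tri, b: Tri) -> Tri:
    """False dominates AND; otherwise any unknown makes the result unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a: Tri, b: Tri) -> Tri:
    """True dominates OR; otherwise any unknown makes the result unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def filter_keeps_row(predicate: Tri) -> bool:
    """A filter keeps a row only when the predicate is definitely true."""
    return predicate is True


# Comparing a null value with anything is unknown, and so is its negation,
# so the row is dropped by both the filter and the negated filter.
comparison_on_null: Tri = None
print(filter_keeps_row(comparison_on_null), filter_keeps_row(sql_not(comparison_on_null)))
```

The practical difference from two-state logic is visible in the last line: with two states, negating a comparison that failed because of a null value would match the row, while under three-valued logic it does not.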
+
+### Strict booleans
+
+`druid.expressions.useStrictBooleans` is now enabled by default.
+Druid now handles booleans strictly using `1` (true) or `0` (false).
+Previously, true and false could be represented either as `true` and `false` 
or as `1` and `0`.
+
+If you don't explicitly configure this property in `runtime.properties`, 
clusters now use LONG types for any ingested boolean values, and in the output 
of boolean functions for transformations and query time operations.
+
+This change may impact your query results. For more information, see [SQL 
compatibility in the upgrade notes](#sql-compatibility-1).
+
+[#14734](https://github.com/apache/druid/pull/14734)
+
+### NULL handling
+
+`druid.generic.useDefaultValueForNull` is now disabled by default.
+Druid now differentiates between empty records, such as `''`, and null 
records.
+Previously, Druid might treat empty records as empty or null.
+
+This change may impact your query results. For more information, see [SQL 
compatibility in the upgrade notes](#sql-compatibility-1).
+
+[#14792](https://github.com/apache/druid/pull/14792)
+
+## SQL-based ingestion
+
+### Azure connector
+
+Added support for Microsoft's Azure storage type.
+You can now use fault tolerance and durable storage with Microsoft Azure's 
blob storage.
+For more information, see [Durable 
storage](https://druid.apache.org/docs/latest/multi-stage-query/reference#durable-storage).
+
+[#14660](https://github.com/apache/druid/pull/14660)
+
+### New ingestion mode for arrays
+
+The MSQ task engine now includes an `arrayIngestMode` that determines how 
arrays are treated for ingestion. Previously, arrays were ingested as 
multi-value dimensions (MVDs). They can now be ingested as `ARRAY<STRING>` or 
numeric arrays instead if the ingest mode is set to `array`.

Review Comment:
   @LakshSingla If MSQ supported string arrays until Druid 27, what happened in 
Druid 27? Did we stop supporting string arrays in Druid 27?


