ektravel commented on code in PR #15173:
URL: https://github.com/apache/druid/pull/15173#discussion_r1379111320


##########
docs/do-not-merge.md:
##########
@@ -0,0 +1,815 @@
+<!--Intentionally, there's no Apache license so that the GHA fails it. This 
file is not meant to be merged.
+-->
+
+Apache Druid 28.0.0 contains over $NUMBER_FEATURES new features, bug fixes, 
performance enhancements, documentation improvements, and additional test 
coverage from $NUMBER_OF_CONTRIBUTORS contributors.
+
+See the [complete set of 
changes](https://github.com/apache/druid/issues?q=is%3Aclosed+milestone%3A28.0+sort%3Aupdated-desc+)
 for additional details, including bug fixes.
+
+Review the [upgrade notes](#upgrade-notes) and [incompatible 
changes](#incompatible-changes) before you upgrade to Druid 28.0.0.
+
+# Important changes and deprecations
+
+In Druid 28.0.0, we have made substantial improvements to querying to make the 
system more ANSI SQL compatible. This includes changes in handling NULL and 
boolean values as well as boolean logic. At the same time, the Apache Calcite 
library has been upgraded to the latest version. While we have documented known 
query behavior changes, please read the [upgrade notes](#upgrade-notes) section carefully. Test your applications before rolling the upgrade out to production, and closely monitor query status.
+
+## SQL compatibility
+
+Druid continues to make SQL query execution more consistent with how standard 
SQL behaves. However, there are feature flags available to restore the old 
behavior if needed.
+
+### Three-valued logic
+
+Druid native filters now observe SQL [three-valued 
logic](https://en.wikipedia.org/wiki/Three-valued_logic#SQL) (`true`, `false`, 
or `unknown`) instead of Druid's classic two-state logic when you set the 
following configuration values:
+
+* `druid.generic.useThreeValueLogicForNativeFilters = true`
+* `druid.expressions.useStrictBooleans = true`
+* `druid.generic.useDefaultValueForNull = false`
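As a sketch of the behavior change (the datasource `t` and column `dim` are hypothetical names, not from the release):

```sql
-- Under three-valued logic, a comparison involving NULL evaluates to UNKNOWN,
-- so rows where "dim" is NULL match neither filter below:
SELECT COUNT(*) FROM t WHERE dim = 'a'   -- excludes rows where dim IS NULL
SELECT COUNT(*) FROM t WHERE dim <> 'a'  -- also excludes rows where dim IS NULL
```

Under the classic two-state logic, the second query would have counted NULL rows as not equal to `'a'`.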
+
+[#15058](https://github.com/apache/druid/pull/15058)
+
+### Strict booleans
+
+`druid.expressions.useStrictBooleans` is now enabled by default.
+Druid now handles booleans strictly using `1` (true) or `0` (false).
+Previously, true and false could be represented either as `true` and `false` or as `1` and `0`, respectively.
+In addition, Druid now returns a null value for boolean comparisons like `true && null`.
+
+If you don't explicitly configure this property in `runtime.properties`, 
clusters now use LONG types for any ingested boolean values and in the output 
of boolean functions for transformations and query time operations.
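For example, with strict booleans enabled, boolean-valued expressions come back as LONG values (a sketch; the literals are arbitrary):

```sql
SELECT
  ('a' = 'a') AS is_true,  -- returns 1 rather than true
  ('a' = 'b') AS is_false  -- returns 0 rather than false
```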
+
+This change may impact your query results. For more information, see [SQL 
compatibility in the upgrade notes](#sql-compatibility-1).
+
+[#14734](https://github.com/apache/druid/pull/14734)
+
+### NULL handling
+
+`druid.generic.useDefaultValueForNull` is now disabled by default.
+Druid now differentiates between empty records and null records.
+Previously, Druid treated empty and null records interchangeably, so the two were indistinguishable.
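As a sketch of the distinction (the datasource `t` and string column `dim` are hypothetical):

```sql
-- With useDefaultValueForNull disabled, '' and NULL are distinct values:
SELECT COUNT(*) FROM t WHERE dim IS NULL  -- matches only true nulls
SELECT COUNT(*) FROM t WHERE dim = ''     -- matches only empty strings
```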
+
+This change may impact your query results. For more information, see [SQL 
compatibility in the upgrade notes](#sql-compatibility-1).
+
+[#14792](https://github.com/apache/druid/pull/14792)
+
+## SQL planner improvements
+
+Druid uses Apache Calcite for SQL planning and optimization. Starting in Druid 
28.0.0, the Calcite version has been upgraded from 1.21 to 1.35. This upgrade 
brings in many bug fixes in SQL planning from Calcite. As part of the upgrade, 
the behavior of type inference for [dynamic parameters](#dynamic-parameters) 
and the [recommended syntax for UNNEST](#new-syntax-for-sql-unnest) have 
changed.
+
+### Dynamic parameters
+
+The behavior of type inference for dynamic parameters has changed. To avoid any type inference issues, explicitly `CAST` all dynamic parameters as a specific data type in SQL queries. For example, use:
+
+```sql
+SELECT (1 * CAST (? as DOUBLE))/2 as tmp
+```
+
+Do not use:
+
+```sql
+SELECT (1 * ?)/2 as tmp
+```
+
+### New syntax for SQL UNNEST
+
+The recommended syntax for SQL UNNEST has changed. We recommend using CROSS 
JOIN instead of commas for most queries to prevent issues with precedence. For 
example, use:
+
+```sql
+SELECT column_alias_name1 FROM datasource CROSS JOIN 
UNNEST(source_expression1) AS table_alias_name1(column_alias_name1) CROSS JOIN 
UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...
+```
+
+Do not use:
+
+```sql
+SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS 
table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS 
table_alias_name2(column_alias_name2), ...
+```
+
+## Async query and query from deep storage
+
+[Query from deep 
storage](https://druid.apache.org/docs/latest/querying/query-deep-storage/) is 
no longer an experimental feature. When you query from deep storage, more data 
is available for queries without having to scale your Historical services to 
accommodate more data. To benefit from the space saving that query from deep 
storage offers, configure your load rules to unload data from your Historical 
services.
+
+## MSQ queries for realtime tasks
+
+The MSQ task engine can now include realtime segments in query results. To do this, set the `includeSegmentSource` context parameter to `REALTIME`.
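For example, a sketch of a query payload that opts in to realtime segments (the query and datasource name are placeholders):

```json
{
  "query": "SELECT COUNT(*) FROM \"my_datasource\"",
  "context": {
    "includeSegmentSource": "REALTIME"
  }
}
```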
+
+[#15024](https://github.com/apache/druid/pull/15024)
+
+## MSQ support for UNION ALL queries
+
+You can now use the MSQ task engine to run UNION ALL queries with 
`UnionDataSource`.
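A minimal sketch, assuming two hypothetical datasources `ds1` and `ds2` with identical schemas:

```sql
-- MSQ ingestion of a UNION ALL across two datasources into a third:
REPLACE INTO combined OVERWRITE ALL
SELECT * FROM ds1
UNION ALL
SELECT * FROM ds2
PARTITIONED BY DAY
```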
+
+[#14981](https://github.com/apache/druid/pull/14981)
+
+## Ingest from multiple Kafka topics to a single datasource
+
+You can now ingest streaming data from multiple Kafka topics to a datasource 
using a single supervisor.
+You configure the topics for the supervisor spec using a regex pattern as the 
value for `topicPattern` in the IO config. If you add new topics to Kafka that 
match the regex, Druid automatically starts ingesting from those new topics.
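A sketch of the relevant portion of a Kafka supervisor IO config (the topic pattern and broker address are hypothetical):

```json
"ioConfig": {
  "type": "kafka",
  "topicPattern": "telemetry-.*",
  "consumerProperties": {
    "bootstrap.servers": "kafka-broker:9092"
  }
}
```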
+
+If you enable multi-topic ingestion for a datasource, downgrading will cause 
the Supervisor to fail.
+For more information, see [Stop supervisors that ingest from multiple Kafka 
topics before 
downgrading](#stop-supervisors-that-ingest-from-multiple-kafka-topics-before-downgrading).
+
+[#14424](https://github.com/apache/druid/pull/14424)
+[#14865](https://github.com/apache/druid/pull/14865)
+
+## SQL UNNEST and ingestion flattening
+
+The UNNEST function is no longer experimental. UNNEST lets you flatten and 
explode data during batch ingestion. For more information, see 
[UNNEST](https://druid.apache.org/docs/latest/querying/sql/#unnest) and [Unnest 
arrays within a 
column](https://druid.apache.org/docs/latest/tutorials/tutorial-unnest-arrays/).
+
+You no longer need to include the context parameter `enableUnnest: true` to 
use UNNEST.
+
+[#14886](https://github.com/apache/druid/pull/14886)
+
+## Window functions (experimental)
+
+You can use [window 
functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) 
in Apache Druid to produce values based upon the relationship of one row within 
a window of rows to the other rows within the same window. A window is a group of related rows within a result set, such as rows with the same value for a specific dimension.
+
+Enable window functions in your query with the `enableWindowing: true` context 
parameter.
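A sketch of a window query (the `pageviews` table and its columns are hypothetical; the query must run with `enableWindowing: true` in its context):

```sql
-- Rank pages by view count within each channel:
SELECT
  channel,
  page,
  views,
  RANK() OVER (PARTITION BY channel ORDER BY views DESC) AS rank_in_channel
FROM pageviews
```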
+
+[#15184](https://github.com/apache/druid/pull/15184)
+
+## Concurrent append and replace (experimental)
+
+Druid 28.0.0 adds experimental support for concurrent append and replace.
+This feature allows you to safely replace the existing data in an interval of 
a datasource while new data is being appended to that interval. One of the most 
common applications of this is appending new data to an interval while 
compaction of that interval is already in progress.
+For more information, see [Concurrent append and 
replace](https://druid.apache.org/docs/latest/data-management/automatic-compaction#concurrent-append-and-replace).
+
+Segment locking will be deprecated and removed in favor of concurrent append and replace, which is much simpler in design. With concurrent append and replace, Druid doesn't lock compaction jobs out because of active realtime ingestion.
+
+# Functional area and related changes
+
+## Web console
+
+### Added UI support for segment loading query context parameter
+
+The web console supports the `waitUntilSegmentsLoad` query context parameter.
+
+![UI for waitUntilSegmentsLoad context parameter](image.png)
+
+[#15110](https://github.com/apache/druid/pull/15110)
+
+### Added concurrent append and replace switches
+
+The web console includes concurrent append and replace switches.
+
+The following screenshot shows the concurrent append and replace switches in 
the classic batch ingestion wizard:
+![Classic batch ingestion wizard](image-1.png)
+
+The following screenshot shows the concurrent append and replace switches in 
the compaction configuration UI:
+![Compaction configuration UI](image-2.png)
+
+[#15114](https://github.com/apache/druid/pull/15114)
+
+### Added UI support for ingesting from multiple Kafka topics to a single 
datasource
+
+The web console supports ingesting streaming data from multiple Kafka topics 
to a datasource using a single supervisor.
+
+![UI for Kafka multi-topic ingestion](image-3.png)
+
+[#14833](https://github.com/apache/druid/pull/14833)
+
+### Other web console improvements
+
+* You can now copy query results from the web console directly to the 
clipboard [#14889](https://github.com/apache/druid/pull/14889)
+* The web console now shows the execution dialog for `query_controller` tasks 
in the task view instead of the generic raw task details dialog. You can still 
access the raw task details from the ellipsis (...) menu 
[#14930](https://github.com/apache/druid/pull/14930)
+* You can now select a horizontal range in the web console time chart to 
modify the current WHERE clause 
[#14929](https://github.com/apache/druid/pull/14929)
+* You can now set dynamic query parameters in the web console 
[#14921](https://github.com/apache/druid/pull/14921)
+* You can now edit the Coordinator dynamic configuration in the web console [#14791](https://github.com/apache/druid/pull/14791)
+* You can now prettify SQL queries and use flatten with a Kafka input format 
[#14906](https://github.com/apache/druid/pull/14906)
+* A warning now appears when a CSV or TSV sample contains newlines that Druid 
does not accept [#14783](https://github.com/apache/druid/pull/14783)
+* You can now select a format when downloading data 
[#14794](https://github.com/apache/druid/pull/14794)
+* Improved the clarity of cluster default rules in the retention dialog 
[#14793](https://github.com/apache/druid/pull/14793)
+* The web console now detects inline queries in the query text and lets you 
run them individually [#14810](https://github.com/apache/druid/pull/14801)
+* You can now reset specific partition offsets for a supervisor 
[#14863](https://github.com/apache/druid/pull/14863)
+
+## Multi-stage query
+
+### Support for multiple result formats
+
+Query from deep storage now supports multiple result formats.
+Previously, the `/druid/v2/sql/statements/` endpoint only supported results in 
the `object` format. Now, results can be written in any format specified in the 
`resultFormat` parameter.
+For more information on result parameters supported by the Druid SQL API, see 
[Responses](https://druid.apache.org/docs/latest/api-reference/sql-api#responses).
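For example, a sketch of a `/druid/v2/sql/statements/` request body asking for CSV results (the query is a placeholder; see the linked API reference for the supported `resultFormat` values):

```json
{
  "query": "SELECT channel, COUNT(*) AS cnt FROM \"wikipedia\" GROUP BY channel",
  "resultFormat": "csv"
}
```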
+
+[#14571](https://github.com/apache/druid/pull/14571)
+
+### Broadened access for queries from deep storage
+
+Users with the `STATE` permission can interact with status APIs for queries 
from deep storage. Previously, only the user who submitted the query could use 
those APIs. This enables the web console to monitor the running status of the 
queries. Users with the `STATE` permission can access the query results.
+
+[#14944](https://github.com/apache/druid/pull/14944)
+
+## Cluster stability
+
+### Unused segments
+
+Druid now stops loading and moving segments as soon as they are marked as 
unused. This prevents Historical processes from spending time on superfluous 
loads of segments that will be unloaded later. You can mark segments as unused through a drop rule, through overshadowing, or by calling [the Data management API](https://druid.apache.org/docs/latest/api-reference/data-management-api).
+
+[#14644](https://github.com/apache/druid/pull/14644)
+
+## Compaction
+
+### Task locks for append and replace batch ingestion jobs
+
+Append batch ingestion jobs can now share locks. This allows you to run 
multiple append batch ingestion jobs against the same time interval. Replace 
batch ingestion jobs still require an exclusive lock. This means you can run 
multiple append batch ingestion jobs and one replace batch ingestion job for a 
given interval.
+
+[#14407](https://github.com/apache/druid/pull/14407)
+
+### Streaming ingestion with concurrent replace
+
+Streaming jobs reading from Kafka and Kinesis with `APPEND` locks can now 
ingest concurrently with compaction running with `REPLACE` locks. The segment 
granularity of the streaming job must be equal to or finer than that of the 
concurrent replace job.
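As a sketch, the lock type for a job is chosen through its task context; the `taskLockType` parameter name used here is an assumption to verify against the concurrent append and replace documentation:

```json
"context": {
  "taskLockType": "APPEND"
}
```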
+
+[#15039](https://github.com/apache/druid/pull/15039)
+
+## Ingestion
+
+### JSON and auto column indexer
+
+The `json` column type is now equivalent to using `auto` in JSON-based batch 
ingestion dimension specs. Upgrade your ingestion specs to `json` to take 
advantage of the features and functionality of `auto`, including the following:
+
+- Type specializations including ARRAY typed columns
+- Better support for nested arrays of strings, longs, and doubles
+- Smarter index utilization
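A sketch of a dimensions spec using the `json` type (the column names are hypothetical):

```json
"dimensionsSpec": {
  "dimensions": [
    "product",
    { "type": "json", "name": "shipping_details" }
  ]
}
```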
+
+`json` type columns created with Druid 28.0.0 are not backwards compatible 
with Druid versions older than 26.0.0.
+If you upgrade from one of these versions, you can continue to write nested 
columns in a backwards compatible format (version 4).
+
+For more information, see [Nested column format in the upgrade 
notes](#nested-column-format).
+
+[#14955](https://github.com/apache/druid/pull/14955)
+[#14456](https://github.com/apache/druid/pull/14456)
+
+### Ingestion status
+
+Ingestion reports now include a `segmentLoadStatus` object that provides 
information related to the ingestion, such as duration and total segments.
+
+[#14322](https://github.com/apache/druid/pull/14322)
+
+### SQL-based ingestion
+
+#### Ability to ingest ARRAY types
+
+SQL-based ingestion now supports storing ARRAY typed values in [ARRAY typed 
columns](https://druid.apache.org/docs/latest/querying/arrays) as well as 
storing both VARCHAR and numeric typed arrays.
+Previously, the MSQ task engine stored ARRAY typed values as [multi-value 
dimensions](https://druid.apache.org/docs/latest/querying/multi-value-dimensions)
 instead of ARRAY typed columns.
+
+The MSQ task engine now includes the `arrayIngestMode` query context 
parameter, which controls how
+`ARRAY` types are stored in Druid segments.
+Set the `arrayIngestMode` query context parameter to `array` to ingest ARRAY 
types.
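A sketch of an MSQ statement that stores an ARRAY column, assuming `arrayIngestMode` is set to `array` in the query context (datasource and column names are hypothetical):

```sql
-- With arrayIngestMode=array, "tags" is stored as an ARRAY<STRING> column
-- instead of a multi-value string dimension:
REPLACE INTO events OVERWRITE ALL
SELECT
  __time,
  ARRAY[tag_a, tag_b] AS tags
FROM source_events
PARTITIONED BY DAY
```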
+
+In Druid 28.0.0, the default mode for `arrayIngestMode` is `mvd` for backwards 
compatibility, which only supports VARCHAR typed arrays and stores them as 
multi-value dimensions. This default is subject to change in future releases.
+
+Note that this improvement is incompatible with previous Druid versions. For 
information on how to migrate to the new behavior, see the [Ingestion options 
for ARRAY typed columns in the upgrade 
notes](#ingestion-options-for-array-typed-columns).

Review Comment:
   Removed the sentence about incompatibility with previous Druid versions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

