317brian commented on code in PR #17092: URL: https://github.com/apache/druid/pull/17092#discussion_r1792534932
########## docs/release-info/release-notes.md: ########## @@ -57,46 +57,543 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. -## Functional area and related changes +### Compaction features + +Druid now supports the following features: + +- Compaction scheduler with greater flexibility and control over when and what to compact. +- MSQ task engine-based compaction for more performant compaction jobs. + +See [Automatic compaction](https://druid.apache.org/docs/latest/data-management/automatic-compaction/) for details. + +Compaction tasks that take advantage of concurrent append and replace are now generally available. + +[#16291](https://github.com/apache/druid/pull/16291) + +### Window functions are GA + +[Window functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) are now generally available in Druid's native engine and in the MSQ task engine. + +- You no longer need to set the query context parameter `enableWindowing` to use window functions. [#17087](https://github.com/apache/druid/pull/17087) + +### Concurrent append and replace GA + +Concurrent append and replace is now GA. The feature safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (such as with streaming ingestion) to an interval while compaction of that interval is already in progress. + +### Projections (TBC) + +### Low-latency, high-complexity queries using DART + +Distributed Asynchronous Runtime Topology (DART) supports high-complexity queries, such as large joins, high-cardinality GROUP BY, subqueries, and CTEs, commonly found in ad-hoc data warehouse workloads. DART uses multi-threaded workers, in-memory shuffles, and locally cached data to run these high-complexity queries with low latency. 
+ +DART is fully compatible with current Druid query shapes and Druid's storage format. + +[#17140](https://github.com/apache/druid/pull/17140) + +### Upgrade-related changes + +See the [Upgrade notes](#upgrade-notes) for more information about the following upgrade-related changes: +- [Array ingest mode now defaults to array](#array-ingest-mode-now-defaults-to-array) +- [Disabled ZK-based segment loading](#zk-based-segment-loading) +- [Removed task action audit logging](#removed-task-action-audit-logging) +- [Removed Firehose and FirehoseFactory](#removed-firehose-and-firehosefactory) +- [Removed the scan query legacy mode](#removed-the-scan-query-legacy-mode) + +### Deprecations + +- Java 8 support is deprecated and will be removed in 32.0.0. +- The deprecated `/lockedIntervals` API has been removed [#16799](https://github.com/apache/druid/pull/16799) +- The [cluster-level compaction API](#api-for-cluster-level-compaction-configuration) deprecates the compaction task slots API [#16803](https://github.com/apache/druid/pull/16803) + +## Functional areas and related changes This section contains detailed release notes separated by areas. 
### Web console +#### Copy query results as SQL + +You can now copy the results of a query as a Druid SQL statement: + + + +When you copy the results of the pictured query, you get the following query: + +```sql +SELECT + CAST("c1" AS VARCHAR) AS "channel", + CAST("c2" AS VARCHAR) AS "cityName", + DECODE_BASE64_COMPLEX('thetaSketch', "c3") AS "user_theta" +FROM ( + VALUES + ('ca', NULL, 'AQMDAAA6zJOcUskA1pEMGA=='), + ('de', NULL, 'AQMDAAA6zJP43ITYexvoEw=='), + ('de', NULL, 'AQMDAAA6zJNtue8WOvrJdA=='), + ('en', NULL, 'AQMDAAA6zJMruSUUqmzufg=='), + ('en', NULL, 'AQMDAAA6zJM6dC5sW2sTEg=='), + ('en', NULL, 'AQMDAAA6zJM6dC5sW2sTEg=='), + ('en', NULL, 'AQMDAAA6zJPqjEoIBIGtDw=='), + ('en', 'Caulfield', 'AQMDAAA6zJOOtGipKE6KIA=='), + ('fa', NULL, 'AQMDAAA6zJM8nxZkoGPlLw=='), + ('vi', NULL, 'AQMDAAA6zJMk4ZadSFqHJw==') +) AS "t" ("c1", "c2", "c3") +``` + +[#16458](https://github.com/apache/druid/pull/16458) + +#### Explore view improvements + +You can now configure the Explore view on top of a source query instead of only existing tables. +You can also point and click to edit the source query, store measures in the source query, +and return to the state of your view using stateful URLs. 
+ +[#17180](https://github.com/apache/druid/pull/17180) + +Other changes to the Explore view include the following: + +- Added the ability to hide all null columns in the record table +- Added the ability to declare certain parameter values as sticky +- Added the ability to expand a nested column into its constituent paths +- Fixed dragging of a VARCHAR column to a measure control +- Fixed filtering on a predefined measure +- Fixed the drag-over indicator not clearing +- Fixed applying a WHERE filter in the grouping table +- Fixed an issue where `AS "t"` was not always added to the grouping table query +- Fixed the AGGREGATE function not being evaluated when it appeared in an ORDER BY clause + +[#17213](https://github.com/apache/druid/pull/17213) [#17225](https://github.com/apache/druid/pull/17225) [#17234](https://github.com/apache/druid/pull/17234) + +#### Support Kinesis input format + +The web console now supports the Kinesis input format. + +[#16850](https://github.com/apache/druid/pull/16850) + #### Other web console improvements +- You can now search for datasources in the **Datasources** view; previously, you had to find them manually [#16371](https://github.com/apache/druid/pull/16371) +- You can now display both raw and formatted JSON in tables, making the data easier to read and troubleshoot [#16632](https://github.com/apache/druid/pull/16632) +- You can now configure the maximum number of tasks through a menu [#16991](https://github.com/apache/druid/pull/16991) +- You can now specify the Delta snapshot version in the web console [#17023](https://github.com/apache/druid/pull/17023) +- Added hooks to customize the workbench view [#16749](https://github.com/apache/druid/pull/16749) +- Added the ability to hide the workbench view toolbar in the **Query** view [#16785](https://github.com/apache/druid/pull/16785) +- Added the ability to submit a suspended supervisor using the SQL data loader [#16696](https://github.com/apache/druid/pull/16696) +- Added the ability to configure `serverQueryContext` to set 
the query context [#16868](https://github.com/apache/druid/pull/16868) +- Added column mapping information to the explain plan [#16598](https://github.com/apache/druid/pull/16598) +- Added the ability to initiate handoff for a supervisor [#16586](https://github.com/apache/druid/pull/16586) +- Added a **Use concurrent locks** option and moved all insert and replace options to a separate submenu [#16899](https://github.com/apache/druid/pull/16899) +- Added the Delta tile to the data loader for SQL-based batch and classic batch ingestion methods [#17160](https://github.com/apache/druid/pull/17160) +- Improved how the web console detects durable storage [#16493](https://github.com/apache/druid/pull/16493) +- Made the following web console improvements: + - Added titles to action menus + - Improved the query-based ingestion counter calculation + - Removed the filter clause on `__time` + - Fixed scrolling in the `loadRules` editor [#16735](https://github.com/apache/druid/pull/16735) +- Restored the default WHERE filter to auto-generated SQL queries [#16608](https://github.com/apache/druid/pull/16608) +- Fixed an NPE caused by null values in numeric columns [#16760](https://github.com/apache/druid/pull/16760) + ### Ingestion -#### SQL-based ingestion +#### Optimized the loading of broadcast data sources + +Previously, all services and tasks downloaded all broadcast data sources. +To save task storage space and reduce task startup time, kill tasks and MSQ controller tasks no longer download broadcast data sources, which they don't need. All other tasks still load all broadcast data sources. + +The `CLIPeon` command-line option `--loadBroadcastSegments` is deprecated in favor of `--loadBroadcastDatasourceMode`. 
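As a sketch, the new option takes a mode rather than the old Boolean flag. The invocation below is illustrative only: the launcher prefix is elided, and the accepted mode values are defined in the linked PR, not here.

```shell
# Before (deprecated): a Boolean switch that loads all broadcast segments
... peon --loadBroadcastSegments true

# After: a mode that lets tasks such as kill tasks and MSQ controller
# tasks skip broadcast data sources they don't need
... peon --loadBroadcastDatasourceMode <mode>
```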
+ +[#17027](https://github.com/apache/druid/pull/17027) + +#### General ingestion improvements + +- The default value for `druid.indexer.tasklock.batchAllocationWaitTime` is now 0 [#16578](https://github.com/apache/druid/pull/16578) +- Hadoop-based ingestion now works on Kubernetes deployments [#16726](https://github.com/apache/druid/pull/16726) +- Hadoop-based ingestion now has a Boolean `useMaxMemoryEstimates` parameter, which controls how the memory footprint gets estimated. The default is false, so that the behavior matches native JSON-based batch ingestion [#16280](https://github.com/apache/druid/pull/16280) +- Added `druid-parquet-extensions` to all example quickstart configurations [#16664](https://github.com/apache/druid/pull/16664) +- Added support for ingesting CSV-format data in Kafka records when Kafka ingestion is enabled with `ioConfig.type = kafka` [#16630](https://github.com/apache/druid/pull/16630) +- Added logging for sketches on workers [#16697](https://github.com/apache/druid/pull/16697) +- Removed the obsolete `index_realtime` and `index_realtime_appenderator` task types; you can no longer use these tasks to ingest data [#16602](https://github.com/apache/druid/pull/16602) +- Renamed `TaskStorageQueryAdapter` to `TaskQueryTool` and removed the `isAudited` method [#16750](https://github.com/apache/druid/pull/16750) +- Improved Overlord performance by reducing redundant calls in SQL statements [#16839](https://github.com/apache/druid/pull/16839) +- Improved `CustomExceptionMapper` so that it returns a correct failure message [#17016](https://github.com/apache/druid/pull/17016) +- Improved time filtering in subqueries and non-table data sources [#17173](https://github.com/apache/druid/pull/17173) +- Improved `WindowOperatorQueryFrameProcessor` to avoid unnecessary re-runs [#17211](https://github.com/apache/druid/pull/17211) +- Improved memory management by dividing the amount of `partitionStatsMemory` by two to account for two simultaneous 
statistics collectors [#17216](https://github.com/apache/druid/pull/17216) +- Fixed an NPE in `CompactSegments` [#16713](https://github.com/apache/druid/pull/16713) +- Fixed the Parquet reader to ensure that Druid reads the required columns for a filter from the Parquet data files [#16874](https://github.com/apache/druid/pull/16874) +- Fixed a distinct sketches issue where Druid called `retainedKeys.firstKey()` twice when adding another sketch [#17184](https://github.com/apache/druid/pull/17184) +- Fixed a `WindowOperatorQueryFrameProcessor` issue where larger queries could reach the frame writer's capacity, preventing it from outputting all of the result rows [#17209](https://github.com/apache/druid/pull/17209) +- Fixed native ingestion task failures during rolling upgrades from a version before Druid 30 [#17219](https://github.com/apache/druid/pull/17219) + +### SQL-based ingestion + +#### Optimized S3 storage writing for MSQ durable storage + +For queries that use the MSQ task engine and write their output to S3 as durable storage, uploading chunks of data is now faster. + +[#16481](https://github.com/apache/druid/pull/16481) -##### Other SQL-based ingestion improvements +#### Improved lookup performance -#### Streaming ingestion +Improved lookup performance for queries that use the MSQ task engine by only loading required lookups. This applies to both ingestion and querying. 
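With this change, a query like the following sketch should cause MSQ workers to load only the single lookup it references. The datasource `wikipedia`, column `countryIsoCode`, and lookup name `country_name` are hypothetical names for illustration:

```sql
-- Only the 'country_name' lookup referenced below needs to be loaded
-- by the MSQ workers; other registered lookups can be skipped.
SELECT
  LOOKUP("countryIsoCode", 'country_name') AS "country",
  COUNT(*) AS "rows"
FROM "wikipedia"
GROUP BY 1
```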
-##### Other streaming ingestion improvements +[#16358](https://github.com/apache/druid/pull/16358) + +#### Other SQL-based ingestion improvements + +- Added the ability to use the `useConcurrentLocks` task context parameter to determine the task lock type [#17193](https://github.com/apache/druid/pull/17193) +- Reduced memory usage when transferring sketches between the MSQ task engine controller and worker [#16269](https://github.com/apache/druid/pull/16269) +- Improved error handling when retrieving Avro schemas from a registry [#16684](https://github.com/apache/druid/pull/16684) +- Fixed issues related to partitioning boundaries in the MSQ task engine's window functions [#16729](https://github.com/apache/druid/pull/16729) +- Fixed a boost column issue causing quantile sketches to incorrectly estimate the number of output partitions to create [#17141](https://github.com/apache/druid/pull/17141) +- Fixed an issue with `ScanQueryFrameProcessor` cursor build not adjusting intervals [#17168](https://github.com/apache/druid/pull/17168) +- Improved worker cancellation for the MSQ task engine to prevent race conditions [#17046](https://github.com/apache/druid/pull/17046) +- Improved memory management to better support multi-threaded workers [#17057](https://github.com/apache/druid/pull/17057) +- Fixed handling of null bytes that led to a runtime exception for "Invalid value start byte" [#17232](https://github.com/apache/druid/pull/17232) +- Updated logic to fix incorrect query results for comparisons involving arrays [#16780](https://github.com/apache/druid/pull/16780) +- 
You can now pass a custom `DimensionSchema` map to an MSQ query destination of type `DataSourceMSQDestination` instead of using the default values [#16864](https://github.com/apache/druid/pull/16864) +- Fixed the calculation of suggested memory in `WorkerMemoryParameters` to account for `maxConcurrentStages`, which improves the accuracy of error messages [#17108](https://github.com/apache/druid/pull/17108) +- Optimized the row-based frame writer to reduce failures when writing larger single rows to frames [#17094](https://github.com/apache/druid/pull/17094) + +### Streaming ingestion + +#### New Kinesis input format + +Added a Kinesis input format and reader for timestamp and payload parsing. +The reader relies on a `ByteEntity` subtype, `KinesisRecordEntity`, which includes the underlying Kinesis record. + +[#16813](https://github.com/apache/druid/pull/16813) + +#### Streaming ingestion improvements + +- Added a check for handing off upgraded real-time segments. This prevents data from being temporarily unavailable for queries during segment handoff [#16162](https://github.com/apache/druid/pull/16162) +- Improved the autoscaling experience for Kinesis: whether autoscaling is based on the maximum lag per shard or the total lag across shards is now controlled by the `lagAggregate` config, which defaults to sum [#16334](https://github.com/apache/druid/pull/16334) +- Improved the Supervisor so that it doesn't change from an idle state to a running state if the Overlord restarts [#16844](https://github.com/apache/druid/pull/16844) ### Querying +#### Enabled querying cold datasources + +You can now query entirely cold datasources after you enable the `CentralizedDatasourceSchema` feature. For information about how to use a centralized datasource schema, see [Centralized datasource schema](https://druid.apache.org/docs/latest/configuration/#centralized-datasource-schema). + +[#16676](https://github.com/apache/druid/pull/16676) + +#### SQL DIV function + +You can now use the SQL DIV function. 
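As a minimal sketch, DIV performs integer division; the literals below are illustrative only:

```sql
-- DIV(x, y) returns the integer quotient, discarding the fractional part.
SELECT DIV(17, 5) AS "quotient"  -- 3
```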
+ +[#16464](https://github.com/apache/druid/pull/16464) + +#### Modified equality and typed IN filter behavior + +`EqualityFilter` and `TypedInFilter` now match numeric values (particularly DOUBLE) against string columns by casting the strings for numeric comparison. This ensures more consistent Druid behavior when the `sqlUseBoundAndSelectors` context flag is set. + +[#16593](https://github.com/apache/druid/pull/16593) + +#### Window query guardrails + +Druid now blocks window queries in which an aggregation function appears inside the window clause and the window is included in the SELECT clause. +The error message provides details on updating your query syntax. + +[#16801](https://github.com/apache/druid/pull/16801) + +#### Updated query from deep storage API response + +Added the following fields from the query-based ingestion task report to the response for the API request `GET /v2/sql/statements/query-id?detail=true`: +- `stages`: Query stages +- `counters`: Stage counters +- `warnings`: Warning reports + +[#16808](https://github.com/apache/druid/pull/16808) + +#### Other querying improvements +- Improved window queries so that window queries without a GROUP BY clause that use the native engine don't return an empty response [#16658](https://github.com/apache/druid/pull/16658) +- Window queries now support the guardrail `maxSubqueryBytes` [#16800](https://github.com/apache/druid/pull/16800) +- Window functions that use the MSQ task engine now reject MVDs when they're used as the PARTITION BY column. 
Previously, an exception occurred [#17036](https://github.com/apache/druid/pull/17036) +- A query that references aggregators called with unsupported distinct values now fails [#16770](https://github.com/apache/druid/pull/16770) +- Druid now validates that a complex type aligns with the supported types when used with an aggregator [#16682](https://github.com/apache/druid/pull/16682) +- Druid prevents you from using DISTINCT or unsupported aggregations with window functions [#16738](https://github.com/apache/druid/pull/16738) +- Druid now deduces type from aggregators when materializing subquery results [#16703](https://github.com/apache/druid/pull/16703) +- Added the ability to define the segment granularity of a table in the catalog [#16680](https://github.com/apache/druid/pull/16680) +- Added a way for columns to provide `GroupByVectorColumnSelectors`, which controls how the groupBy engine operates on them [#16338](https://github.com/apache/druid/pull/16338) +- Added `sqlPlannerBloat` query context parameter to control whether two project operators get merged when inlining expressions [#16248](https://github.com/apache/druid/pull/16248) +- Added `enableRACOverWire` query context parameter to enable transfer of RACs over wire [#17150](https://github.com/apache/druid/pull/17150) +- Improved window function offsets for `ArrayListRowsAndColumns` [#16718](https://github.com/apache/druid/pull/16718) +- Improved the fallback strategy when the Broker is unable to materialize the subquery's results as frames for estimating the bytes [#16679](https://github.com/apache/druid/pull/16679) +- Improved how Druid executes queries that contain a LIMIT clause [#16643](https://github.com/apache/druid/pull/16643) +- Improved the code style of `NestedDataOperatorConversions` to be consistent for each `SqlOperatorConversion` [#16695](https://github.com/apache/druid/pull/16695) +- Improved window functions so that they reject multi-value dimensions during processing instead of failing 
to process them [#17002](https://github.com/apache/druid/pull/17002) +- Improved async query by increasing its timeout to 5 seconds [#16656](https://github.com/apache/druid/pull/16656) +- Improved the error message when the requested number of rows in a window exceeds the maximum [#16906](https://github.com/apache/druid/pull/16906) +- Improved numeric aggregations so that Druid now coerces complex types to number when possible, such as for `SpectatorHistogram` [#16564](https://github.com/apache/druid/pull/16564) +- Improved query filtering to correctly process cases where both an IN expression and an equality (`=`) filter are applied to the same string value [#16597](https://github.com/apache/druid/pull/16597) +- Improved the speed of SQL IN queries that use the SCALAR_IN_ARRAY function [#16388](https://github.com/apache/druid/pull/16388) +- Improved the ARRAY_TO_MV function to handle cases where an object selector encounters a multi-value string [#17162](https://github.com/apache/druid/pull/17162) +- Updated the deserialization of dimensions in GROUP BY queries to operate on all dimensions at once rather than deserializing individual dimensions [#16740](https://github.com/apache/druid/pull/16740) +- Fixed an issue that caused `maxSubqueryBytes` to fail when segments had missing columns [#16619](https://github.com/apache/druid/pull/16619) +- Fixed an issue with the array type selector that caused the array aggregation over a window frame to fail [#16653](https://github.com/apache/druid/pull/16653) +- Fixed support for native window queries without a GROUP BY clause [#16753](https://github.com/apache/druid/pull/16753) +- Fixed an issue with casting objects to vector expressions [#17148](https://github.com/apache/druid/pull/17148) +- Added several fixes and improvements to vectorization fallback [#17098](https://github.com/apache/druid/pull/17098), [#17162](https://github.com/apache/druid/pull/17162) +- You can now configure the encoding method for sketches at query time [#17050](https://github.com/apache/druid/pull/17050) Review Comment: fixed. thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
