kfaraz commented on code in PR #13429:
URL: https://github.com/apache/druid/pull/13429#discussion_r1030998093


##########
docs/next-release-notes.md:
##########
@@ -0,0 +1,496 @@
+# New features
+
+## Updated Kafka support
+
+Updated the Apache Kafka core dependency to version 3.3.1.
+
+https://github.com/apache/druid/pull/13176
+
+## Query engine
+
+### BIG_SUM SQL function
+
+Added SQL function `BIG_SUM` that uses the [Compressed Big 
Decimal](https://github.com/apache/druid/pull/10705) Druid extension.
+
+https://github.com/apache/druid/pull/13102
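+As a quick illustration, a minimal sketch (the datasource and column names here are hypothetical):
+
+```sql
+-- Requires the druid-compressed-bigdecimal extension to be loaded
+SELECT BIG_SUM("revenue") FROM "sales"
+```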
+
+### Added Compressed Big Decimal min and max functions
+
+Added min and max functions for Compressed Big Decimal and exposed them via SQL as `BIG_MIN` and `BIG_MAX`.
+
+https://github.com/apache/druid/pull/13141
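+For example, a minimal sketch (hypothetical datasource and column):
+
+```sql
+SELECT BIG_MIN("revenue"), BIG_MAX("revenue") FROM "sales"
+```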
+
+### Metrics used to downsample bucket
+
+Changed how the MSQ task engine determines whether to downsample data, improving accuracy: the task engine now uses the number of bytes instead of the number of keys.
+
+https://github.com/apache/druid/pull/12998
+
+### MSQ heap footprint
+
+When determining partition boundaries, the heap footprint of the sketches that 
MSQ uses is capped at 10% of available memory or 300 MB, whichever is lower. 
Previously, the cap was strictly 300 MB.
+
+https://github.com/apache/druid/pull/13274
+
+### MSQ Docker improvement
+
+Enabled MSQ task query engine for Docker by default.
+
+https://github.com/apache/druid/pull/13069
+
+### Improved MSQ warnings
+
+For certain types of disallowed MSQ warnings, the warning is now surfaced as the error.
+
+https://github.com/apache/druid/pull/13198
+
+### Added support for indexSpec
+
+The MSQ task engine now supports the `indexSpec` context parameter. This 
context parameter can also be configured through the web console.
+
+https://github.com/apache/druid/pull/13275
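+As an illustration, a sketch of a query context carrying an `indexSpec` (the values shown are standard Druid index options; tune them for your workload):
+
+```json
+{
+  "context": {
+    "indexSpec": {
+      "bitmap": {"type": "roaring"},
+      "dimensionCompression": "lz4",
+      "metricCompression": "lz4",
+      "longEncoding": "longs"
+    }
+  }
+}
+```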
+
+### Added task start status to the worker report
+
+Added `pendingTasks` and `runningTasks` fields to the worker report for the 
MSQ task engine.
+See [Query task status information](#query-task-status-information) for 
related web console changes.
+
+https://github.com/apache/druid/pull/13263
+
+### Improved handling of secrets
+
+When MSQ submits tasks containing SQL with sensitive keys, the keys could previously be logged in log files.
+Druid now masks the sensitive keys in the log files using regular expressions.
+
+https://github.com/apache/druid/pull/13231
+
+### Use worker number to communicate between tasks
+
+Changed the way the WorkerClient communicates between worker tasks, abstracting away the complexity of resolving a `workerNumber` to a `taskId` from the callers.
+Once a worker writes its outputs to durable storage, it adds a `__success` file containing its `taskId` to that stage's `workerNumber` output directory. This lets you determine which worker successfully wrote its outputs to durable storage and distinguish them from partial outputs left by orphaned or failed worker tasks.
+
+https://github.com/apache/druid/pull/13062
+
+### Sketch merging mode
+
+When a query requires key statistics to generate partition boundaries, the workers gather key statistics while reading rows from the datasource. You can now configure whether the MSQ task engine merges these statistics in parallel or sequentially, using the `clusterStatisticsMergeMode` context parameter. For more information, see [Sketch merging mode](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#sketch-merging-mode).
+
+https://github.com/apache/druid/pull/13205 
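+For example, to force sequential merging, a sketch of the query context (`SEQUENTIAL` and `PARALLEL` are the documented modes):
+
+```json
+{
+  "context": {
+    "clusterStatisticsMergeMode": "SEQUENTIAL"
+  }
+}
+```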
+
+## Querying
+
+### Improvements to querying user experience
+
+This release includes several improvements for querying:
+
+* Exposed HTTP response headers for SQL queries 
(https://github.com/apache/druid/pull/13052)
+* Added the `shouldFinalize` feature for HLL and quantiles sketches. Druid 
will no longer finalize aggregators when:
+    - aggregators appear in the outer level of a query
+    - aggregators are used as input to an expression or 
finalizing-field-access post-aggregator
+
+    To provide backwards compatibility, we added a `sqlFinalizeOuterSketches` 
query context parameter that restores the old behavior 
(https://github.com/apache/druid/pull/13247)
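+    For example, to restore the old finalizing behavior, a sketch of a SQL API payload (the query itself is hypothetical):
+
+    ```json
+    {
+      "query": "SELECT APPROX_COUNT_DISTINCT_DS_HLL(\"user_id\") FROM \"events\"",
+      "context": {
+        "sqlFinalizeOuterSketches": true
+      }
+    }
+    ```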
+
+### Enabled async reads for JDBC
+
+Prevented JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. An async model runs the result fetch concurrently with JDBC requests.
+
+https://github.com/apache/druid/pull/13196
+
+### Enabled composite approach for checking in-filter values set in column 
dictionary
+
+To accommodate large value sets arising from large in-filters or from joins pushed down as in-filters, Druid now uses a sorted-merge algorithm to merge the value set with the column dictionary when the set is large.
+
+https://github.com/apache/druid/pull/13133
+
+### Added new configuration keys to query context security model
+
+Added the following configuration keys that refine the query context security 
model controlled by `druid.auth.authorizeQueryContextParams`:
+* `druid.auth.unsecuredContextKeys`: The set of query context keys that do not 
require a security check.
+* `druid.auth.securedContextKeys`: The set of query context keys that do 
require a security check.
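+
+For example, a `runtime.properties` sketch (the key lists shown are illustrative):
+
+```properties
+druid.auth.authorizeQueryContextParams=true
+# These context keys skip the security check
+druid.auth.unsecuredContextKeys=["sqlQueryId", "sqlTimeZone"]
+```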
+
+## Nested columns
+
+### Support for more formats
+
+Druid nested columns and the associated JSON transform functions now support Avro, ORC, and Parquet.
+
+https://github.com/apache/druid/pull/13325 
+
+https://github.com/apache/druid/pull/13375 
+
+### Refactored a data source before unnest 
+
+When data requires "flattening" during processing, the operator now takes in an array and flattens it into N rows, where N is the number of elements in the array and each row holds one of the array's values.
+
+https://github.com/apache/druid/pull/13085
+
+## Ingestion
+
+### Improved filtering for cloud objects
+
+You can now stop at arbitrary subfolders using glob syntax in the 
`ioConfig.inputSource.filter` field for native batch ingestion from cloud 
storage, such as S3. 
+
+https://github.com/apache/druid/pull/13027
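+For example, an S3 `inputSource` sketch (the bucket and glob pattern are hypothetical):
+
+```json
+{
+  "type": "s3",
+  "prefixes": ["s3://my-bucket/raw/"],
+  "filter": "**/2022-11/*.json"
+}
+```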
+
+### CLUSTERED BY limit
+
+When using the MSQ task engine to ingest data, the CLUSTERED BY clause is now limited to 1,500 columns.
+
+https://github.com/apache/druid/pull/13352
+
+### Async task client for streaming ingestion
+
+You can now use asynchronous communication with indexing tasks by setting 
`chatAsync` to true in the `tuningConfig`. Enabling asynchronous communication 
means that the `chatThreads` property is ignored.
+
+https://github.com/apache/druid/pull/13354 
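+For example, a Kafka supervisor `tuningConfig` sketch:
+
+```json
+{
+  "type": "kafka",
+  "chatAsync": true
+}
+```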
+
+### Improved control for how Druid reads JSON data for streaming ingestion
+
+You can now better control how Druid reads JSON data for streaming ingestion 
by setting the following fields in the input format specification:
+
+* `assumedNewlineDelimited` to parse lines of JSON independently.
+* `useJsonNodeReader` to retain valid JSON events when parsing multi-line JSON 
events when a parsing exception occurs.
+
+The web console has been updated to include these options.
+
+https://github.com/apache/druid/pull/13089
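+For example, an `inputFormat` sketch for newline-delimited JSON (set at most one of the two flags; this shows `assumedNewlineDelimited`):
+
+```json
+{
+  "type": "json",
+  "assumedNewlineDelimited": true
+}
+```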
+
+
+
+### Kafka Consumer improvement
+
+Allowed Kafka Consumer's custom deserializer to be configured after its 
instantiation.
+
+https://github.com/apache/druid/pull/13097
+
+### Kafka supervisor logging
+
+Kafka supervisor logs are now less noisy. The supervisors now log events at 
the DEBUG level instead of INFO. 
+
+https://github.com/apache/druid/pull/13392
+
+### Fixed Overlord leader election
+
+Fixed a problem where Overlord leader election failed due to lock 
reacquisition issues. Druid now fails these tasks and clears all locks so that 
the Overlord leader election isn't blocked.
+
+https://github.com/apache/druid/pull/13172
+
+### Support for inline protobuf descriptor
+
+Added a new `inline` type of `protoBytesDecoder` that lets you pass the contents of a Protobuf descriptor file inline, encoded as a Base64 string.
+
+https://github.com/apache/druid/pull/13192
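+A sketch of the new decoder configuration (the field names here are assumptions and may differ; the message type and Base64 payload are placeholders — check the Protobuf extension docs):
+
+```json
+{
+  "type": "inline",
+  "descriptorString": "<Base64-encoded descriptor>",
+  "protoMessageType": "Metrics"
+}
+```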
+
+### Duplicate notices
+
+For streaming ingestion, notices identical to one already in the queue are no longer enqueued, which helps reduce the notice queue size.
+
+https://github.com/apache/druid/pull/13334
+
+### When a Kafka stream becomes inactive, prevent Supervisor from creating new 
indexing tasks
+
+Added an idle state to `SeekableStreamSupervisor`: when a stream becomes inactive, the supervisor stops creating new indexing tasks.
+
+https://github.com/apache/druid/pull/13144
+
+### 

Review Comment:
   empty heading?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

