kfaraz commented on code in PR #13429:
URL: https://github.com/apache/druid/pull/13429#discussion_r1030998295
########## docs/next-release-notes.md:
##########

@@ -0,0 +1,496 @@
+# New features
+
+## Updated Kafka support
+
+Updated the Apache Kafka core dependency to version 3.3.1.
+
+https://github.com/apache/druid/pull/13176
+
+## Query engine
+
+### BIG_SUM SQL function
+
+Added SQL function `BIG_SUM` that uses the [Compressed Big Decimal](https://github.com/apache/druid/pull/10705) Druid extension.
+
+https://github.com/apache/druid/pull/13102
+
+### Added Compressed Big Decimal min and max functions
+
+Added min and max functions for Compressed Big Decimal and exposed these functions via SQL: BIG_MIN and BIG_MAX.
+
+https://github.com/apache/druid/pull/13141
+
+### Metrics used to downsample bucket
+
+Changed how the MSQ task engine determines whether to downsample data, improving accuracy. The task engine now uses the number of bytes instead of the number of keys.
+
+https://github.com/apache/druid/pull/12998
+
+### MSQ heap footprint
+
+When determining partition boundaries, the heap footprint of the sketches that MSQ uses is capped at 10% of available memory or 300 MB, whichever is lower. Previously, the cap was strictly 300 MB.
+
+https://github.com/apache/druid/pull/13274
+
+### MSQ Docker improvement
+
+Enabled the MSQ task query engine for Docker by default.
+
+https://github.com/apache/druid/pull/13069
+
+### Improved MSQ warnings
+
+For disallowed MSQ warnings of certain types, the warning is now surfaced as the error.
+
+https://github.com/apache/druid/pull/13198
+
+### Added support for indexSpec
+
+The MSQ task engine now supports the `indexSpec` context parameter. This context parameter can also be configured through the web console.
+
+https://github.com/apache/druid/pull/13275
+
+### Added task start status to the worker report
+
+Added `pendingTasks` and `runningTasks` fields to the worker report for the MSQ task engine.
+See [Query task status information](#query-task-status-information) for related web console changes.
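As a usage sketch, the new fields could be read from a worker report like the following. The report shape and field nesting here are assumptions for illustration only; consult the MSQ task report API for the exact payload:

```python
# Hypothetical worker report; the real MSQ report schema may nest these fields differently.
report = {
    "workers": {
        "0": {"pendingTasks": 2, "runningTasks": 3},
        "1": {"pendingTasks": 0, "runningTasks": 4},
    }
}

def total_task_counts(report):
    """Sum the pendingTasks and runningTasks counts across all workers."""
    pending = sum(w["pendingTasks"] for w in report["workers"].values())
    running = sum(w["runningTasks"] for w in report["workers"].values())
    return pending, running

print(total_task_counts(report))  # (2, 7)
```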
+
+https://github.com/apache/druid/pull/13263
+
+### Improved handling of secrets
+
+When MSQ submits tasks containing SQL with sensitive keys, the keys can get logged in the log files.
+Druid now masks the sensitive keys in the log files using regular expressions.
+
+https://github.com/apache/druid/pull/13231
+
+### Use worker number to communicate between tasks
+
+Changed the way WorkerClient communicates between worker tasks, to abstract away from the callers the complexity of resolving the `workerNumber` to the `taskId`.
+Once the WorkerClient writes its outputs to durable storage, it adds a `__success` file containing its `taskId` to the `workerNumber` output directory for that stage. This allows you to identify the worker that successfully wrote its outputs to durable storage, and to distinguish those outputs from partial outputs written by orphaned or failed worker tasks.
+
+https://github.com/apache/druid/pull/13062
+
+### Sketch merging mode
+
+When a query requires key statistics to generate partition boundaries, key statistics are gathered by the workers while reading rows from the datasource. You can now configure whether the MSQ task engine does this work in parallel or sequentially. Configure the behavior using the `clusterStatisticsMergeMode` context parameter. For more information, see [Sketch merging mode](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#sketch-merging-mode).
+
+https://github.com/apache/druid/pull/13205
+
+## Querying
+
+### Improvements to querying user experience
+
+This release includes several improvements for querying:
+
+* Exposed HTTP response headers for SQL queries (https://github.com/apache/druid/pull/13052)
+* Added the `shouldFinalize` feature for HLL and quantiles sketches. Druid no longer finalizes aggregators when:
+  - aggregators appear in the outer level of a query
+  - aggregators are used as input to an expression or finalizing-field-access post-aggregator
+
+  To provide backwards compatibility, we added a `sqlFinalizeOuterSketches` query context parameter that restores the old behavior (https://github.com/apache/druid/pull/13247)
+
+### Enabled async reads for JDBC
+
+Prevented JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. JDBC now uses an async model to run the result fetch concurrently with JDBC requests.
+
+https://github.com/apache/druid/pull/13196
+
+### Enabled composite approach for checking in-filter values set in column dictionary
+
+To accommodate large value sets arising from large in-filters, or from joins pushed down as in-filters, Druid now uses a sorted-merge algorithm for merging the set and dictionary for larger values.
+
+https://github.com/apache/druid/pull/13133
+
+### Added new configuration keys to query context security model
+
+Added the following configuration keys that refine the query context security model controlled by `druid.auth.authorizeQueryContextParams`:
+* `druid.auth.unsecuredContextKeys`: The set of query context keys that do not require a security check.
+* `druid.auth.securedContextKeys`: The set of query context keys that do require a security check.
+
+## Nested columns
+
+### Support for more formats
+
+Druid nested columns and the associated JSON transform functions now support Avro, ORC, and Parquet.
+
+https://github.com/apache/druid/pull/13325
+
+https://github.com/apache/druid/pull/13375
+
+### Refactored a data source before unnest
+
+When data requires "flattening" during processing, the operator now takes in an array and flattens it into N rows (where N is the number of elements in the array), with each row containing one of the values from the array.
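The flattening described above can be sketched in Python. This is a conceptual illustration of the behavior only, not Druid's actual implementation; the function name and row shape are invented for the example:

```python
def unnest(rows, column):
    """Flatten an array-valued column into N rows, one per array element."""
    out = []
    for row in rows:
        for value in row[column]:
            flat = dict(row)      # copy the other columns unchanged
            flat[column] = value  # replace the array with a single element
            out.append(flat)
    return out

rows = [{"id": 1, "tags": ["a", "b", "c"]}]
print(unnest(rows, "tags"))
# [{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}, {'id': 1, 'tags': 'c'}]
```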
+
+https://github.com/apache/druid/pull/13085
+
+## Ingestion
+
+### Improved filtering for cloud objects
+
+You can now stop at arbitrary subfolders using glob syntax in the `ioConfig.inputSource.filter` field for native batch ingestion from cloud storage, such as S3.
+
+https://github.com/apache/druid/pull/13027
+
+### CLUSTERED BY limit
+
+When using the MSQ task engine to ingest data, the CLUSTERED BY clause now accepts at most 1,500 columns.
+
+https://github.com/apache/druid/pull/13352
+
+### Async task client for streaming ingestion
+
+You can now use asynchronous communication with indexing tasks by setting `chatAsync` to true in the `tuningConfig`. Enabling asynchronous communication means that the `chatThreads` property is ignored.
+
+https://github.com/apache/druid/pull/13354
+
+### Improved control for how Druid reads JSON data for streaming ingestion
+
+You can now better control how Druid reads JSON data for streaming ingestion by setting the following fields in the input format specification:
+
+* `assumedNewlineDelimited` to parse lines of JSON independently.
+* `useJsonNodeReader` to retain valid JSON events when a parsing exception occurs while parsing multi-line JSON events.
+
+The web console has been updated to include these options.
+
+https://github.com/apache/druid/pull/13089
+
+### Kafka Consumer improvement
+
+Allowed the Kafka Consumer's custom deserializer to be configured after its instantiation.
+
+https://github.com/apache/druid/pull/13097
+
+### Kafka supervisor logging
+
+Kafka supervisor logs are now less noisy. The supervisors now log events at the DEBUG level instead of INFO.
+
+https://github.com/apache/druid/pull/13392
+
+### Fixed Overlord leader election
+
+Fixed a problem where Overlord leader election failed due to lock reacquisition issues. Druid now fails the tasks that cannot reacquire their locks and clears all locks so that Overlord leader election isn't blocked.
+
+https://github.com/apache/druid/pull/13172
+
+### Support for inline protobuf descriptor
+
+Added a new `inline` type of `protoBytesDecoder` that allows you to pass the contents of a Protobuf descriptor file inline, encoded as a Base64 string.
+
+https://github.com/apache/druid/pull/13192
+
+### Duplicate notices
+
+For streaming ingestion, notices that are identical to one already in the queue are no longer enqueued. This helps reduce the notice queue size.
+
+https://github.com/apache/druid/pull/13334
+
+### When a Kafka stream becomes inactive, prevent the supervisor from creating new indexing tasks
+
+Added an idle feature to `SeekableStreamSupervisor` for inactive streams.
+
+https://github.com/apache/druid/pull/13144
+
+### Sampling from stream input now respects the configured timeout
+
+Fixed a problem where sampling from a stream input, such as Kafka or Kinesis, failed to respect the configured timeout when the stream had no records available. You can now set the maximum amount of time in which the entry iterator will return results.
+
+https://github.com/apache/druid/pull/13296
+
+### Streaming tasks resume on Overlord switch
+
+Fixed a problem where streaming ingestion tasks continued to run until their duration elapsed after the Overlord leader had issued a pause to the tasks. Now, when the Overlord switch occurs right after it has issued a pause to a task, the task remains in a paused state even after the Overlord re-election.
+
+https://github.com/apache/druid/pull/13223
+
+### Fixed Parquet list conversion
+
+Fixed an issue with Parquet list conversion, where lists of complex objects could unexpectedly be wrapped in an extra object, appearing as `[{"element":<actual_list_element>},{"element":<another_one>}...]` instead of the direct list. This changes the behavior of the Parquet reader for lists of structured objects to be consistent with other Parquet logical list conversions. The data is now fetched directly, more closely matching its expected structure.
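Conceptually, the fix unwraps the extra `element` object shown above, as in this Python sketch. This is illustrative only; the unwrapping happens inside Druid's Parquet reader, not in user code:

```python
def unwrap_parquet_list(wrapped):
    """Turn [{'element': x}, {'element': y}, ...] into the direct list [x, y, ...]."""
    return [item["element"] for item in wrapped]

wrapped = [{"element": {"city": "SF"}}, {"element": {"city": "NY"}}]
print(unwrap_parquet_list(wrapped))  # [{'city': 'SF'}, {'city': 'NY'}]
```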
+
+https://github.com/apache/druid/pull/13294
+
+### Introduced a tree type to flattenSpec
+
+Introduced a `tree` type to `flattenSpec`. When only a simple hierarchical lookup is required, the `tree` type allows for faster JSON parsing than the `jq` and `path` parsing types.
+
+https://github.com/apache/druid/pull/12177
+
+## Operations
+
+### Compaction
+
+Compaction behavior has changed to reduce the time and disk space it takes:
+
+- When segments need to be fetched, Druid downloads them one at a time and deletes each one when it is done with it. This still takes time but minimizes the required disk space.
+- Druid doesn't fetch segments on the main compaction task when they aren't needed. If the user provides a full `granularitySpec`, `dimensionsSpec`, and `metricsSpec`, Druid skips fetching segments.
+
+For more information, see the documentation on [Compaction](https://druid.apache.org/docs/latest/data-management/compaction.html) and [Automatic compaction](https://druid.apache.org/docs/latest/data-management/automatic-compaction.html).
+
+https://github.com/apache/druid/pull/13280
+
+### New metric for segments
+
+`segment/handoff/time` captures the total time taken for handoff for a given set of published segments.
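As one possible use of the new metric, emitted events could feed a simple latency check like this sketch. The event dict shape and the threshold are assumptions for illustration; only the metric name `segment/handoff/time` comes from this release:

```python
def slow_handoffs(events, threshold_ms=600_000):
    """Return the metric events whose segment/handoff/time value exceeds the threshold."""
    return [
        e for e in events
        if e.get("metric") == "segment/handoff/time" and e.get("value", 0) > threshold_ms
    ]

events = [
    {"metric": "segment/handoff/time", "value": 720_000},
    {"metric": "segment/handoff/time", "value": 90_000},
]
print(slow_handoffs(events))  # [{'metric': 'segment/handoff/time', 'value': 720000}]
```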
+
+https://github.com/apache/druid/pull/13238
+
+### Idle configs for the Supervisor
+
+You can now configure the following properties:
+
+| Property | Description | Default |
+|---|---|---|
+|`druid.supervisor.idleConfig.enabled`| (Cluster-wide) If `true`, a supervisor can become idle if there is no data on the input stream/topic for some time.|`false`|
+|`druid.supervisor.idleConfig.inactiveAfterMillis`| (Cluster-wide) A supervisor is marked as idle if all existing data has been read from the input topic and no new data has been published for `inactiveAfterMillis` milliseconds.|`600_000`|
+|`inactiveAfterMillis`| (Individual supervisor) A supervisor is marked as idle if all existing data has been read from the input topic and no new data has been published for `inactiveAfterMillis` milliseconds.|`600_000`|
+
+https://github.com/apache/druid/pull/13311
+
+### cachingCost balancer strategy

Review Comment:
   This too could be an item inside `segment loading and balancing improvements`

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org