ektravel commented on code in PR #16412: URL: https://github.com/apache/druid/pull/16412#discussion_r1602084392
########## docs/release-info/release-notes.md: ########## @@ -57,50 +57,745 @@ For tips about how to write a good release note, see [Release notes](https://git This section contains important information about new and existing features. +### Improved native queries + +Native queries can now group on nested columns and arrays. + +[#16068](https://github.com/apache/druid/pull/16068) + +Before realtime segments are pushed to deep storage, they consist of spill files. +Segment metrics such as `query/segment/time` now report per spill file for a realtime segment, rather than for the entire segment. +This change eliminates the need to materialize results on the heap, which improves the performance of groupBy queries. + +[#15757](https://github.com/apache/druid/pull/15757) + +### Concurrent append and replace improvements + +Improved concurrent replace to work with supervisors using concurrent locks. + +[#15995](https://github.com/apache/druid/pull/15995) + +You can now grant locks with different types (EXCLUSIVE, SHARED, APPEND, REPLACE) for the same interval within a task group to ensure a transition to a newer set of tasks without failure. +Previously, changing lock types in the Supervisor could lead to segment allocation errors due to lock conflicts for the new tasks while the older tasks were still running. + +[#16369](https://github.com/apache/druid/pull/16369) + +### Improved AND filter performance + +Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan, based on the selectivity of other filters. +This technique, known as filter partitioning, can result in dramatic performance increases, depending on the order of filters in the query. + +For example, take a query like `SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'`. Previously, Druid used indexes when processing filters if they were available. 
+That's not always ideal; imagine if `stringColumn1 = '1000'` matches 100 rows. With indexes, we have to find every value of `stringColumn2 LIKE '%1%'` that is true in order to compute the indexes for the filter. If `stringColumn2` has more than 100 values, that ends up being worse than simply checking for a match in those 100 remaining rows. + +With the new logic, Druid now checks the selectivity of indexes as it processes each clause of the AND filter. +If it determines that computing the index would take more work than matching the remaining rows, Druid skips computing the index. + +The order in which you write filters in the WHERE clause of a query can therefore affect its performance. +More improvements are coming, but you can try out the existing improvements by reordering the filters in a query. +Put filters whose indexes are less intensive to compute, such as `IS NULL`, `=`, and comparisons (`>`, `>=`, `<`, and `<=`), near the start of AND filters so that Druid processes your queries more efficiently. +Not ordering your filters in this way won't degrade performance relative to previous releases, since the fallback behavior is what Druid did previously. + +[#15838](https://github.com/apache/druid/pull/15838) + +### Centralized datasource schema (alpha) + +You can now configure Druid to manage datasource schemas centrally on the Coordinator. +Previously, Brokers needed to query data nodes and tasks for segment schemas. +Centralizing datasource schemas can improve startup time for Brokers and the efficiency of your deployment. + +If enabled, the following changes occur: + +- Realtime segment schema changes are periodically pushed to the Coordinator. +- Tasks publish segment schemas and metadata to the metadata store. +- The Coordinator polls the schema and segment metadata to build datasource schemas. +- Brokers fetch datasource schemas from the Coordinator when possible. If not, the Broker builds the schema itself through the existing mechanism of querying Historical services. + +This behavior is currently opt-in. 
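+As a minimal sketch, the opt-in corresponds to the following runtime properties. The file locations are illustrative and vary by deployment; only the property names come from this release note. + +```properties +# Common runtime properties (all services) +druid.centralizedDatasourceSchema.enabled=true + +# MiddleManager runtime properties (only needed if you run MiddleManagers); +# the fork prefix passes the setting through to the tasks it spawns +druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled=true +``` +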
To enable this feature, set the following configs: + +- In your common runtime properties, set `druid.centralizedDatasourceSchema.enabled` to true. +- If you are using MiddleManagers, you also need to set `druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled` to true in your MiddleManager runtime properties. + +You can return to the previous behavior by changing the configs back to false. + +You can configure the following properties to control how the Coordinator service handles unused segment schemas: + +|Name|Description|Required|Default| +|-|-|-|-| +|`druid.coordinator.kill.segmentSchema.on`| Boolean value for enabling automatic deletion of unused segment schemas. If set to true, the Coordinator service periodically identifies segment schemas that are not referenced by any used segment and marks them as unused. At a later point, these unused schemas are deleted. | No | True| +|`druid.coordinator.kill.segmentSchema.period`| How often automatic deletion of segment schemas occurs, in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. The value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. Only applies if `druid.coordinator.kill.segmentSchema.on` is set to true.| No| `P1D`| +|`druid.coordinator.kill.segmentSchema.durationToRetain`| [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration for how long a segment schema is retained after it's marked as unused. 
Only applies if `druid.coordinator.kill.segmentSchema.on` is set to true.| Yes, if `druid.coordinator.kill.segmentSchema.on` is set to true.| `P90D`| + +In addition, there are new metrics available to monitor the performance of centralized schema management: + +- `metadatacache/schemaPoll/count` +- `metadatacache/schemaPoll/failed` +- `metadatacache/schemaPoll/time` +- `metadatacache/init/time` +- `metadatacache/refresh/count` +- `metadatacache/refresh/time` +- `metadatacache/backfill/count` +- `metadatacache/finalizedSegmentMetadata/size` +- `metadatacache/finalizedSegmentMetadata/count` +- `metadatacache/finalizedSchemaPayload/count` +- `metadatacache/temporaryMetadataQueryResults/count` +- `metadatacache/temporaryPublishedMetadataQueryResults/count` + +For more information, see [Metrics](../operations/metrics.md). + +[#15817](https://github.com/apache/druid/pull/15817) + +Also, note the following changes to the default values for segment schema cleanup: + +* The default value for `druid.coordinator.kill.segmentSchema.period` has changed from `PT1H` to `P1D`. +* The default value for `druid.coordinator.kill.segmentSchema.durationToRetain` has changed from `PT6H` to `P90D`. + +[#16354](https://github.com/apache/druid/pull/16354) + +### MSQ support for window functions + +Added support for using window functions with the MSQ task engine as the query engine. + +[#15470](https://github.com/apache/druid/pull/15470) + +### MSQ support for Google Cloud Storage + +You can now export MSQ results to a Google Cloud Storage (GCS) path by passing the function `google()` as an argument to the `EXTERN` function. + +[#16051](https://github.com/apache/druid/pull/16051) + +### RabbitMQ extension + +A new RabbitMQ extension is available as a community contribution. +The RabbitMQ extension (`druid-rabbit-indexing-service`) lets you manage the creation and lifetime of RabbitMQ indexing tasks. 
+These indexing tasks read events from [RabbitMQ](https://www.rabbitmq.com) through [super streams](https://www.rabbitmq.com/docs/streams#super-streams). + +Because super streams allow exactly-once delivery with full support for partitioning, they are compatible with Druid's modern ingestion algorithm, without the downsides of the prior RabbitMQ firehose. + +Note that this extension uses the RabbitMQ streams feature and not a conventional exchange. You need to make sure that your messages are in a super stream before consumption. For more information, see the [RabbitMQ documentation](https://www.rabbitmq.com/docs). + +[#14137](https://github.com/apache/druid/pull/14137) + ## Functional area and related changes This section contains detailed release notes separated by areas. ### Web console Review Comment: Added 16318 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
