ektravel commented on code in PR #16412:
URL: https://github.com/apache/druid/pull/16412#discussion_r1610208956
##########
docs/release-info/release-notes.md:
##########
@@ -57,50 +57,730 @@ For tips about how to write a good release note, see [Release notes](https://git
 This section contains important information about new and existing features.
+### Improved native queries
+
+Native queries can now group on nested columns and arrays.
+
+[#16068](https://github.com/apache/druid/pull/16068)
+
+Before realtime segments are pushed to deep storage, they consist of spill files.
+Segment metrics such as `query/segment/time` now report per spill file for a realtime segment, rather than for the entire segment.
+This change eliminates the need to materialize results on the heap, which improves the performance of groupBy queries.
+
+[#15757](https://github.com/apache/druid/pull/15757)
+
+### Concurrent append and replace improvements
+
+Streaming ingestion supervisors now support concurrent append: streaming tasks can run concurrently with a replace task (compaction or re-indexing) if the replace task also uses concurrent locks. To enable concurrent append, set the `useConcurrentLocks` context parameter to `true` in the supervisor's context.
+
+Once you update the supervisor to have `"useConcurrentLocks": true`, the transition to concurrent append happens seamlessly without causing any ingestion lag or task failures.
+
+[#16369](https://github.com/apache/druid/pull/16369)
+
+Druid now performs active cleanup of stale pending segments by tracking the set of tasks using such pending segments.
+This allows concurrent append and replace to upgrade only a minimal set of pending segments, which improves performance and eliminates errors.
+Additionally, it helps reduce load on the metadata store.
+
+[#16144](https://github.com/apache/druid/pull/16144)
+
+### Improved AND filter performance
+
+Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan, based on the selectivity of other filters.
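The adaptive decision described above can be modeled roughly as follows. This is a minimal Python sketch under simplifying assumptions, not Druid's implementation: `Clause`, `index_cost`, and `evaluate_and` are hypothetical names, and each clause's index cost is assumed to be known up front (for example, the number of distinct values a `LIKE` must check). A clause uses its index only when building it is no more work than matching the rows that earlier clauses have left; otherwise it is deferred and matched row by row.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Clause:
    """One child of an AND filter (hypothetical model, not a Druid API)."""
    name: str
    # Assumed estimate of the work needed to build this clause's index,
    # e.g. distinct values to examine. None means the clause has no index.
    index_cost: Optional[int]
    compute_index: Callable[[], Set[int]]  # returns all matching row ids
    matches: Callable[[int], bool]         # checks a single row


def evaluate_and(clauses: List[Clause], num_rows: int) -> Tuple[Set[int], List[str]]:
    """Evaluate an AND filter, skipping indexes that cost more than a row scan.

    Returns the matching row ids and the names of the clauses that were
    matched row by row instead of through their index.
    """
    remaining = set(range(num_rows))
    deferred: List[Clause] = []
    for clause in clauses:
        # Use the index only if it is no more work than checking the
        # rows that survived the earlier clauses.
        if clause.index_cost is not None and clause.index_cost <= len(remaining):
            remaining &= clause.compute_index()
        else:
            deferred.append(clause)
    # Match the deferred clauses against the surviving rows only.
    result = {row for row in remaining if all(c.matches(row) for c in deferred)}
    return result, [c.name for c in deferred]


# Toy table: 10,000 rows; stringColumn1 is '1000' on 100 rows,
# and stringColumn2 has 500 distinct values.
NUM_ROWS = 10_000
col1 = ["1000" if r % 100 == 0 else str(r % 100) for r in range(NUM_ROWS)]
col2 = [str(r % 500) for r in range(NUM_ROWS)]

eq_clause = Clause(
    name="stringColumn1 = '1000'",
    index_cost=1,  # assumed: a single dictionary lookup
    compute_index=lambda: {r for r in range(NUM_ROWS) if col1[r] == "1000"},
    matches=lambda r: col1[r] == "1000",
)
like_clause = Clause(
    name="stringColumn2 LIKE '%1%'",
    index_cost=len(set(col2)),  # assumed: every distinct value is checked
    compute_index=lambda: {r for r in range(NUM_ROWS) if "1" in col2[r]},
    matches=lambda r: "1" in col2[r],
)

rows, deferred = evaluate_and([eq_clause, like_clause], NUM_ROWS)
print(deferred)  # ["stringColumn2 LIKE '%1%'"]
```

With the cheap equality clause first, only 100 rows remain, so the 500-value `LIKE` index is skipped and those rows are matched directly. Reversing the clause order returns the same rows but builds both indexes, which is why the release note recommends putting inexpensive filters first.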
+This technique, known as filter partitioning, can result in dramatic performance increases, depending on the order of filters in the query.
+
+For example, take a query like `SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'`. Previously, Druid used indexes when processing filters if they were available.
+That's not always ideal; imagine if `stringColumn1 = '1000'` matches 100 rows. With indexes, Druid has to check every value of `stringColumn2` against `LIKE '%1%'` to compute the index for the filter. If `stringColumn2` has more than 100 values, that ends up being more work than simply checking for a match in the 100 remaining rows.
+
+With the new logic, Druid now checks the selectivity of indexes as it processes each clause of the AND filter.
+If it determines it would take more work to compute the index than to match the remaining rows, Druid skips computing the index.
+
+The order in which you write filters in the WHERE clause of a query can affect its performance.
+More improvements are coming, but you can try out the existing improvements by reordering the filters in a query.
+Put filters whose indexes are less intensive to compute, such as `IS NULL`, `=`, and comparisons (`>`, `>=`, `<`, and `<=`), near the start of AND filters so that Druid processes your queries more efficiently.
+Not ordering your filters in this way won’t degrade performance relative to previous releases, since the fallback behavior is what Druid did previously.
+
+[#15838](https://github.com/apache/druid/pull/15838)
+
+### Centralized datasource schema (alpha)
+
+You can now configure Druid to manage datasource schema centrally on the Coordinator.
+Previously, Brokers needed to query data nodes and tasks for segment schemas.
+Centralizing datasource schemas can improve startup time for Brokers and the efficiency of your deployment.
+
+If enabled, the following changes occur:
+
+- Realtime segment schema changes are periodically pushed to the Coordinator.
+- Tasks publish segment schemas and metadata to the metadata store.
+- The Coordinator polls the schema and segment metadata to build datasource schemas.
+- Brokers fetch datasource schemas from the Coordinator when possible. If not, the Broker builds the schema itself using the existing mechanism of querying Historical services.
+
+This behavior is currently opt-in. To enable this feature, set the following configs:
+
+- In your common runtime properties, set `druid.centralizedDatasourceSchema.enabled` to `true`.
+- If you are using MiddleManagers, also set `druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled` to `true` in your MiddleManager runtime properties.
+
+You can return to the previous behavior by changing the configs to `false`.

Review Comment:
Removed line 122
