clintropolis commented on code in PR #17092:
URL: https://github.com/apache/druid/pull/17092#discussion_r1797203769
##########
docs/release-info/release-notes.md:
##########
@@ -57,46 +57,570 @@ For tips about how to write a good release note, see [Release notes](https://git
 
 This section contains important information about new and existing features.
 
-## Functional area and related changes
+### Compaction features
+
+Druid now supports the following features:
+
+- Compaction scheduler with greater flexibility and control over when and what to compact.
+- MSQ task engine-based auto-compaction for more performant compaction jobs.
+
+For more information, see [Compaction supervisors](#compaction-supervisors-experimental).
+
+[#16291](https://github.com/apache/druid/pull/16291)
+
+Additionally, compaction tasks that take advantage of concurrent append and replace are now generally available as part of concurrent append and replace becoming GA.
+
+### Window functions are GA
+
+[Window functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) are now generally available in Druid's native engine and in the MSQ task engine.
+
+- You no longer need to use the query context `enableWindowing` to use window functions. [#17087](https://github.com/apache/druid/pull/17087)
+
+### Concurrent append and replace GA
+
+Concurrent append and replace is now GA. The feature safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (such as with streaming ingestion) to an interval while compaction of that interval is already in progress.
+
+### Delta Lake improvements
+
+The community extension for Delta Lake has been improved to support [complex types](#delta-lake-complex-types) and [snapshot versions](#delta-lake-snapshot-versions).
+
+### Iceberg improvements
+
+The community extension for Iceberg has been improved.
+For more information, see [Iceberg improvements](#iceberg-improvements)
+
+### Projections (experimental)
+
+Druid 31.0.0 includes experimental support for projections in segments. Like materialized views, projections can improve the performance of queries by optimizing the route the query takes when it executes.

Review Comment:
   ok, i gave this a shot, also included some instruction on how to use the feature since it isn't documented yet

   >Druid 31.0.0 includes experimental support for a new feature called projections. Projections are grouped pre-aggregates of a segment that are automatically used at query time to optimize execution for any query that 'fits' the shape of the projection, reducing both computation and I/O cost by cutting the number of rows that need to be processed. Projections are contained within the segments of a datasource and do increase segment size, but they are able to share data, such as the value dictionaries of dictionary-encoded columns, with the columns of the base segment.

   >As an experimental feature, projections are not well documented yet, but they can be defined for streaming ingestion and 'classic' batch ingestion as part of the `dataSchema`. For example, using the standard wikipedia example:
   ```
   "dataSchema": {
     "granularitySpec": { ... },
     "dataSource": ...,
     "timestampSpec": { ... },
     "dimensionsSpec": { ... },
     "projections": [
       {
         "type": "aggregate",
         "name": "channel_page_hourly_distinct_user_added_deleted",
         "groupingColumns": [
           { "type": "long", "name": "__gran" },
           { "type": "string", "name": "channel" },
           { "type": "string", "name": "page" }
         ],
         "virtualColumns": [
           {
             "type": "expression",
             "expression": "timestamp_floor(__time, 'PT1H')",
             "name": "__gran",
             "outputType": "LONG"
           }
         ],
         "aggregators": [
           { "type": "HLLSketchBuild", "name": "distinct_users", "fieldName": "user", "round": true },
           { "type": "longSum", "name": "sum_added", "fieldName": "added" },
           { "type": "longSum", "name": "sum_deleted", "fieldName": "deleted" }
         ]
       },
       ...
     ]
   },
   ...
   ```
   >The `groupingColumns` define the order in which data is sorted in the projection. Instead of being defined explicitly as for the base table, granularity is defined with a virtual column: during ingestion, the processing logic finds the 'finest'-granularity virtual column that is a `timestamp_floor` expression and uses it as the `__time` column for the projection. Projections do not need to have a time column defined, in which case they can still match queries that are not grouping on time.

   >Projections can currently only be defined through classic ingestion, but they can still be used by queries running on MSQ or the new Dart engine. Future development will allow projections to be created as part of MSQ-based ingestion as well.

   >A few new query context flags have been added to aid in experimentation with projections:
   >* `useProjection` accepts a specific projection name and instructs the query engine that it must use that projection; the query fails if the projection does not match the query
   >* `forceProjections` accepts `true` or `false` and instructs the query engine that it must use a projection; the query fails if no matching projection can be found
   >* `noProjections` accepts `true` or `false` and instructs the query engines to not use any projections

   >We have a lot of plans to continue improving this feature in the coming releases, but we are excited to get it out there so users can begin experimenting, since projections can dramatically improve query performance.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
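[Editor's illustrative sketch, not part of the review thread: the context flags described above would be set in a query's standard `context` map. The query below is a hypothetical native `timeseries` query against the wikipedia example datasource; only the `context` entry relates to projections, and the projection name is the one defined in the example spec above. This is a sketch under those assumptions, not a documented example.]

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "hour",
  "intervals": ["2016-06-27/2016-06-28"],
  "aggregations": [
    { "type": "longSum", "name": "sum_added", "fieldName": "added" }
  ],
  "context": {
    "useProjection": "channel_page_hourly_distinct_user_added_deleted"
  }
}
```

Because the query groups by hour and sums `added`, it fits the shape of the example projection; with `useProjection` set, the engine is instructed to serve it from the projection and fail if it cannot.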
