clintropolis commented on code in PR #17092:
URL: https://github.com/apache/druid/pull/17092#discussion_r1797203769
##########
docs/release-info/release-notes.md:
##########
@@ -57,46 +57,570 @@ For tips about how to write a good release note, see [Release notes](https://git
 
 This section contains important information about new and existing features.
 
-## Functional area and related changes
+### Compaction features
+
+Druid now supports the following features:
+
+- Compaction scheduler with greater flexibility and control over when and what to compact.
+- MSQ task engine-based auto-compaction for more performant compaction jobs.
+
+For more information, see [Compaction supervisors](#compaction-supervisors-experimental).
+
+[#16291](https://github.com/apache/druid/pull/16291)
+
+Additionally, compaction tasks that take advantage of concurrent append and replace are now generally available as part of concurrent append and replace becoming GA.
+
+### Window functions are GA
+
+[Window functions](https://druid.apache.org/docs/latest/querying/sql-window-functions) are now generally available in Druid's native engine and in the MSQ task engine.
+
+- You no longer need to use the query context `enableWindowing` to use window functions. [#17087](https://github.com/apache/druid/pull/17087)
+
+### Concurrent append and replace GA
+
+Concurrent append and replace is now GA. The feature safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (such as with streaming ingestion) to an interval while compaction of that interval is already in progress.
+
+### Delta Lake improvements
+
+The community extension for Delta Lake has been improved to support [complex types](#delta-lake-complex-types) and [snapshot versions](#delta-lake-snapshot-versions).
+
+### Iceberg improvements
+
+The community extension for Iceberg has been improved.
+For more information, see [Iceberg improvements](#iceberg-improvements)
+
+### Projections (experimental)
+
+Druid 31.0.0 includes experimental support for projections in segments. Like materialized views, projections can improve the performance of queries by optimizing the route the query takes when it executes.

Review Comment:
   ok, i gave this a shot, also included some instruction on how to use the feature since it isn't documented yet

   >Druid 31.0.0 includes experimental support for a new feature called projections. Projections are grouped pre-aggregates of a segment that are automatically used at query time to optimize execution for any query that 'fits' the shape of the projection, reducing both computation and I/O cost by cutting the number of rows that need to be processed. Projections are contained within the segments of a datasource and do increase segment size, but they are able to share data, such as the value dictionaries of dictionary-encoded columns, with the columns of the base segment.

   >As an experimental feature, projections are not well documented yet, but they can be defined for streaming ingestion and 'classic' batch ingestion as part of the `dataSchema`. For example, using the standard wikipedia example:
   ```
   "dataSchema": {
     "granularitySpec": { ... },
     "dataSource": ...,
     "timestampSpec": { ... },
     "dimensionsSpec": { ... },
     "projections": [
       {
         "type": "aggregate",
         "name": "channel_page_hourly_distinct_user_added_deleted",
         "groupingColumns": [
           { "type": "long", "name": "__gran" },
           { "type": "string", "name": "channel" },
           { "type": "string", "name": "page" }
         ],
         "virtualColumns": [
           {
             "type": "expression",
             "expression": "timestamp_floor(__time, 'PT1H')",
             "name": "__gran",
             "outputType": "LONG"
           }
         ],
         "aggregators": [
           { "type": "HLLSketchBuild", "name": "distinct_users", "fieldName": "user", "round": true },
           { "type": "longSum", "name": "sum_added", "fieldName": "added" },
           { "type": "longSum", "name": "sum_deleted", "fieldName": "deleted" }
         ]
       },
       ...
     ]
   },
   ...
   ```
   >The `groupingColumns` define the order in which data is sorted in the projection. Instead of being defined explicitly as for the base table, granularity is defined with a virtual column: during ingestion, the processing logic finds the 'finest'-granularity virtual column that is a `timestamp_floor` expression and uses it as the `__time` column for the projection. Projections do not need to have a time column defined, in which case they can still match queries that are not grouping on time.

   >Projections can currently only be defined through classic ingestion, but they can still be used by queries running on MSQ or the new Dart engine. Future development will allow projections to be created as part of MSQ-based ingestion as well.

   >A few new query context flags have been added to aid in experimentation with projections:
   >* `useProjection` accepts a specific projection name and instructs the query engine that it must use that projection; the query fails if the projection does not match the query
   >* `forceProjections` accepts `true` or `false` and instructs the query engine that it must use a projection; the query fails if no matching projection can be found
   >* `noProjections` accepts `true` or `false` and instructs the query engines to not use any projections

   >We have a lot of plans to continue improving this feature in the coming releases, but we are excited to get it out there so users can begin experimenting, since projections can dramatically improve query performance.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
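[Editor's illustrative sketch, not part of the review thread: the context flags described above would be set in a query's standard `context` map. The query below is a hypothetical native `timeseries` query against the wikipedia example datasource; only the `context` entry relates to projections, and the projection name is the one defined in the example spec above. This is a sketch under those assumptions, not a documented example.]

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "hour",
  "intervals": ["2016-06-27/2016-06-28"],
  "aggregations": [
    { "type": "longSum", "name": "sum_added", "fieldName": "added" }
  ],
  "context": {
    "useProjection": "channel_page_hourly_distinct_user_added_deleted"
  }
}
```

Because the query groups by hour and sums `added`, it fits the shape of the example projection; with `useProjection` set, the engine is instructed to serve it from the projection and fail if it cannot.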
