Re: [PR] docs: 34.0.0 release notes (druid)

via GitHub Fri, 08 Aug 2025 16:04:29 -0700


capistrant commented on code in PR #18231:
URL: https://github.com/apache/druid/pull/18231#discussion_r2264107578



##########
docs/release-info/release-notes.md:
##########
@@ -57,63 +57,394 @@ For tips about how to write a good release note, see 
[Release notes](https://git
 
 This section contains important information about new and existing features.
 
+### Hadoop-based ingestion
+
+Hadoop-based ingestion has been deprecated since Druid 32.0 and will be 
removed as early as Druid 35.0.0. 
+We recommend one of Druid's other supported ingestion methods, such as 
[SQL-based ingestion](../multi-stage-query/index.md) or [MiddleManager-less 
ingestion using Kubernetes](../development/extensions-core/k8s-jobs.md).
+
+As part of this change, you must now opt in to using the deprecated 
`index_hadoop` task type. If you don't, your Hadoop-based ingestion 
tasks will fail.
+
+To opt in, set `druid.indexer.task.allowHadoopTaskExecution` to `true` in your 
`common.runtime.properties` file.
+
+[#18239](https://github.com/apache/druid/pull/18239)
+
+### Use SET statements for query context parameters
+
+You can now use SET statements to define query context parameters for a query 
through the [Druid console](#set-statements-in-the-druid-console) or the 
[API](#set-statements-with-the-api).
+
+[#17894](https://github.com/apache/druid/pull/17894) 
[#17974](https://github.com/apache/druid/pull/17974)
+
+#### SET statements in the Druid console
+
+The web console now supports using SET statements to specify query context 
parameters. For example, if you include `SET timeout = 20000;` in your query, 
the timeout query context parameter is set:
+
+```sql
+SET timeout = 20000;
+SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2
+```
+
+[#17966](https://github.com/apache/druid/pull/17966)
+
+#### SET statements with the API
+
+SQL queries issued to `/druid/v2/sql` can now include multiple SET statements 
to build up context for the final statement. For example, the following SQL 
query includes the `timeout`, `useCache`, `populateCache`, `vectorize`, 
and `engine` query context parameters: 
+
+```sql
+SET timeout = 20000;
+SET useCache = false;
+SET populateCache = false;
+SET vectorize = 'force';
+SET engine = 'msq-dart';
+SELECT "channel", "page", sum("added") from "wikipedia" GROUP BY 1, 2
+```
+
+An API call that includes SET statements looks like the following: 
+
+```curl
+curl --location 'http://HOST:PORT/druid/v2/sql' \
+--header 'Content-Type: application/json' \
+--data '{
+  "query": "SET timeout=20000; SET useCache=false; SET populateCache=false; 
SET engine='\''msq-dart'\'';SELECT  user,  commentLength,COUNT(*) AS \"COUNT\" 
FROM wikipedia GROUP BY 1, 2 ORDER BY 2 DESC",
+  "resultFormat": "array",
+  "header": true,
+  "typesHeader": true,
+  "sqlTypesHeader": true
+}'
+```
+
+This improvement also works for INSERT and REPLACE queries using the MSQ task 
engine. Note that JDBC isn't supported.
+
+#### Improved HTTP endpoints
+
+You can now use raw SQL in the HTTP body for `/druid/v2/sql` endpoints. You 
can set `Content-Type` to `text/plain` instead of `application/json`, so you 
can provide raw text that isn't escaped. 
+
+[#17937](https://github.com/apache/druid/pull/17937)
+
+### Cloning Historicals (experimental)
+
+You can now configure clones for Historicals using the dynamic Coordinator 
configuration `cloneServers`. Cloned Historicals are useful for situations such 
as rolling updates where you want to launch a new Historical as a replacement 
for an existing one.
+
+Set the config to a map from the target Historical server to the source 
Historical:
+
+```
+  "cloneServers": {"historicalClone":"historicalOriginal"}
+```
+
+The clone doesn't participate in regular segment assignment or balancing. 
Instead, the Coordinator mirrors any segment assignment made to the original 
Historical onto the clone, so that the clone becomes an exact copy of the 
source. Segments on the clone Historical do not count towards replica counts 
either. If the original Historical disappears, the clone remains in the last 
known state of the source server until removed from the `cloneServers` config.
+
+When you query your data using the native query engine, you can prefer 
(`preferClones`), exclude (`excludeClones`), or include (`includeClones`) 
clones by setting the query context parameter `cloneQueryMode`. By default, 
clones are excluded.
+
+As part of this change, new Coordinator APIs are available. For more 
information, see [Coordinator APIs for clones](#coordinator-apis-for-clones).
+
+[#17863](https://github.com/apache/druid/pull/17863) 
[#17899](https://github.com/apache/druid/pull/17899) 
[#17956](https://github.com/apache/druid/pull/17956) 
+
+### Embedded kill tasks on the Overlord (experimental)
+
+You can now run kill tasks directly on the Overlord itself. Embedded kill 
tasks provide several benefits; they:
+
+- Kill segments as soon as they're eligible 
+- Don't take up task slots
+- Finish faster since they use optimized metadata queries and don't launch a 
new JVM
+- Kill a small number of segments per task, ensuring locks on an interval 
aren't held for too long
+- Skip locked intervals to avoid head-of-line blocking
+- Require minimal configuration
+- Can keep up with a large number of unused segments in the cluster
+
+This feature is controlled by the following configs:
+
+- `druid.manager.segments.killUnused.enabled` - Whether the feature is enabled 
(defaults to `false`)
+- `druid.manager.segments.killUnused.bufferPeriod` - The amount of time that a 
segment must be unused before it can be permanently removed from 
metadata and deep storage. This serves as a buffer period to prevent data 
loss if the data turns out to be needed after being marked unused (defaults to `P30D`)
+
+To use embedded kill tasks, you must have the segment metadata cache enabled.
+
+As part of this feature, [new metrics](#overlord-kill-task-metrics) have been 
added.
+
+[#18028](https://github.com/apache/druid/pull/18028) 
[#18124](https://github.com/apache/druid/pull/18124)
+
+### Preferred tier selection 
+You can now configure the Broker service to prefer Historicals on a specific 
tier. This is useful for deployments that span availability zones (AZs). Brokers in one AZ 
select Historicals in the same AZ by default but can still 
select Historicals in another AZ if none in the same AZ are 
available.
+
+To enable, set the property `druid.broker.select.tier` to `preferred` in the Broker 
runtime properties. You can then set 
`druid.broker.select.tier.preferred.tier` to the tier you want each Broker to 
prefer (for example, for Brokers in AZ1, you could set this to the tier name of your 
AZ1 Historical servers).

Review Comment:
   ```suggestion
   To enable, set property `druid.broker.select.tier` to `preferred` in Broker 
runtime properties. You can then configure 
`druid.broker.select.tier.preferred.tier` to the tier you want each broker to 
prefer (i.e. for brokers in `AZ1`, you could set this to the tier name of your 
`AZ1` historical servers).
   ```
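
   For reference, a minimal sketch of the Broker runtime properties that this paragraph describes. The tier name `tier_az1` is a hypothetical placeholder, not something taken from the PR; substitute the tier that your `AZ1` Historicals actually use.

   ```properties
   # Prefer Historicals on one tier; Historicals on other tiers stay available as a fallback
   druid.broker.select.tier=preferred
   # Hypothetical tier name for the Historicals in AZ1
   druid.broker.select.tier.preferred.tier=tier_az1
   ```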


