This is an automated email from the ASF dual-hosted git repository.
victoria pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new d4f116c95d1 [Docs] Update and add SQL examples to use SET (#18027)
d4f116c95d1 is described below
commit d4f116c95d1efbdd46a838033690d7aa313478b2
Author: Wanru Skuld Shao <[email protected]>
AuthorDate: Thu Jun 26 16:14:47 2025 -0700
[Docs] Update and add SQL examples to use SET (#18027)
Co-authored-by: Victoria Lim <[email protected]>
---
docs/api-reference/sql-api.md | 71 +++++++++++++++++----------
docs/api-reference/sql-ingestion-api.md | 29 +++++------
docs/multi-stage-query/examples.md | 2 +-
docs/multi-stage-query/reference.md | 18 +++++--
docs/querying/query-from-deep-storage.md | 7 ++-
docs/querying/using-caching.md | 3 +-
docs/tutorials/tutorial-query-deep-storage.md | 19 ++-----
7 files changed, 84 insertions(+), 65 deletions(-)
diff --git a/docs/api-reference/sql-api.md b/docs/api-reference/sql-api.md
index ef5ff6285d2..4176bcc117e 100644
--- a/docs/api-reference/sql-api.md
+++ b/docs/api-reference/sql-api.md
@@ -188,7 +188,12 @@ If you detect a truncated response, treat it as an error.
#### Sample request
-The following example retrieves all rows in the `wikipedia` datasource where the `user` is `BlueMoon2662`. The query is assigned the ID `request01` using the `sqlQueryId` context parameter. The optional properties `header`, `typesHeader`, and `sqlTypesHeader` are set to `true` to include type information to the response.
+The following example query demonstrates these actions:
+- Retrieves all rows from the `wikipedia` datasource.
+- Filters the results where the `user` value is `BlueMoon2662`.
+- Applies the `sqlTimeZone` context parameter to set the time zone of results to `America/Los_Angeles`.
+- Returns descriptors for `header`, `typesHeader`, and `sqlTypesHeader`.
+
<Tabs>
@@ -200,7 +205,7 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
--header 'Content-Type: application/json' \
--data '{
"query": "SELECT * FROM wikipedia WHERE user='\''BlueMoon2662'\''",
- "context" : {"sqlQueryId" : "request01"},
+ "context" : {"sqlTimeZone" : "America/Los_Angeles"},
"header" : true,
"typesHeader" : true,
"sqlTypesHeader" : true
@@ -215,11 +220,11 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
POST /druid/v2/sql HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
-Content-Length: 192
+Content-Length: 201
{
"query": "SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
- "context" : {"sqlQueryId" : "request01"},
+ "context" : {"sqlTimeZone" : "America/Los_Angeles"},
"header" : true,
"typesHeader" : true,
"sqlTypesHeader" : true
@@ -229,6 +234,20 @@ Content-Length: 192
</TabItem>
</Tabs>
+You can also specify query-level context parameters directly within the SQL query string using the `SET` command. For more details, see [SET statements](../querying/sql.md#set-statements).
+
+The following request body is functionally equivalent to the previous example and uses SET instead of the `context` parameter:
+
+```JSON
+{
+  "query": "SET sqlTimeZone='America/Los_Angeles'; SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
+ "header": true,
+ "typesHeader": true,
+ "sqlTypesHeader": true
+}
+```
+
+
#### Sample response
<details>
@@ -262,24 +281,24 @@ Content-Length: 192
"sqlType": "VARCHAR"
},
"isAnonymous": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isMinor": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isNew": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isRobot": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isUnpatrolled": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"metroCode": {
"type": "LONG",
@@ -319,22 +338,22 @@ Content-Length: 192
}
},
{
- "__time": "2015-09-12T00:47:53.259Z",
+ "__time": "2015-09-11T17:47:53.259-07:00",
"channel": "#ja.wikipedia",
- "cityName": "",
+ "cityName": null,
"comment": "/* 対戦通算成績と得失点 */",
- "countryIsoCode": "",
- "countryName": "",
- "isAnonymous": 0,
- "isMinor": 1,
- "isNew": 0,
- "isRobot": 0,
- "isUnpatrolled": 0,
- "metroCode": 0,
+ "countryIsoCode": null,
+ "countryName": null,
+ "isAnonymous": "false",
+ "isMinor": "true",
+ "isNew": "false",
+ "isRobot": "false",
+ "isUnpatrolled": "false",
+ "metroCode": null,
"namespace": "Main",
"page": "アルビレックス新潟の年度別成績一覧",
- "regionIsoCode": "",
- "regionName": "",
+ "regionIsoCode": null,
+ "regionName": null,
"user": "BlueMoon2662",
"delta": 14,
"added": 14,
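The hunks above replace a `context` map with inline SET statements, which is a mechanical rewrite. The following sketch shows one way to automate it; the `context_to_set` helper is hypothetical and not part of any Druid client library:

```python
import json

def context_to_set(query, context):
    """Prepend one SET statement per context parameter to a Druid SQL query.

    String values are single-quoted (embedded quotes doubled); other values
    are rendered as JSON literals. Hypothetical helper, illustration only.
    """
    statements = []
    for key, value in context.items():
        if isinstance(value, str):
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = json.dumps(value)  # 3 -> 3, False -> false
        statements.append(f"SET {key}={literal};")
    return "\n".join(statements + [query])

# Rebuild the request body from the example above without a "context" map.
body = {
    "query": context_to_set(
        "SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
        {"sqlTimeZone": "America/Los_Angeles"},
    ),
    "header": True,
    "typesHeader": True,
    "sqlTypesHeader": True,
}
```

The same helper covers the ingestion examples later in this commit, where `maxNumTasks` and `finalizeAggregations` are folded into the query string.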
diff --git a/docs/api-reference/sql-ingestion-api.md b/docs/api-reference/sql-ingestion-api.md
index 8fe4e01bab6..59942aff8e0 100644
--- a/docs/api-reference/sql-ingestion-api.md
+++ b/docs/api-reference/sql-ingestion-api.md
@@ -101,22 +101,23 @@ The `/druid/v2/sql/task` endpoint accepts the following:
### Sample request
The following example shows a query that fetches data from an external JSON source and inserts it into a table named `wikipedia`.
+The example specifies two query context parameters:
+
+- `maxNumTasks=3`: Limits the maximum number of parallel tasks to 3.
+- `finalizeAggregations=false`: Ensures that Druid saves the aggregation's intermediate type during ingestion. For more information, see [Rollup](../multi-stage-query/concepts.md#rollup).
+
<Tabs>
<TabItem value="4" label="HTTP">
-
```HTTP
POST /druid/v2/sql/task HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
{
-  "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"co [...]
- "context": {
- "maxNumTasks": 3
- }
+  "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \" [...]
}
```
@@ -128,12 +129,9 @@ Content-Type: application/json
```shell
curl --location --request POST 'http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task' \
--header 'Content-Type: application/json' \
- --data-raw '{
-  "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n '\''{\"type\": \"json\"}'\'',\n '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"} [...]
- "context": {
- "maxNumTasks": 3
- }
- }'
+ --data '{
+  "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n '\''{\"type\": \"json\"}'\'',\n '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"} [...]
+}'
```
</TabItem>
@@ -148,10 +146,7 @@ import requests
url = "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task"
payload = json.dumps({
-  "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"co [...]
- "context": {
- "maxNumTasks": 3
- }
+  "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \" [...]
})
headers = {
'Content-Type': 'application/json'
@@ -174,8 +169,8 @@ print(response.text)
```json
{
- "taskId": "query-f795a235-4dc7-4fef-abac-3ae3f9686b79",
- "state": "RUNNING",
+ "taskId": "query-431c4a18-9dde-4ec8-ab82-ec7fd17d5a4e",
+ "state": "RUNNING"
}
```
</details>
diff --git a/docs/multi-stage-query/examples.md b/docs/multi-stage-query/examples.md
index 013ff6580ce..d81e874d1f7 100644
--- a/docs/multi-stage-query/examples.md
+++ b/docs/multi-stage-query/examples.md
@@ -32,7 +32,7 @@ sidebar_label: Examples
These example queries show you some of the things you can do when modifying queries for your use case. Copy the example queries into the **Query** view of the web console and run them to see what they do.
:::tip
-When you insert or replace data with SQL-based ingestion, set the context parameter `finalizeAggregations` to `false`. This context parameter is automatically set for you if you use the Druid console. If you use the API, you must explicitly set it. For more information, see [Rollup](./concepts.md#rollup).
+When you insert or replace data with SQL-based ingestion, set the context parameter `finalizeAggregations` to `false`. This context parameter is automatically set for you if you use the Druid console. If you use the API, you must explicitly set it. For an example, see [SQL-based ingestion API](../api-reference/sql-ingestion-api#sample-request). For details on aggregations, see [Rollup](./concepts.md#rollup).
:::
## INSERT with no rollup
diff --git a/docs/multi-stage-query/reference.md b/docs/multi-stage-query/reference.md
index 8f4cfbd1b58..9cd612337c5 100644
--- a/docs/multi-stage-query/reference.md
+++ b/docs/multi-stage-query/reference.md
@@ -111,14 +111,16 @@ s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-par
Keep the following in mind when using EXTERN to export rows:
- Only INSERT statements are supported.
- Only `CSV` format is supported as an export format.
-- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't supported with EXTERN statements.
+- Partitioning (PARTITIONED BY) and clustering (CLUSTERED BY) aren't supported with EXTERN statements.
- You can export to Amazon S3, Google GCS, or local storage.
- The destination provided should contain no other files or directories.
When you export data, use the `rowsPerPage` context parameter to restrict the size of exported files.
When the number of rows in the result set exceeds the value of the parameter, Druid splits the output into multiple files.
+The following statement shows the format of a SQL query using EXTERN to export rows:
```sql
+SET rowsPerPage=<number_of_rows>;
INSERT INTO
EXTERN(<destination function>)
AS CSV
@@ -127,6 +129,9 @@ SELECT
FROM <table>
```
+For details on applying context parameters using SET, see [SET statements](../querying/sql.md#set-statements).
+
+
##### S3 - Amazon S3
To export results to S3, pass the `s3()` function as an argument to the
`EXTERN` function.
@@ -501,10 +506,15 @@ When using the sort-merge algorithm, keep the following in mind:
- All join types are supported with `sortMerge`: LEFT, RIGHT, INNER, FULL, and CROSS.
-The following example runs using a single sort-merge join stage that receives `eventstream`
-(partitioned on `user_id`) and `users` (partitioned on `id`) as inputs. There is no limit on the size of either input.
+The following query runs a single sort-merge join stage that takes the following inputs:
+* `eventstream` partitioned on `user_id`
+* `users` partitioned on `id`
+
+There is no limit on the size of either input.
+The SET command assigns the `sqlJoinAlgorithm` context parameter so that Druid uses the sort-merge join algorithm for the query.
```sql
+SET sqlJoinAlgorithm='sortMerge';
REPLACE INTO eventstream_enriched
OVERWRITE ALL
SELECT
@@ -519,8 +529,6 @@ PARTITIONED BY HOUR
CLUSTERED BY user
```
-The context parameter that sets `sqlJoinAlgorithm` to `sortMerge` is not shown in the above example.
-
## Durable storage
SQL-based ingestion supports using durable storage to store intermediate files temporarily. Enabling it can improve reliability. For more information, see [Durable storage](../operations/durable-storage.md).
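The `rowsPerPage` splitting described in the reference.md hunk above can be reasoned about directly: if Druid starts a new output file whenever the per-file row count would exceed the parameter, an export produces about ceil(rows / rowsPerPage) files. A minimal sketch under that assumption (the at-least-one-file floor for an empty result set is also an assumption, not documented behavior):

```python
import math

def expected_export_files(total_rows, rows_per_page):
    """Estimate how many CSV files an EXTERN export produces when Druid
    splits output every rows_per_page rows. Assumes at least one file is
    written even for an empty result set."""
    if total_rows <= 0:
        return 1
    return math.ceil(total_rows / rows_per_page)
```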
diff --git a/docs/querying/query-from-deep-storage.md b/docs/querying/query-from-deep-storage.md
index e9430db8b56..ef2b9becb61 100644
--- a/docs/querying/query-from-deep-storage.md
+++ b/docs/querying/query-from-deep-storage.md
@@ -113,6 +113,12 @@ curl --location 'http://localhost:8888/druid/v2/sql/statements' \
}'
```
+Note that you can also submit context parameters using [SET](../querying/sql.md#set-statements). For example:
+
+```
+ "query": "SET executionMode = '\''ASYNC'\''; SET selectDestination =
'\''durableStorage'\''; SELECT * FROM \"YOUR_DATASOURCE\" WHERE \"__time\" >
TIMESTAMP '\''2017-09-01'\'' AND \"__time\" <= TIMESTAMP '\''2017-09-02'\''"
+```
+
The response for submitting a query includes the query ID along with basic information, such as when you submitted the query and the schema of the results:
```json
@@ -132,7 +138,6 @@ The response for submitting a query includes the query ID along with basic infor
}
```
-
### Get query status
You can check the status of a query with the following API call:
diff --git a/docs/querying/using-caching.md b/docs/querying/using-caching.md
index 12e8b5bbe2f..d0d2fced0a2 100644
--- a/docs/querying/using-caching.md
+++ b/docs/querying/using-caching.md
@@ -83,9 +83,10 @@ As long as the service is set to populate the cache, you can set cache options f
}
}
```
-In this example the user has set `populateCache` to `false` to avoid filling the result cache with results for segments that are over a year old. For more information, see [Druid SQL client APIs](../api-reference/sql-api.md).
+In this example the user has set `populateCache` to `false` to avoid filling the result cache with results for segments that are over a year old. For more information, see [Druid SQL client APIs](../api-reference/sql-api.md).
+You can also use the SET command to specify cache options directly within your SQL query string. For more information, see [SET statements](../querying/sql.md#set-statements).
## Learn more
See the following topics for more information:
diff --git a/docs/tutorials/tutorial-query-deep-storage.md b/docs/tutorials/tutorial-query-deep-storage.md
index 3a46f1ea492..61a25955bbb 100644
--- a/docs/tutorials/tutorial-query-deep-storage.md
+++ b/docs/tutorials/tutorial-query-deep-storage.md
@@ -130,11 +130,8 @@ Now that there are segments that are only available from deep storage, run the f
SELECT page FROM wikipedia WHERE __time < TIMESTAMP'2016-06-27 00:10:00'
LIMIT 10
```
-With the context parameter:
-
-```json
-"executionMode": "ASYNC"
-```
+To run this query asynchronously, specify the `ASYNC` execution mode using the query context.
+Apply the query context parameter before the query using a SET statement.
For example, run the following curl command:
@@ -142,10 +139,7 @@ For example, run the following curl command:
curl --location 'http://localhost:8888/druid/v2/sql/statements' \
--header 'Content-Type: application/json' \
--data '{
-    "query":"SELECT page FROM wikipedia WHERE __time < TIMESTAMP'\''2016-06-27 00:10:00'\'' LIMIT 10",
- "context":{
- "executionMode":"ASYNC"
- }
+    "query": "SET executionMode = '\''ASYNC'\''; SELECT page FROM wikipedia WHERE __time < TIMESTAMP '\''2016-06-27 00:10:00'\'' LIMIT 10"
}'
```
@@ -182,10 +176,7 @@ Compare this to if you were to submit the query to Druid SQL's regular endpoint,
curl --location 'http://localhost:8888/druid/v2/sql/' \
--header 'Content-Type: application/json' \
--data '{
-    "query":"SELECT page FROM wikipedia WHERE __time < TIMESTAMP'\''2016-06-27 00:10:00'\'' LIMIT 10",
- "context":{
- "executionMode":"ASYNC"
- }
+    "query": "SET executionMode = '\''ASYNC'\''; SELECT page FROM wikipedia WHERE __time < TIMESTAMP '\''2016-06-27 00:10:00'\'' LIMIT 10"
}'
```
@@ -293,4 +284,4 @@ Note that the response has been truncated for brevity.
## Further reading
* [Query from deep storage](../querying/query-from-deep-storage.md)
-* [Query from deep storage API reference](../api-reference/sql-api.md#query-from-deep-storage)
\ No newline at end of file
+* [Query from deep storage API reference](../api-reference/sql-api.md#query-from-deep-storage)
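Across all the files in this commit, the diff folds a `context` object into leading SET statements. Going the other way (recovering context parameters from a SET-prefixed query string) can be sketched as below; this parser is illustrative only and handles just the simple `SET key=value;` form used in these examples, not full SQL quoting:

```python
import re

# Matches a leading "SET key=value;" where value is either a single-quoted
# string or a bare token (number, boolean, identifier).
SET_STMT = re.compile(r"^\s*SET\s+(\w+)\s*=\s*('[^']*'|[^;]+);", re.IGNORECASE)

def split_set_statements(query):
    """Strip leading SET statements from a SQL string and return
    (context, remaining_sql). Values come back as strings, with
    surrounding single quotes removed."""
    context = {}
    while True:
        match = SET_STMT.match(query)
        if not match:
            break
        key, value = match.group(1), match.group(2).strip()
        if value.startswith("'") and value.endswith("'"):
            value = value[1:-1]
        context[key] = value
        query = query[match.end():]
    return context, query.lstrip()
```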
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]