This is an automated email from the ASF dual-hosted git repository.
victoria pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new d4f116c95d1 [Docs] Update and add SQL examples to use SET (#18027)
d4f116c95d1 is described below
commit d4f116c95d1efbdd46a838033690d7aa313478b2
Author: Wanru Skuld Shao <[email protected]>
AuthorDate: Thu Jun 26 16:14:47 2025 -0700
[Docs] Update and add SQL examples to use SET (#18027)
Co-authored-by: Victoria Lim <[email protected]>
---
docs/api-reference/sql-api.md | 71 +++++++++++++++++----------
docs/api-reference/sql-ingestion-api.md | 29 +++++------
docs/multi-stage-query/examples.md | 2 +-
docs/multi-stage-query/reference.md | 18 +++++--
docs/querying/query-from-deep-storage.md | 7 ++-
docs/querying/using-caching.md | 3 +-
docs/tutorials/tutorial-query-deep-storage.md | 19 ++-----
7 files changed, 84 insertions(+), 65 deletions(-)
diff --git a/docs/api-reference/sql-api.md b/docs/api-reference/sql-api.md
index ef5ff6285d2..4176bcc117e 100644
--- a/docs/api-reference/sql-api.md
+++ b/docs/api-reference/sql-api.md
@@ -188,7 +188,12 @@ If you detect a truncated response, treat it as an error.
#### Sample request
-The following example retrieves all rows in the `wikipedia` datasource where the `user` is `BlueMoon2662`. The query is assigned the ID `request01` using the `sqlQueryId` context parameter. The optional properties `header`, `typesHeader`, and `sqlTypesHeader` are set to `true` to include type information to the response.
+The following example query demonstrates these actions:
+- Retrieves all rows from the `wikipedia` datasource.
+- Filters the results where the `user` value is `BlueMoon2662`.
+- Applies the `sqlTimeZone` context parameter to set the time zone of results to `America/Los_Angeles`.
+- Returns descriptors for `header`, `typesHeader`, and `sqlTypesHeader`.
+
<Tabs>
@@ -200,7 +205,7 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
--header 'Content-Type: application/json' \
--data '{
"query": "SELECT * FROM wikipedia WHERE user='\''BlueMoon2662'\''",
- "context" : {"sqlQueryId" : "request01"},
+ "context" : {"sqlTimeZone" : "America/Los_Angeles"},
"header" : true,
"typesHeader" : true,
"sqlTypesHeader" : true
@@ -215,11 +220,11 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
POST /druid/v2/sql HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
-Content-Length: 192
+Content-Length: 201
{
"query": "SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
- "context" : {"sqlQueryId" : "request01"},
+ "context" : {"sqlTimeZone" : "America/Los_Angeles"},
"header" : true,
"typesHeader" : true,
"sqlTypesHeader" : true
@@ -229,6 +234,20 @@ Content-Length: 192
</TabItem>
</Tabs>
+You can also specify query-level context parameters directly within the SQL query string using the `SET` command. For more details, see [SET statements](../querying/sql.md#set-statements).
+
+The following request body is functionally equivalent to the previous example and uses SET instead of the `context` parameter:
+
+```JSON
+{
+  "query": "SET sqlTimeZone='America/Los_Angeles'; SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
+ "header": true,
+ "typesHeader": true,
+ "sqlTypesHeader": true
+}
+```
+
+
#### Sample response
<details>
@@ -262,24 +281,24 @@ Content-Length: 192
"sqlType": "VARCHAR"
},
"isAnonymous": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isMinor": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isNew": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isRobot": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"isUnpatrolled": {
- "type": "LONG",
- "sqlType": "BIGINT"
+ "type": "STRING",
+ "sqlType": "VARCHAR"
},
"metroCode": {
"type": "LONG",
@@ -319,22 +338,22 @@ Content-Length: 192
}
},
{
- "__time": "2015-09-12T00:47:53.259Z",
+ "__time": "2015-09-11T17:47:53.259-07:00",
"channel": "#ja.wikipedia",
- "cityName": "",
+ "cityName": null,
"comment": "/* 対戦通算成績と得失点 */",
- "countryIsoCode": "",
- "countryName": "",
- "isAnonymous": 0,
- "isMinor": 1,
- "isNew": 0,
- "isRobot": 0,
- "isUnpatrolled": 0,
- "metroCode": 0,
+ "countryIsoCode": null,
+ "countryName": null,
+ "isAnonymous": "false",
+ "isMinor": "true",
+ "isNew": "false",
+ "isRobot": "false",
+ "isUnpatrolled": "false",
+ "metroCode": null,
"namespace": "Main",
"page": "アルビレックス新潟の年度別成績一覧",
- "regionIsoCode": "",
- "regionName": "",
+ "regionIsoCode": null,
+ "regionName": null,
"user": "BlueMoon2662",
"delta": 14,
"added": 14,
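The hunks above replace a `context` map with inline SET statements, which is a mechanical rewrite. The following sketch shows one way to automate it; the `context_to_set` helper is hypothetical and not part of any Druid client library:

```python
import json

def context_to_set(query, context):
    """Prepend one SET statement per context parameter to a Druid SQL query.

    String values are single-quoted (embedded quotes doubled); other values
    are rendered as JSON literals. Hypothetical helper, illustration only.
    """
    statements = []
    for key, value in context.items():
        if isinstance(value, str):
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = json.dumps(value)  # 3 -> 3, False -> false
        statements.append(f"SET {key}={literal};")
    return "\n".join(statements + [query])

# Rebuild the request body from the example above without a "context" map.
body = {
    "query": context_to_set(
        "SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
        {"sqlTimeZone": "America/Los_Angeles"},
    ),
    "header": True,
    "typesHeader": True,
    "sqlTypesHeader": True,
}
```

The same helper covers the ingestion examples later in this commit, where `maxNumTasks` and `finalizeAggregations` are folded into the query string.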
diff --git a/docs/api-reference/sql-ingestion-api.md b/docs/api-reference/sql-ingestion-api.md
index 8fe4e01bab6..59942aff8e0 100644
--- a/docs/api-reference/sql-ingestion-api.md
+++ b/docs/api-reference/sql-ingestion-api.md
@@ -101,22 +101,23 @@ The `/druid/v2/sql/task` endpoint accepts the following:
### Sample request
The following example shows a query that fetches data from an external JSON source and inserts it into a table named `wikipedia`.
+The example specifies two query context parameters:
+
+- `maxNumTasks=3`: Limits the maximum number of parallel tasks to 3.
+- `finalizeAggregations=false`: Ensures that Druid saves the aggregation's intermediate type during ingestion. For more information, see [Rollup](../multi-stage-query/concepts.md#rollup).
+
<Tabs>
<TabItem value="4" label="HTTP">
-
```HTTP
POST /druid/v2/sql/task HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
{
-  "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"co [...]
- "context": {
- "maxNumTasks": 3
- }
+  "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \" [...]
}
```
@@ -128,12 +129,9 @@ Content-Type: application/json
```shell
curl --location --request POST 'http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task' \
--header 'Content-Type: application/json' \
- --data-raw '{
-  "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n '\''{\"type\": \"json\"}'\'',\n '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"} [...]
- "context": {
- "maxNumTasks": 3
- }
- }'
+ --data '{
+  "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '\''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\'',\n '\''{\"type\": \"json\"}'\'',\n '\''[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"} [...]
+}'
```
</TabItem>
@@ -148,10 +146,7 @@ import requests
url = "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task"
payload = json.dumps({
-  "query": "INSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"co [...]
- "context": {
- "maxNumTasks": 3
- }
+  "query": "SET maxNumTasks=3;\nSET finalizeAggregations=false;\nINSERT INTO wikipedia\nSELECT\n TIME_PARSE(\"timestamp\") AS __time,\n *\nFROM TABLE(\n EXTERN(\n '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n '{\"type\": \"json\"}',\n '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \" [...]
})
headers = {
'Content-Type': 'application/json'
@@ -174,8 +169,8 @@ print(response.text)
```json
{
- "taskId": "query-f795a235-4dc7-4fef-abac-3ae3f9686b79",
- "state": "RUNNING",
+ "taskId": "query-431c4a18-9dde-4ec8-ab82-ec7fd17d5a4e",
+ "state": "RUNNING"
}
```
</details>
diff --git a/docs/multi-stage-query/examples.md b/docs/multi-stage-query/examples.md
index 013ff6580ce..d81e874d1f7 100644
--- a/docs/multi-stage-query/examples.md
+++ b/docs/multi-stage-query/examples.md
@@ -32,7 +32,7 @@ sidebar_label: Examples
These example queries show you some of the things you can do when modifying queries for your use case. Copy the example queries into the **Query** view of the web console and run them to see what they do.
:::tip
-When you insert or replace data with SQL-based ingestion, set the context parameter `finalizeAggregations` to `false`. This context parameter is automatically set for you if you use the Druid console. If you use the API, you must explicitly set it. For more information, see [Rollup](./concepts.md#rollup).
+When you insert or replace data with SQL-based ingestion, set the context parameter `finalizeAggregations` to `false`. This context parameter is automatically set for you if you use the Druid console. If you use the API, you must explicitly set it. For an example, see [SQL-based ingestion API](../api-reference/sql-ingestion-api#sample-request). For details on aggregations, see [Rollup](./concepts.md#rollup).
:::
## INSERT with no rollup
diff --git a/docs/multi-stage-query/reference.md b/docs/multi-stage-query/reference.md
index 8f4cfbd1b58..9cd612337c5 100644
--- a/docs/multi-stage-query/reference.md
+++ b/docs/multi-stage-query/reference.md
@@ -111,14 +111,16 @@ s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-par
Keep the following in mind when using EXTERN to export rows:
- Only INSERT statements are supported.
- Only `CSV` format is supported as an export format.
-- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't supported with EXTERN statements.
+- Partitioning (PARTITIONED BY) and clustering (CLUSTERED BY) aren't supported with EXTERN statements.
- You can export to Amazon S3, Google GCS, or local storage.
- The destination provided should contain no other files or directories.
When you export data, use the `rowsPerPage` context parameter to restrict the size of exported files.
When the number of rows in the result set exceeds the value of the parameter, Druid splits the output into multiple files.
+The following statement shows the format of a SQL query using EXTERN to export rows:
```sql
+SET rowsPerPage=<number_of_rows>;
INSERT INTO
EXTERN(<destination function>)
AS CSV
@@ -127,6 +129,9 @@ SELECT
FROM <table>
```
+For details on applying context parameters using SET, see [SET statements](../querying/sql.md#set-statements).
+
+
##### S3 - Amazon S3
To export results to S3, pass the `s3()` function as an argument to the
`EXTERN` function.
@@ -501,10 +506,15 @@ When using the sort-merge algorithm, keep the following in mind:
- All join types are supported with `sortMerge`: LEFT, RIGHT, INNER, FULL, and CROSS.
-The following example runs using a single sort-merge join stage that receives `eventstream`
-(partitioned on `user_id`) and `users` (partitioned on `id`) as inputs. There is no limit on the size of either input.
+The following query runs a single sort-merge join stage that takes the following inputs:
+* `eventstream` partitioned on `user_id`
+* `users` partitioned on `id`
+
+There is no limit on the size of either input.
+The SET command assigns the `sqlJoinAlgorithm` context parameter so that Druid uses the sort-merge join algorithm for the query.
```sql
+SET sqlJoinAlgorithm='sortMerge';
REPLACE INTO eventstream_enriched
OVERWRITE ALL
SELECT
@@ -519,8 +529,6 @@ PARTITIONED BY HOUR
CLUSTERED BY user
```
-The context parameter that sets `sqlJoinAlgorithm` to `sortMerge` is not shown in the above example.
-
## Durable storage
SQL-based ingestion supports using durable storage to store intermediate files temporarily. Enabling it can improve reliability. For more information, see [Durable storage](../operations/durable-storage.md).
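The `rowsPerPage` splitting described in the reference.md hunk above can be reasoned about directly: if Druid starts a new output file whenever the per-file row count would exceed the parameter, an export produces about ceil(rows / rowsPerPage) files. A minimal sketch under that assumption (the at-least-one-file floor for an empty result set is also an assumption, not documented behavior):

```python
import math

def expected_export_files(total_rows, rows_per_page):
    """Estimate how many CSV files an EXTERN export produces when Druid
    splits output every rows_per_page rows. Assumes at least one file is
    written even for an empty result set."""
    if total_rows <= 0:
        return 1
    return math.ceil(total_rows / rows_per_page)
```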
diff --git a/docs/querying/query-from-deep-storage.md b/docs/querying/query-from-deep-storage.md
index e9430db8b56..ef2b9becb61 100644
--- a/docs/querying/query-from-deep-storage.md
+++ b/docs/querying/query-from-deep-storage.md
@@ -113,6 +113,12 @@ curl --location 'http://localhost:8888/druid/v2/sql/statements' \
}'
```
+Note that you can also submit context parameters using [SET](../querying/sql.md#set-statements). For example:
+
+```
+ "query": "SET executionMode = '\''ASYNC'\''; SET selectDestination =
'\''durableStorage'\''; SELECT * FROM \"YOUR_DATASOURCE\" WHERE \"__time\" >
TIMESTAMP '\''2017-09-01'\'' AND \"__time\" <= TIMESTAMP '\''2017-09-02'\''"
+```
+
The response for submitting a query includes the query ID along with basic information, such as when you submitted the query and the schema of the results:
```json
@@ -132,7 +138,6 @@ The response for submitting a query includes the query ID along with basic infor
}
```
-
### Get query status
You can check the status of a query with the following API call:
diff --git a/docs/querying/using-caching.md b/docs/querying/using-caching.md
index 12e8b5bbe2f..d0d2fced0a2 100644
--- a/docs/querying/using-caching.md
+++ b/docs/querying/using-caching.md
@@ -83,9 +83,10 @@ As long as the service is set to populate the cache, you can set cache options f
}
}
```
-In this example the user has set `populateCache` to `false` to avoid filling the result cache with results for segments that are over a year old. For more information, see [Druid SQL client APIs](../api-reference/sql-api.md).
+In this example the user has set `populateCache` to `false` to avoid filling the result cache with results for segments that are over a year old. For more information, see [Druid SQL client APIs](../api-reference/sql-api.md).
+You can also use the SET command to specify cache options directly within your SQL query string. For more information, see [SET statements](../querying/sql.md#set-statements).
## Learn more
See the following topics for more information:
diff --git a/docs/tutorials/tutorial-query-deep-storage.md b/docs/tutorials/tutorial-query-deep-storage.md
index 3a46f1ea492..61a25955bbb 100644
--- a/docs/tutorials/tutorial-query-deep-storage.md
+++ b/docs/tutorials/tutorial-query-deep-storage.md
@@ -130,11 +130,8 @@ Now that there are segments that are only available from deep storage, run the f
SELECT page FROM wikipedia WHERE __time < TIMESTAMP'2016-06-27 00:10:00'
LIMIT 10
```
-With the context parameter:
-
-```json
-"executionMode": "ASYNC"
-```
+To run this query asynchronously, specify the `ASYNC` execution mode using the query context.
+Apply the query context parameter before the query using a SET statement.
For example, run the following curl command:
@@ -142,10 +139,7 @@ For example, run the following curl command:
curl --location 'http://localhost:8888/druid/v2/sql/statements' \
--header 'Content-Type: application/json' \
--data '{
-    "query":"SELECT page FROM wikipedia WHERE __time < TIMESTAMP'\''2016-06-27 00:10:00'\'' LIMIT 10",
- "context":{
- "executionMode":"ASYNC"
- }
+    "query": "SET executionMode = '\''ASYNC'\''; SELECT page FROM wikipedia WHERE __time < TIMESTAMP '\''2016-06-27 00:10:00'\'' LIMIT 10"
}'
```
@@ -182,10 +176,7 @@ Compare this to if you were to submit the query to Druid SQL's regular endpoint,
curl --location 'http://localhost:8888/druid/v2/sql/' \
--header 'Content-Type: application/json' \
--data '{
-    "query":"SELECT page FROM wikipedia WHERE __time < TIMESTAMP'\''2016-06-27 00:10:00'\'' LIMIT 10",
- "context":{
- "executionMode":"ASYNC"
- }
+    "query": "SET executionMode = '\''ASYNC'\''; SELECT page FROM wikipedia WHERE __time < TIMESTAMP '\''2016-06-27 00:10:00'\'' LIMIT 10"
}'
```
@@ -293,4 +284,4 @@ Note that the response has been truncated for brevity.
## Further reading
* [Query from deep storage](../querying/query-from-deep-storage.md)
-* [Query from deep storage API reference](../api-reference/sql-api.md#query-from-deep-storage)
\ No newline at end of file
+* [Query from deep storage API reference](../api-reference/sql-api.md#query-from-deep-storage)
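Across all the files in this commit, the diff folds a `context` object into leading SET statements. Going the other way (recovering context parameters from a SET-prefixed query string) can be sketched as below; this parser is illustrative only and handles just the simple `SET key=value;` form used in these examples, not full SQL quoting:

```python
import re

# Matches a leading "SET key=value;" where value is either a single-quoted
# string or a bare token (number, boolean, identifier).
SET_STMT = re.compile(r"^\s*SET\s+(\w+)\s*=\s*('[^']*'|[^;]+);", re.IGNORECASE)

def split_set_statements(query):
    """Strip leading SET statements from a SQL string and return
    (context, remaining_sql). Values come back as strings, with
    surrounding single quotes removed."""
    context = {}
    while True:
        match = SET_STMT.match(query)
        if not match:
            break
        key, value = match.group(1), match.group(2).strip()
        if value.startswith("'") and value.endswith("'"):
            value = value[1:-1]
        context[key] = value
        query = query[match.end():]
    return context, query.lstrip()
```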
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]