vtlim commented on code in PR #18027:
URL: https://github.com/apache/druid/pull/18027#discussion_r2125151199
##########
docs/api-reference/sql-ingestion-api.md:
##########
@@ -100,23 +100,22 @@ The `/druid/v2/sql/task` endpoint accepts the following:
### Sample request
-The following example shows a query that fetches data from an external JSON
source and inserts it into a table named `wikipedia`.
+The following example shows a query that fetches data from an external JSON
source and inserts it into a table named `wikipedia`. It specifies two query
context parameters:
+- `maxNumTasks=3`: This limits the max number of parallel tasks for this data
loading job to 3.
Review Comment:
```suggestion
- `maxNumTasks=3`: This limits the maximum number of parallel tasks to 3.
```
##########
docs/api-reference/sql-ingestion-api.md:
##########
@@ -100,23 +100,22 @@ The `/druid/v2/sql/task` endpoint accepts the following:
### Sample request
-The following example shows a query that fetches data from an external JSON
source and inserts it into a table named `wikipedia`.
+The following example shows a query that fetches data from an external JSON
source and inserts it into a table named `wikipedia`. It specifies two query
context parameters:
+- `maxNumTasks=3`: This limits the max number of parallel tasks for this data
loading job to 3.
+- `finalizeAggregations=false`: This prevents Druid from performing final data
aggregation during loading. It's useful if you want to work with intermediate
data states or control aggregation later. For more information about Rollup,
see [Rollup](../multi-stage-query/concepts/#rollup).
Review Comment:
```suggestion
- `finalizeAggregations=false`: This prevents Druid from performing final
data aggregation during loading. It's useful if you want to work with
intermediate data states or control aggregation later. For more information,
see [Rollup](../multi-stage-query/concepts/#rollup).
```
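For illustration, a complete `/druid/v2/sql/task` query using both context parameters via SET might look like the following sketch. The column names, row signature, and source URL are hypothetical, not from the PR; only `wikipedia`, `maxNumTasks=3`, and `finalizeAggregations=false` come from the doc under review:

```sql
-- Illustrative only: hypothetical source URL and columns
SET maxNumTasks = 3;
SET finalizeAggregations = FALSE;
INSERT INTO wikipedia
SELECT
  TIME_PARSE("timestamp") AS __time,
  page,
  added
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/wikipedia.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
)
PARTITIONED BY DAY
```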
##########
docs/querying/using-caching.md:
##########
@@ -83,8 +83,20 @@ As long as the service is set to populate the cache, you can
set cache options f
}
}
```
+
In this example the user has set `populateCache` to `false` to avoid filling
the result cache with results for segments that are over a year old. For more
information, see [Druid SQL client APIs](../api-reference/sql-api.md).
+You can also use the SET command to specify cache options directly within your
SQL query string. For example:
+
+```
+{
+ "query" : "SET useCache=true; SET populateCache=false; SELECT COUNT(*) FROM
data_source WHERE foo = 'bar' AND __time > TIMESTAMP '2020-01-01 00:00:00'"
+}
+```
+
+For details about SET, see [SET
statements](../querying/sql.md#set-statements).
+
+
Review Comment:
Let's actually not add a duplicate example in this doc. The other docs were
more introductory, so it made sense to show multiple ways. Instead, let's just
do a short one-liner letting the user know that there's another route to set
the parameter. "You can also set the context parameter directly in `query`
using the SET command. For details, see <link to the sql-api example>."
##########
docs/tutorials/tutorial-query-deep-storage.md:
##########
@@ -191,6 +191,25 @@ curl --location 'http://localhost:8888/druid/v2/sql/' \
The response you get back is an empty response cause there are no records on
the Historicals that match the query.
+You can also use Set command to enable `executionMode` of the given query.
However, the results depends on where you set `executionMode`.
Review Comment:
We'll likely roll this back as well (see comment on query-from-deep-storage)
##########
docs/multi-stage-query/reference.md:
##########
@@ -111,14 +111,15 @@
s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-par
Keep the following in mind when using EXTERN to export rows:
- Only INSERT statements are supported.
- Only `CSV` format is supported as an export format.
-- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't
supported with EXTERN statements.
+- Partitioning (PARTITIONED BY) and clustering (CLUSTERED BY) aren't supported
with EXTERN statements.
- You can export to Amazon S3, Google GCS, or local storage.
- The destination provided should contain no other files or directories.
-When you export data, use the `rowsPerPage` context parameter to restrict the
size of exported files.
-When the number of rows in the result set exceeds the value of the parameter,
Druid splits the output into multiple files.
+When you export data, use SET to restrict `rowsPerPage` to control the size of
exported files. For example:
Review Comment:
```suggestion
The following statement shows the format of a SQL query using EXTERN to
export rows:
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -501,10 +506,11 @@ When using the sort-merge algorithm, keep the following
in mind:
- All join types are supported with `sortMerge`: LEFT, RIGHT, INNER, FULL, and
CROSS.
-The following example runs using a single sort-merge join stage that receives
`eventstream`
-(partitioned on `user_id`) and `users` (partitioned on `id`) as inputs. There
is no limit on the size of either input.
+The following example shows a single sort-merge join stage where it explicitly
set `sqlJoinAlgorithm` to `sortMerge` using the SET command. This query also
takes `eventstream` (partitioned on `user_id`) and `users` (partitioned on id)
as `inputs`, with no limit on the size of either input.
```sql
+SET sqlJoinAlgorithm='sortMerge';
+
Review Comment:
```suggestion
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -111,14 +111,15 @@
s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-par
Keep the following in mind when using EXTERN to export rows:
- Only INSERT statements are supported.
- Only `CSV` format is supported as an export format.
-- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't
supported with EXTERN statements.
+- Partitioning (PARTITIONED BY) and clustering (CLUSTERED BY) aren't supported
with EXTERN statements.
- You can export to Amazon S3, Google GCS, or local storage.
- The destination provided should contain no other files or directories.
-When you export data, use the `rowsPerPage` context parameter to restrict the
size of exported files.
-When the number of rows in the result set exceeds the value of the parameter,
Druid splits the output into multiple files.
+When you export data, use SET to restrict `rowsPerPage` to control the size of
exported files. For example:
```sql
+SET rowsPerPage=<number_of_rows>;
+
Review Comment:
```suggestion
```
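As a hedged sketch of what a concrete export statement with `rowsPerPage` could look like (the bucket, prefix, table name, and the value `100000` are illustrative assumptions, not from the PR):

```sql
-- Illustrative only: hypothetical bucket, prefix, and row count
SET rowsPerPage = 100000;
INSERT INTO
  EXTERN(S3(bucket => 'export-bucket', prefix => 'export'))
AS CSV
SELECT * FROM wikipedia
```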
##########
docs/multi-stage-query/reference.md:
##########
@@ -111,14 +111,15 @@
s3://export-bucket/export/query-6564a32f-2194-423a-912e-eead470a37c4-worker0-par
Keep the following in mind when using EXTERN to export rows:
- Only INSERT statements are supported.
- Only `CSV` format is supported as an export format.
-- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't
supported with EXTERN statements.
+- Partitioning (PARTITIONED BY) and clustering (CLUSTERED BY) aren't supported
with EXTERN statements.
- You can export to Amazon S3, Google GCS, or local storage.
- The destination provided should contain no other files or directories.
-When you export data, use the `rowsPerPage` context parameter to restrict the
size of exported files.
-When the number of rows in the result set exceeds the value of the parameter,
Druid splits the output into multiple files.
+When you export data, use SET to restrict `rowsPerPage` to control the size of
exported files. For example:
Review Comment:
I think the original text is better suited to introducing the parameter generally and explaining why you want to use it (rather than jumping straight into how to set it)
##########
docs/multi-stage-query/reference.md:
##########
@@ -127,6 +128,10 @@ SELECT
FROM <table>
```
+When the number of rows in the result set exceeds the value of the parameter,
Druid splits the output into multiple files.
Review Comment:
Move this line back above (see previous comment)
##########
docs/multi-stage-query/reference.md:
##########
@@ -127,6 +128,10 @@ SELECT
FROM <table>
```
+When the number of rows in the result set exceeds the value of the parameter,
Druid splits the output into multiple files.
+For details about SET, see [SET statements](../querying/sql.md#set-statements).
Review Comment:
```suggestion
For details about applying context parameters using SET, see [SET
statements](../querying/sql.md#set-statements).
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -501,10 +506,11 @@ When using the sort-merge algorithm, keep the following
in mind:
- All join types are supported with `sortMerge`: LEFT, RIGHT, INNER, FULL, and
CROSS.
-The following example runs using a single sort-merge join stage that receives
`eventstream`
-(partitioned on `user_id`) and `users` (partitioned on `id`) as inputs. There
is no limit on the size of either input.
+The following example shows a single sort-merge join stage where it explicitly
set `sqlJoinAlgorithm` to `sortMerge` using the SET command. This query also
takes `eventstream` (partitioned on `user_id`) and `users` (partitioned on id)
as `inputs`, with no limit on the size of either input.
Review Comment:
```suggestion
The following query runs a single sort-merge join stage that takes the
following inputs:
* `eventstream` partitioned on `user_id`
* `users` partitioned on `id`
There is no limit on the size of either input.
The SET clause sets the `sqlJoinAlgorithm` context parameter so that Druid
applies the sort-merge join algorithm for the query.
```
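To make the suggested wording concrete, a query of the shape it describes might look like the following sketch. `eventstream`, `users`, `user_id`, and `id` come from the doc; the selected column `channel` and the aggregation are illustrative assumptions:

```sql
-- Illustrative only: hypothetical columns on the two inputs
SET sqlJoinAlgorithm = 'sortMerge';
SELECT e.channel, COUNT(*) AS event_count
FROM eventstream e
JOIN users u ON e.user_id = u.id
GROUP BY e.channel
```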
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]