317brian commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1480461042
##########
docs/multi-stage-query/concepts.md:
##########
@@ -115,6 +115,14 @@ When deciding whether to use `REPLACE` or `INSERT`, keep in mind that segments g
with dimension-based pruning but those generated with `INSERT` cannot. For more information about the requirements
for dimension-based pruning, see [Clustering](#clustering).
+### Write to an external destination with `EXTERN`
+
+Query tasks can write data to an external destination through the `EXTERN` function, when it is used with the `INTO`
+clause, such as `REPLACE INTO EXTERN(...)` The EXTERN function takes arguments which specifies where to the files should be created.
+The format can be specified using an `AS` clause.
Review Comment:
```suggestion
Query tasks can write data to an external destination through the `EXTERN` function when it is used with the `INTO`
clause, such as `REPLACE INTO EXTERN(...)`. The EXTERN function takes arguments that specify where to write the files.
The format can be specified using an `AS` clause.
```
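For readers following the thread, a minimal sketch of the statement shape under discussion (the `S3()` destination function, bucket, prefix, and table names are illustrative placeholders drawn from the reference.md hunks below, not part of this comment):
```sql
-- Illustrative only: assumes the S3() destination function documented
-- elsewhere in this PR; bucket, prefix, and table names are made up.
INSERT INTO
  EXTERN(S3(bucket => 's3://example-bucket', prefix => 'export/wiki'))
AS CSV
SELECT channel, page
FROM wikipedia
```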
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
Review Comment:
```suggestion
This variation of EXTERN requires two arguments: the details of the destination and an `AS` clause to specify the format of the exported rows.
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
Review Comment:
```suggestion
INSERT statements append the results to the existing files at the destination.
```sql
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket` | Yes | The S3 bucket to which the files are exported to. | n/a |
+| `prefix` | Yes | Path where the exported files would be created. The export query would expect the destination to be empty. If the location includes other files, then the query will fail. | n/a |
+
+The following runtime parameters must be configured to export into an S3 destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.s3.tempSubDir` | Yes | Directory used to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.s3.allowedExportPaths` | Yes | An array of S3 prefixes which are whitelisted as export destinations. Export query fail if the export destination does not match any of the configured prefixes. Example: `[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
Review Comment:
```suggestion
| `druid.export.storage.s3.allowedExportPaths` | Yes | An array of S3 prefixes that are whitelisted as export destinations. Export queries fail if the export destination does not match any of the configured prefixes. Example: `[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
```
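For reference, a hypothetical `runtime.properties` fragment combining these settings might look like this (the directory and bucket are placeholders):
```properties
# Illustrative values only; adjust for your deployment.
druid.export.storage.s3.tempSubDir=/tmp/druid-export
druid.export.storage.s3.allowedExportPaths=["s3://example-bucket/export/"]
druid.export.storage.s3.maxRetry=10
druid.export.storage.s3.chunkSize=100MiB
```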
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket` | Yes | The S3 bucket to which the files are exported to. | n/a |
+| `prefix` | Yes | Path where the exported files would be created. The export query would expect the destination to be empty. If the location includes other files, then the query will fail. | n/a |
+
+The following runtime parameters must be configured to export into an S3 destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.s3.tempSubDir` | Yes | Directory used to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.s3.allowedExportPaths` | Yes | An array of S3 prefixes which are whitelisted as export destinations. Export query fail if the export destination does not match any of the configured prefixes. Example: `[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
+| `druid.export.storage.s3.maxRetry` | No | Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. | 10 |
+| `druid.export.storage.s3.chunkSize` | No | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to S3, however it requires more disk space to store the temporary chunks. | 100MiB |
+
+##### LOCAL
+
+Exporting is also supported to the local storage, which exports the results to the filesystem of the MSQ worker.
+This is useful in a single node setup or for testing, and is not suitable for production use cases.
+
+This can be done by passing the function `LOCAL()` as an argument to the `EXTERN FUNCTION`.
+Arguments to `LOCAL()` should be passed as named parameters with the value in single quotes like the example below.
+
+To use local as an export destination, the runtime property `druid.export.storage.baseDir` must be configured on the indexer/middle manager.
+The parameter provided to the `LOCAL()` function will be prefixed with this value when exporting to a local destination.
Review Comment:
```suggestion
Arguments to `LOCAL()` should be passed as named parameters with the value in single quotes in the following example:
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
Review Comment:
```suggestion
Keep the following in mind when using EXTERN to export rows:
- Only INSERT statements are supported.
- Only `CSV` format is supported as an export format.
- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't supported with export statements.
- You can export to Amazon S3 or local storage.
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
Review Comment:
```suggestion
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
Review Comment:
When you export data, use the `rowsPerPage` context parameter to control how many rows go into each exported file. The default is 100,000.
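As a sketch, a context parameter like `rowsPerPage` is typically supplied in the JSON payload submitted to the SQL task endpoint (`/druid/v2/sql/task`); the query and value here are illustrative:
```json
{
  "query": "INSERT INTO EXTERN(S3(bucket => 's3://example-bucket', prefix => 'export/wiki')) AS CSV SELECT channel FROM wikipedia",
  "context": {
    "rowsPerPage": 50000
  }
}
```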
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket` | Yes | The S3 bucket to which the files are exported to. | n/a |
+| `prefix` | Yes | Path where the exported files would be created. The export query would expect the destination to be empty. If the location includes other files, then the query will fail. | n/a |
+
+The following runtime parameters must be configured to export into an S3 destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.s3.tempSubDir` | Yes | Directory used to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.s3.allowedExportPaths` | Yes | An array of S3 prefixes which are whitelisted as export destinations. Export query fail if the export destination does not match any of the configured prefixes. Example: `[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
+| `druid.export.storage.s3.maxRetry` | No | Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. | 10 |
+| `druid.export.storage.s3.chunkSize` | No | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to S3, however it requires more disk space to store the temporary chunks. | 100MiB |
+
+##### LOCAL
+
+Exporting is also supported to the local storage, which exports the results to the filesystem of the MSQ worker.
Review Comment:
```suggestion
You can export to the local storage, which exports the results to the filesystem of the MSQ worker.
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
Review Comment:
```suggestion
`EXTERN` can be used to specify a destination where you want to export data to.
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
Review Comment:
```suggestion
Export results to S3 by passing the function `S3()` as an argument to the `EXTERN` function. Note that this requires the `druid-s3-extensions`.
```
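For completeness, loading the extension is typically done through the common runtime properties, for example (list only what the cluster needs):
```properties
# Assumes the standard extensions mechanism.
druid.extensions.loadList=["druid-s3-extensions"]
```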
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
Review Comment:
```suggestion
Supported arguments for the function:
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
Review Comment:
```suggestion
The `S3()` function is a Druid function that configures the connection. Arguments for `S3()` should be passed as named parameters with the value in single quotes like the following example:
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket` | Yes | The S3 bucket to which the files are exported to. | n/a |
+| `prefix` | Yes | Path where the exported files would be created. The export query would expect the destination to be empty. If the location includes other files, then the query will fail. | n/a |
+
+The following runtime parameters must be configured to export into an S3 destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.s3.tempSubDir` | Yes | Directory used to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.s3.allowedExportPaths` | Yes | An array of S3 prefixes which are whitelisted as export destinations. Export query fail if the export destination does not match any of the configured prefixes. Example: `[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
+| `druid.export.storage.s3.maxRetry` | No | Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. | 10 |
+| `druid.export.storage.s3.chunkSize` | No | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to S3, however it requires more disk space to store the temporary chunks. | 100MiB |
+
+##### LOCAL
+
+Exporting is also supported to the local storage, which exports the results to the filesystem of the MSQ worker.
+This is useful in a single node setup or for testing, and is not suitable for production use cases.
+
+This can be done by passing the function `LOCAL()` as an argument to the `EXTERN FUNCTION`.
+Arguments to `LOCAL()` should be passed as named parameters with the value in single quotes like the example below.
+
+To use local as an export destination, the runtime property `druid.export.storage.baseDir` must be configured on the indexer/middle manager.
+The parameter provided to the `LOCAL()` function will be prefixed with this value when exporting to a local destination.
+
+```sql
+INSERT INTO
+ EXTERN(
+ local(exportPath => 'exportLocation/query1')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `exportPath` | Yes | Subdirectory of `druid.export.storage.baseDir` used to as the destination to export the results to. The export query expects the destination to be empty. If the location includes other files or directories, then the query will fail. | n/a |
Review Comment:
```suggestion
| `exportPath` | Yes | Subdirectory of `druid.export.storage.baseDir` used as the destination to export the results to. The export query expects the destination to be empty. If the location includes other files or directories, then the query will fail. | n/a |
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket` | Yes | The S3 bucket to which the files are exported to. | n/a |
+| `prefix` | Yes | Path where the exported files would be created. The export query would expect the destination to be empty. If the location includes other files, then the query will fail. | n/a |
Review Comment:
```suggestion
| `prefix` | Yes | Path where the exported files would be created. The export query expects the destination to be empty. If the location includes other files, the query will fail. | n/a |
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket` | Yes | The S3 bucket to which the files are exported to. | n/a |
+| `prefix` | Yes | Path where the exported files would be created. The export query would expect the destination to be empty. If the location includes other files, then the query will fail. | n/a |
+
+The following runtime parameters must be configured to export into an S3 destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.s3.tempSubDir` | Yes | Directory used to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.s3.allowedExportPaths` | Yes | An array of S3 prefixes which are whitelisted as export destinations. Export query fail if the export destination does not match any of the configured prefixes. Example: `[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
+| `druid.export.storage.s3.maxRetry` | No | Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. | 10 |
+| `druid.export.storage.s3.chunkSize` | No | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to S3, however it requires more disk space to store the temporary chunks. | 100MiB |
+
+##### LOCAL
+
+Exporting is also supported to the local storage, which exports the results to the filesystem of the MSQ worker.
+This is useful in a single node setup or for testing, and is not suitable for production use cases.
Review Comment:
```suggestion
This is useful in a single node setup or for testing but is not suitable for production use cases.
```
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be exported.
+This variation of EXTERN requires one argument, the details of the destination as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+ EXTERN(<destination function>)
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` should be loaded.
+The `S3()` function is a druid function which configures the connection. Arguments to `S3()` should be passed as named parameters with the value in single quotes like the example below.
+
+```sql
+INSERT INTO
+ EXTERN(
+ S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+ )
+AS CSV
+SELECT
+ <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket` | Yes | The S3 bucket to which the files are exported to. | n/a |
+| `prefix` | Yes | Path where the exported files would be created. The export query would expect the destination to be empty. If the location includes other files, then the query will fail. | n/a |
+
+The following runtime parameters must be configured to export into an S3 destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.s3.tempSubDir` | Yes | Directory used to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.s3.allowedExportPaths` | Yes | An array of S3 prefixes which are whitelisted as export destinations. Export query fail if the export destination does not match any of the configured prefixes. Example: `[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
+| `druid.export.storage.s3.maxRetry` | No | Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. | 10 |
+| `druid.export.storage.s3.chunkSize` | No | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to S3, however it requires more disk space to store the temporary chunks. | 100MiB |
+
+##### LOCAL
+
+Exporting is also supported to the local storage, which exports the results to the filesystem of the MSQ worker.
+This is useful in a single node setup or for testing, and is not suitable for production use cases.
+
+This can be done by passing the function `LOCAL()` as an argument to the `EXTERN FUNCTION`.
Review Comment:
```suggestion
Export results to local storage by passing the function `LOCAL()` as an argument for the `EXTERN FUNCTION`. To use local storage as an export destination, the runtime property `druid.export.storage.baseDir` must be configured on the Indexer/Middle Manager.
The parameter provided to the `LOCAL()` function will be prefixed with this value when exporting to a local destination.
```
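A hypothetical configuration matching this suggestion (the path is a placeholder):
```properties
# Set on the Indexer/Middle Manager; LOCAL(exportPath => ...) is resolved
# relative to this directory.
druid.export.storage.baseDir=/opt/druid/export
```
With this value, `LOCAL(exportPath => 'exportLocation/query1')` would write files under `/opt/druid/export/exportLocation/query1`, per the prefixing behavior described above.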
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]