LakshSingla commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1469499794
##########
docs/multi-stage-query/reference.md:
##########
@@ -45,8 +45,11 @@ making it easy to reuse the same SQL statement for each ingest: just specify the
 
 ### `EXTERN` Function
 
-Use the `EXTERN` function to read external data. The function has two variations.
+Use the `EXTERN` function to read external data or write to an external source.

Review Comment:
   When we are writing to an external location, it is not a source. Something like the following might be better.
   ```suggestion
   Use the `EXTERN` function to read external data or write to an external location.
   ```

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.
+Currently, only `CSV` is supported as a format.

Review Comment:
   ```suggestion
   Only `CSV` format is supported at the moment.
   ```

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.
+Currently, only `CSV` is supported as a format.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+REPLACE statements have an additional OVERWRITE clause. As partitioning is not yet supported, only `OVERWRITE ALL`
+is allowed. REPLACE deletes any existing files at the destination and creates new files with the results of the query.

Review Comment:
   "deletes any existing files" -> Does it clean up the directory, or just replace any files and paths conflicting with what the query generates the results for?

##########
docs/multi-stage-query/concepts.md:
##########
@@ -115,6 +115,17 @@ When deciding whether to use `REPLACE` or `INSERT`, keep in mind that segments g
 with dimension-based pruning but those generated with `INSERT` cannot. For more information about the requirements
 for dimension-based pruning, see [Clustering](#clustering).
 
+### Write to an external destination with `EXTERN`
+
+Query tasks can write data to an external destination through the `EXTERN` function, when it is used with the `INTO`
+clause, such as `REPLACE INTO EXTERN(...)`
+
+The EXTERN function takes arguments which specifies where to the files should be created.
+
+The format can be specified using an `AS` clause.
+

Review Comment:
   nit: Too many line breaks

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when

Review Comment:
   ```suggestion
   `EXTERN` can be used to specify a destination, where the data needs to be exported.
   This variation of EXTERN requires one argument - (What the argument is)
   ```
   minor nit: Also, from the docs, the format is specified via the `AS` clause and not EXTERN, therefore I have omitted that from the suggestion.

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.

Review Comment:
   ```suggestion
   clause that specifies the format.
   ```

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.
+Currently, only `CSV` is supported as a format.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+REPLACE statements have an additional OVERWRITE clause. As partitioning is not yet supported, only `OVERWRITE ALL`
+is allowed. REPLACE deletes any existing files at the destination and creates new files with the results of the query.
+
+```sql
+REPLACE INTO
+  EXTERN(<destination function>)
+AS CSV
+OVERWRITE ALL
+SELECT
+  <column>
+FROM <table>
+```
+
+Exporting is currently supported to Amazon S3 storage. The S3 extension is required to be loaded for this.
+This can be done passing the function `S3()` as an argument to the `EXTERN` function.

Review Comment:
   ```suggestion
   Exporting is currently supported for Amazon S3 storage. This can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` extension should be loaded.
   ```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
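[Editor's note: combining the pieces discussed in the review, a complete export statement might look like the sketch below. The `S3()` parameter names (`bucket`, `prefix`) and the `wikipedia` table and column names are illustrative assumptions, not taken from the PR under review.]

```sql
-- Sketch only: parameter names and the source table are assumptions.
REPLACE INTO
  EXTERN(S3(bucket => 'my-bucket', prefix => 'druid/export'))
AS CSV
OVERWRITE ALL
SELECT
  __time,
  channel
FROM wikipedia
```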
