LakshSingla commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1469499794
##########
docs/multi-stage-query/reference.md:
##########
@@ -45,8 +45,11 @@ making it easy to reuse the same SQL statement for each ingest: just specify the
 
 ### `EXTERN` Function
 
-Use the `EXTERN` function to read external data. The function has two variations.
+Use the `EXTERN` function to read external data or write to an external source.

Review Comment:
   When we are writing to an external location, it is not a source. Something like the following might be better.
   ```suggestion
   Use the `EXTERN` function to read external data or write to an external location.
   ```

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.
+Currently, only `CSV` is supported as a format.

Review Comment:
   ```suggestion
   Only `CSV` format is supported at the moment.
   ```

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.
+Currently, only `CSV` is supported as a format.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+REPLACE statements have an additional OVERWRITE clause. As partitioning is not yet supported, only `OVERWRITE ALL`
+is allowed. REPLACE deletes any existing files at the destination and creates new files with the results of the query.

Review Comment:
   "deletes any existing files" -> Does it clean up the directory, or just replace any files and paths conflicting with what the query generates the results for?

##########
docs/multi-stage-query/concepts.md:
##########
@@ -115,6 +115,17 @@ When deciding whether to use `REPLACE` or `INSERT`, keep in mind that segments g
 with dimension-based pruning but those generated with `INSERT` cannot. For more information about the requirements
 for dimension-based pruning, see [Clustering](#clustering).
 
+### Write to an external destination with `EXTERN`
+
+Query tasks can write data to an external destination through the `EXTERN` function, when it is used with the `INTO`
+clause, such as `REPLACE INTO EXTERN(...)`
+
+The EXTERN function takes arguments which specifies where to the files should be created.
+
+The format can be specified using an `AS` clause.
+

Review Comment:
   nit: Too many line breaks

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when

Review Comment:
   ```suggestion
   `EXTERN` can be used to specify a destination, where the data needs to be exported.
   This variation of EXTERN requires one argument - (What the argument is)
   ```
   minor nit: Also, from the docs, the format is specified via the `AS` clause and not EXTERN, therefore I have omitted that from the suggestion.

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.

Review Comment:
   ```suggestion
   clause that specifies the format.
   ```

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used as a destination, which will export the data to the specified location and format. EXTERN when
+used in this way accepts one argument. Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`)
+is not currently supported with export statements.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination. The statements require an `AS`
+clause that determines the format.
+Currently, only `CSV` is supported as a format.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+REPLACE statements have an additional OVERWRITE clause. As partitioning is not yet supported, only `OVERWRITE ALL`
+is allowed. REPLACE deletes any existing files at the destination and creates new files with the results of the query.
+
+```sql
+REPLACE INTO
+  EXTERN(<destination function>)
+AS CSV
+OVERWRITE ALL
+SELECT
+  <column>
+FROM <table>
+```
+
+Exporting is currently supported to Amazon S3 storage. The S3 extension is required to be loaded for this.
+This can be done passing the function `S3()` as an argument to the `EXTERN` function.

Review Comment:
   ```suggestion
   Exporting is currently supported for Amazon S3 storage. This can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` extension should be loaded.
   ```

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
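[Editor's note: combining the pieces discussed in the review, a complete export statement might look like the sketch below. The `S3()` parameter names (`bucket`, `prefix`) and the `wikipedia` table and column names are illustrative assumptions, not taken from the PR under review.]

```sql
-- Sketch only: parameter names and the source table are assumptions.
REPLACE INTO
  EXTERN(S3(bucket => 'my-bucket', prefix => 'druid/export'))
AS CSV
OVERWRITE ALL
SELECT
  __time,
  channel
FROM wikipedia
```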
