vogievetsky commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1475327849
##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination where the data should be exported.
+This variation of EXTERN requires one argument: the details of the destination, as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) are not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+REPLACE statements have an additional OVERWRITE clause. As partitioning is not yet supported, only `OVERWRITE ALL`
+is allowed. REPLACE deletes any existing files at the specified directory and creates new files with the results of the query.
+
+```sql
+REPLACE INTO
+  EXTERN(<destination function>)
+AS CSV
+OVERWRITE ALL
+SELECT
+  <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage. This can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` extension should be loaded.

Review Comment:
   I would like some more information on what this S3 function with the named parameters is. Is it some special case, or is it how we are settling on doing functions with named parameters? Is it a SQL thing, a Calcite thing, or a Druid thing?
   I have at one point seen functions with named parameters represented as
   ```
   FN(x="a")
   FN(x='a')
   FN(x=>'a')
   ```
   Where are all these variations coming from? Can there be quotes on the keys? Are they `"` or `'`? Can these functions also accept non-named (ordinal) parameters?

##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
+Exporting is currently supported for Amazon S3 storage. This can be done by passing the function `S3()` as an argument to the `EXTERN` function.
+The `druid-s3-extensions` extension should be loaded.
+
+```sql
+INSERT INTO
+  EXTERN(S3(bucket=<...>, prefix=<...>, tempDir=<...>))

Review Comment:
   I think this example would be clearer if you used syntax that would actually parse, like: `EXTERN(S3(bucket='s3://your_bucket', prefix='prefix/to/files', tempDir='/var'))`. Otherwise it is very hard to understand what actually needs to go in there. I have read these docs and I still do not understand if the values are quoted with `'` or `"`.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
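For readers following along, the paging semantics described in the quoted docs (each exported file holds at most `rowsPerPage` rows, default 100,000; INSERT appends new files alongside any existing ones, while REPLACE with `OVERWRITE ALL` first deletes the files at the destination) can be sketched as a toy model. This is purely an illustration of the described behavior, not Druid's actual export implementation; `export_pages` is a hypothetical helper name.

```python
DEFAULT_ROWS_PER_PAGE = 100_000  # default from the quoted docs


def export_pages(rows, existing_files, mode="INSERT",
                 rows_per_page=DEFAULT_ROWS_PER_PAGE):
    """Toy model of MSQ export paging (not Druid's implementation).

    Splits `rows` into files of at most `rows_per_page` rows each.
    INSERT appends the new files to `existing_files`; REPLACE
    (OVERWRITE ALL) deletes everything at the destination first.
    """
    # REPLACE starts from an empty destination; INSERT keeps prior files.
    files = [] if mode == "REPLACE" else list(existing_files)
    # Chunk the result rows into pages of rows_per_page.
    for start in range(0, len(rows), rows_per_page):
        files.append(rows[start:start + rows_per_page])
    return files
```

For example, exporting 250 rows with `rows_per_page=100` yields three files (100, 100, and 50 rows); a subsequent INSERT adds files, while a REPLACE discards them and writes only the new results.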
