Re: [PR] Add export capabilities to MSQ with SQL syntax (druid)

via GitHub Wed, 07 Feb 2024 03:34:20 -0800


cryptoe commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1481301885



##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,89 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with 
EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination where you want to export data to.
+This variation of EXTERN requires one argument, the details of the destination 
as specified below.
+This variation additionally requires an `AS` clause to specify the format of 
the exported rows.
+
+Keep the following in mind when using EXTERN to export rows:
+- Only INSERT statements are supported.
+- Only `CSV` format is supported as an export format.
+- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't 
supported with export statements.
+- You can export to Amazon S3 or local storage.
+- The destination provided should contain no other files or directories.
+
+When you export data, use the `rowsPerPage` context parameter to control how 
many rows get exported. The default is 100,000.
+
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+##### S3
+
+Export results to S3 by passing the function `S3()` as an argument to the 
`EXTERN` function. Note that this requires the `druid-s3-extensions`.
+The `S3()` function is a Druid function that configures the connection. 
Arguments for `S3()` should be passed as named parameters with the value in 
single quotes like the following example:
+
+```sql
+INSERT INTO
+  EXTERN(
+    S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+  )
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+Supported arguments for the function:
+
+| Parameter   | Required | Description                                         
                                                                                
                                  | Default |
+|-------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
+| `bucket`    | Yes      | The S3 bucket to which the files are exported to.   
                                                                                
                                  | n/a     |
+| `prefix`    | Yes      | Path where the exported files would be created. The 
export query expects the destination to be empty. If the location includes 
other files, then the query will fail. | n/a     |

Review Comment:
   We should mention that the bucket and prefix should match whatever the 
cluster admin has provided. 



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1877,6 +1876,42 @@ private static QueryDefinition makeQueryDefinition(
       } else {
         return queryDef;
       }
+    } else if (querySpec.getDestination() instanceof ExportMSQDestination) {
+      final ExportMSQDestination exportMSQDestination = (ExportMSQDestination) 
querySpec.getDestination();
+      final ExportStorageProvider exportStorageProvider = 
exportMSQDestination.getExportStorageProvider();
+
+      try {
+        // Check that the export destination is empty as a sanity check. We 
want to avoid modifying any other files with export.
+        Iterator<String> filesIterator = 
exportStorageProvider.get().listDir("");
+        if (filesIterator.hasNext()) {
+          throw DruidException.forPersona(DruidException.Persona.USER)
+                              
.ofCategory(DruidException.Category.RUNTIME_FAILURE)
+                              .build("Found files at provided export 
destination. Export is only allowed to "

Review Comment:
   Also mention that you can also append a subdir. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add export capabilities to MSQ with SQL syntax (druid)

Reply via email to