cryptoe commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1480041048


##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,93 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination where the data needs to be exported.
+This variation of `EXTERN` requires one argument: the details of the destination, as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Only INSERT statements are supported with an `EXTERN` destination.
+Only `CSV` format is supported at the moment.
+Note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) are not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage`, which sets the number of rows in each exported file. The default value is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+Exporting is currently supported for Amazon S3 storage and local storage.
+
+##### S3
+
+Exporting results to S3 can be done by passing the function `S3()` as an argument to the `EXTERN` function. The `druid-s3-extensions` extension must be loaded.
+The `S3()` function is a Druid function that configures the connection. Arguments to `S3()` should be passed as named parameters, with the values in single quotes, as in the example below.
+
+```sql
+INSERT INTO
+  EXTERN(
+    S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+  )
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+Supported arguments to the function:
+
+| Parameter | Required | Description                                    | Default |
+|-----------|----------|------------------------------------------------|---------|
+| `bucket`  | Yes      | The S3 bucket to which the files are exported. | n/a     |

Review Comment:
   We do not want to expose the bucket here.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/destination/ExportMSQDestination.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.indexing.destination;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.msq.querykit.ShuffleSpecFactories;
+import org.apache.druid.msq.querykit.ShuffleSpecFactory;
+import org.apache.druid.server.security.Resource;
+import org.apache.druid.server.security.ResourceType;
+import org.apache.druid.sql.http.ResultFormat;
+import org.apache.druid.storage.ExportStorageProvider;
+import org.joda.time.Interval;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+
+/**
+ * Destination used by tasks that write the results as files to an external destination. {@link #resultFormat} denotes
+ * the format of the file created and {@link #exportStorageProvider} denotes the type of external destination.
+ * <br>
+ * {@link #replaceTimeChunks} denotes how existing files should be handled.
+ * - If the value is null, the results are appended to the existing files.
+ * - If the value is present, existing files will be deleted according to time intervals.
+ */
+public class ExportMSQDestination implements MSQDestination
+{
+  public static final String TYPE = "export";
+  private final ExportStorageProvider exportStorageProvider;
+  private final ResultFormat resultFormat;
+  @Nullable
+  private final List<Interval> replaceTimeChunks;

Review Comment:
   Why is it called `replaceTimeChunks`?



##########
docs/multi-stage-query/concepts.md:
##########
@@ -115,6 +115,14 @@ When deciding whether to use `REPLACE` or `INSERT`, keep in mind that segments g
 with dimension-based pruning but those generated with `INSERT` cannot. For more information about the requirements
 for dimension-based pruning, see [Clustering](#clustering).
 
+### Write to an external destination with `EXTERN`
+
+Query tasks can write data to an external destination through the `EXTERN` function, when it is used with the `INTO`
+clause, such as `REPLACE INTO EXTERN(...)`. The EXTERN function takes arguments that specify where the files should be created.

Review Comment:
   ```suggestion
   clause, such as `INSERT INTO EXTERN(...)`. The EXTERN function takes arguments that specify where the files should be created.
   ```



##########
processing/src/main/java/org/apache/druid/storage/local/LocalFileExportStorageProvider.java:
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.local;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.druid.data.input.impl.LocalInputSource;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.storage.ExportStorageProvider;
+import org.apache.druid.storage.StorageConfig;
+import org.apache.druid.storage.StorageConnector;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Objects;
+
+@JsonTypeName(LocalFileExportStorageProvider.TYPE_NAME)
+public class LocalFileExportStorageProvider implements ExportStorageProvider
+{
+  public static final String TYPE_NAME = LocalInputSource.TYPE_KEY;
+
+  @JacksonInject
+  StorageConfig storageConfig;
+
+  @JsonProperty
+  private final String exportPath;
+
+  @JsonCreator
+  public LocalFileExportStorageProvider(@JsonProperty(value = "exportPath", required = true) String exportPath)
+  {
+    this.exportPath = exportPath;
+  }
+
+  @Override
+  public StorageConnector get()
+  {
+    final File exportDestination = validateAndGetPath(storageConfig.getBaseDir(), exportPath);
+    try {
+      return new LocalFileStorageConnector(exportDestination);
+    }
+    catch (IOException e) {
+      throw new IAE(
+          e,
+          "Unable to create storage connector [%s] for base path [%s]",
+          LocalFileStorageConnector.class.getSimpleName(),
+          exportDestination.toPath()
+      );
+    }
+  }
+
+  @Override
+  @JsonIgnore
+  public String getResourceType()
+  {
+    return TYPE_NAME;
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    LocalFileExportStorageProvider that = (LocalFileExportStorageProvider) o;
+    return Objects.equals(exportPath, that.exportPath);
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return Objects.hash(exportPath);
+  }
+
+  @Override
+  public String toString()
+  {
+    return "LocalFileExportStorageProvider{" +
+           "exportPath=" + exportPath +
+           '}';
+  }
+
+  public static File validateAndGetPath(String basePath, String customPath)
+  {
+    if (basePath == null) {
+      throw DruidException.forPersona(DruidException.Persona.OPERATOR)
+                          .ofCategory(DruidException.Category.NOT_FOUND)
+                          .build(
+                              "The runtime property `druid.export.storage.baseDir` must be configured for local export.");

Review Comment:
   The same error-message changes apply here.



##########
docs/multi-stage-query/concepts.md:
##########
@@ -115,6 +115,14 @@ When deciding whether to use `REPLACE` or `INSERT`, keep in mind that segments g
 with dimension-based pruning but those generated with `INSERT` cannot. For more information about the requirements
 for dimension-based pruning, see [Clustering](#clustering).
 
+### Write to an external destination with `EXTERN`
+
+Query tasks can write data to an external destination through the `EXTERN` function, when it is used with the `INTO`
+clause, such as `REPLACE INTO EXTERN(...)`. The EXTERN function takes arguments that specify where to write the files.

Review Comment:
   ```suggestion
   clause, such as `INSERT INTO EXTERN(...)`. The EXTERN function takes arguments that specify where to write the files.
   ```



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskQueryMaker.java:
##########
@@ -203,7 +209,22 @@ public QueryResponse<Object[]> runQuery(final DruidQuery druidQuery)
 
     final MSQDestination destination;
 
-    if (targetDataSource != null) {
+    if (targetDataSource instanceof ExportDestination) {
+      ExportDestination exportDestination = ((ExportDestination) targetDataSource);
+      ResultFormat format = ResultFormat.fromString(sqlQueryContext.getString(DruidSqlIngest.SQL_EXPORT_FILE_FORMAT));
+
+      if (replaceTimeChunks != null && !Intervals.ONLY_ETERNITY.equals(replaceTimeChunks)) {

Review Comment:
   This should not be needed. Let's remove it.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/destination/ExportMSQDestination.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.indexing.destination;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.msq.querykit.ShuffleSpecFactories;
+import org.apache.druid.msq.querykit.ShuffleSpecFactory;
+import org.apache.druid.server.security.Resource;
+import org.apache.druid.server.security.ResourceType;
+import org.apache.druid.sql.http.ResultFormat;
+import org.apache.druid.storage.ExportStorageProvider;
+import org.joda.time.Interval;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+
+/**
+ * Destination used by tasks that write the results as files to an external destination. {@link #resultFormat} denotes
+ * the format of the file created and {@link #exportStorageProvider} denotes the type of external destination.
+ * <br>
+ * {@link #replaceTimeChunks} denotes how existing files should be handled.
+ * - If the value is null, the results are appended to the existing files.
+ * - If the value is present, existing files will be deleted according to time intervals.
+ */
+public class ExportMSQDestination implements MSQDestination
+{
+  public static final String TYPE = "export";
+  private final ExportStorageProvider exportStorageProvider;
+  private final ResultFormat resultFormat;

Review Comment:
   Where would Parquet come in, in the future?



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1772,16 +1777,10 @@ private static QueryDefinition makeQueryDefinition(
       } else {
         queryToPlan = querySpec.getQuery();
       }
-    } else if (querySpec.getDestination() instanceof TaskReportMSQDestination) {
-      shuffleSpecFactory = ShuffleSpecFactories.singlePartition();
-      queryToPlan = querySpec.getQuery();
-    } else if (querySpec.getDestination() instanceof DurableStorageMSQDestination) {
-      shuffleSpecFactory = ShuffleSpecFactories.getGlobalSortWithTargetSize(
-          MultiStageQueryContext.getRowsPerPage(querySpec.getQuery().context())
-      );
-      queryToPlan = querySpec.getQuery();
     } else {
-      throw new ISE("Unsupported destination [%s]", querySpec.getDestination());
+      shuffleSpecFactory = querySpec.getDestination()
+                                    .getShuffleSpecFactory(MultiStageQueryContext.getRowsPerPage(querySpec.getQuery().context()));

Review Comment:
   Thanks for the refactor. It's much cleaner now.
   We should add a comment saying that all select partitions are controlled by the context value `rowsPerPage`.



##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,89 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination where you want to export data.
+This variation of `EXTERN` requires one argument: the details of the destination, as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Keep the following in mind when using EXTERN to export rows:
+- Only INSERT statements are supported.
+- Only `CSV` format is supported as an export format.
+- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't supported with export statements.
+- You can export to Amazon S3 or local storage.
+- The destination provided should contain no other files or directories.
+
+When you export data, use the `rowsPerPage` context parameter to control how many rows each exported file contains. The default is 100,000.
+
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+##### S3
+
+Export results to S3 by passing the function `S3()` as an argument to the `EXTERN` function. Note that this requires the `druid-s3-extensions` extension.
+The `S3()` function is a Druid function that configures the connection. Arguments for `S3()` should be passed as named parameters, with the values in single quotes, like the following example:
+
+```sql
+INSERT INTO
+  EXTERN(
+    S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+  )
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+Supported arguments for the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket`  | Yes      | The S3 bucket to which the files are exported. | n/a |
+| `prefix`  | Yes      | Path where the exported files are created. The export query expects the destination to be empty; if the location includes other files, the query fails. | n/a |

Review Comment:
   We should mention that the bucket and prefix must match whatever the cluster admin has provided.



##########
extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/S3ExportStorageProvider.java:
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.s3.output;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.data.input.s3.S3InputSource;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.storage.ExportStorageProvider;
+import org.apache.druid.storage.StorageConnector;
+import org.apache.druid.storage.s3.S3StorageDruidModule;
+import org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3;
+
+import javax.validation.constraints.NotNull;
+import java.io.File;
+import java.net.URI;
+import java.util.List;
+
+@JsonTypeName(S3ExportStorageProvider.TYPE_NAME)
+public class S3ExportStorageProvider implements ExportStorageProvider
+{
+  public static final String TYPE_NAME = S3InputSource.TYPE_KEY;
+  @JsonProperty
+  private final String bucket;
+  @JsonProperty
+  private final String prefix;
+
+  @JacksonInject
+  S3ExportConfig s3ExportConfig;
+  @JacksonInject
+  ServerSideEncryptingAmazonS3 s3;
+
+  @JsonCreator
+  public S3ExportStorageProvider(
+      @JsonProperty(value = "bucket", required = true) String bucket,
+      @JsonProperty(value = "prefix", required = true) String prefix
+  )
+  {
+    this.bucket = bucket;
+    this.prefix = prefix;
+  }
+
+  @Override
+  public StorageConnector get()
+  {
+    final String tempDir = s3ExportConfig.getTempDir();
+    if (tempDir == null) {
+      throw DruidException.forPersona(DruidException.Persona.OPERATOR)
+                          .ofCategory(DruidException.Category.NOT_FOUND)
+                          .build("The runtime property `druid.export.storage.s3.tempDir` must be configured for S3 export.");
+    }
+    final List<String> allowedExportPaths = s3ExportConfig.getAllowedExportPaths();
+    if (allowedExportPaths == null) {
+      throw DruidException.forPersona(DruidException.Persona.OPERATOR)
+                          .ofCategory(DruidException.Category.NOT_FOUND)
+                          .build(
+                              "The runtime property `druid.export.storage.s3.allowedExportPaths` must be configured for S3 export.");
+    }
+    validateS3Prefix(allowedExportPaths, bucket, prefix);
+    final S3OutputConfig s3OutputConfig = new S3OutputConfig(
+        bucket,
+        prefix,
+        new File(tempDir),
+        s3ExportConfig.getChunkSize(),
+        s3ExportConfig.getMaxRetry()
+    );
+    return new S3StorageConnector(s3OutputConfig, s3);
+  }
+
+  @VisibleForTesting
+  static void validateS3Prefix(@NotNull final List<String> allowedExportPaths, final String bucket, final String prefix)
+  {
+    final URI providedUri = new CloudObjectLocation(bucket, prefix).toUri(S3StorageDruidModule.SCHEME);
+    for (final String path : allowedExportPaths) {
+      final URI allowedUri = URI.create(path);
+      if (validateUri(allowedUri, providedUri)) {
+        return;
+      }
+    }
+    throw DruidException.forPersona(DruidException.Persona.USER)

Review Comment:
   This is a user-facing error. Please mention that the user should reach out to the cluster admin for the paths allowed for export. The paths are controlled via the `allowedExportPaths` property.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1872,6 +1871,44 @@ private static QueryDefinition makeQueryDefinition(
       } else {
         return queryDef;
       }
+    } else if (querySpec.getDestination() instanceof ExportMSQDestination) {
+      final ExportMSQDestination exportMSQDestination = (ExportMSQDestination) querySpec.getDestination();
+      final StorageConnectorProvider storageConnectorProvider = exportMSQDestination.getStorageConnectorProvider();
+
+      final ResultFormat resultFormat = exportMSQDestination.getResultFormat();
+
+      // If the statement is a 'REPLACE' statement, delete the existing files at the destination.
+      if (exportMSQDestination.getReplaceTimeChunks() != null) {
+        if (Intervals.ONLY_ETERNITY.equals(exportMSQDestination.getReplaceTimeChunks())) {
+          StorageConnector storageConnector = storageConnectorProvider.get();
+          try {
+            storageConnector.deleteRecursively("");

Review Comment:
   Also, code-flow wise, I think makeQueryDefinition may not be the correct place to delete the files.
   Maybe it can be done after we create the query definition object (clear files if needed).
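   One reading of this suggestion, as a minimal control-flow sketch with stand-in types (not the real ControllerImpl API): keep the query-definition step free of side effects and clear the destination in a separate step afterwards.

   ```java
   class ExportCleanupOrderSketch
   {
     // Stand-in for Druid's StorageConnector; only the one method used here.
     interface StorageConnector
     {
       void deleteRecursively(String path);
     }

     // Hypothetical flow: build the definition first, then clear files only
     // when replacing, as a separate step after planning.
     static String planAndMaybeClear(boolean replaceAll, StorageConnector connector)
     {
       String queryDef = "queryDef"; // stand-in for makeQueryDefinition(...)
       if (replaceAll) {
         connector.deleteRecursively(""); // "clear files if needed", after planning
       }
       return queryDef;
     }

     public static void main(String[] args)
     {
       java.util.concurrent.atomic.AtomicBoolean deleted = new java.util.concurrent.atomic.AtomicBoolean(false);
       planAndMaybeClear(true, path -> deleted.set(true));
       System.out.println(deleted.get()); // prints true
     }
   }
   ```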



##########
docs/multi-stage-query/reference.md:
##########
@@ -90,6 +93,89 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with EXTERN](concepts.md#read-external-data-with-extern).
 
+#### `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination where you want to export data.
+This variation of `EXTERN` requires one argument: the details of the destination, as specified below.
+This variation additionally requires an `AS` clause to specify the format of the exported rows.
+
+Keep the following in mind when using EXTERN to export rows:
+- Only INSERT statements are supported.
+- Only `CSV` format is supported as an export format.
+- Partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED BY`) aren't supported with export statements.
+- You can export to Amazon S3 or local storage.
+- The destination provided should contain no other files or directories.
+
+When you export data, use the `rowsPerPage` context parameter to control how many rows each exported file contains. The default is 100,000.
+
+```sql
+INSERT INTO
+  EXTERN(<destination function>)
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+##### S3
+
+Export results to S3 by passing the function `S3()` as an argument to the `EXTERN` function. Note that this requires the `druid-s3-extensions` extension.
+The `S3()` function is a Druid function that configures the connection. Arguments for `S3()` should be passed as named parameters, with the values in single quotes, like the following example:
+
+```sql
+INSERT INTO
+  EXTERN(
+    S3(bucket => 's3://your_bucket', prefix => 'prefix/to/files')
+  )
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+Supported arguments for the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket`  | Yes      | The S3 bucket to which the files are exported. | n/a |
+| `prefix`  | Yes      | Path where the exported files are created. The export query expects the destination to be empty; if the location includes other files, the query fails. | n/a |
+
+The following runtime parameters must be configured to export into an S3 destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.s3.tempSubDir` | Yes | Directory used to store temporary files required while uploading the data. | n/a |

Review Comment:
   These are local directories, right? I think the property name should mention that, e.g. `s3.tempLocalDir`.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1877,6 +1876,42 @@ private static QueryDefinition makeQueryDefinition(
       } else {
         return queryDef;
       }
+    } else if (querySpec.getDestination() instanceof ExportMSQDestination) {
+      final ExportMSQDestination exportMSQDestination = (ExportMSQDestination) querySpec.getDestination();
+      final ExportStorageProvider exportStorageProvider = exportMSQDestination.getExportStorageProvider();
+
+      try {
+        // Check that the export destination is empty as a sanity check. We want to avoid modifying any other files with export.
+        Iterator<String> filesIterator = exportStorageProvider.get().listDir("");
+        if (filesIterator.hasNext()) {
+          throw DruidException.forPersona(DruidException.Persona.USER)
+                              .ofCategory(DruidException.Category.RUNTIME_FAILURE)
+                              .build("Found files at provided export destination. Export is only allowed to "

Review Comment:
   Let's put the path location in the error message.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1877,6 +1876,42 @@ private static QueryDefinition makeQueryDefinition(
       } else {
         return queryDef;
       }
+    } else if (querySpec.getDestination() instanceof ExportMSQDestination) {
+      final ExportMSQDestination exportMSQDestination = (ExportMSQDestination) querySpec.getDestination();
+      final ExportStorageProvider exportStorageProvider = exportMSQDestination.getExportStorageProvider();
+
+      try {
+        // Check that the export destination is empty as a sanity check. We want to avoid modifying any other files with export.
+        Iterator<String> filesIterator = exportStorageProvider.get().listDir("");
+        if (filesIterator.hasNext()) {
+          throw DruidException.forPersona(DruidException.Persona.USER)
+                              .ofCategory(DruidException.Category.RUNTIME_FAILURE)
+                              .build("Found files at provided export destination. Export is only allowed to "

Review Comment:
   Also mention that you can append an empty subdirectory to the path instead.
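   A hedged sketch of a message that would cover both this and the previous comment (path in the message, plus the empty-subdirectory hint); the wording is an assumption, not the final message:

   ```java
   class ExportDestinationErrorSketch
   {
     // Hypothetical wording combining the review suggestions: include the
     // destination path and hint that an empty subdirectory can be used instead.
     static String destinationNotEmptyMessage(String destinationPath)
     {
       return String.format(
           "Found files at the provided export destination [%s]. Export is only allowed to an empty "
           + "directory; delete the existing files or append an empty subdirectory to the path.",
           destinationPath
       );
     }

     public static void main(String[] args)
     {
       System.out.println(destinationNotEmptyMessage("s3://bucket/prefix"));
     }
   }
   ```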



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
