adarshsanjeev commented on code in PR #16051:
URL: https://github.com/apache/druid/pull/16051#discussion_r1522634321


##########
docs/multi-stage-query/reference.md:
##########
@@ -149,6 +149,39 @@ The following runtime parameters must be configured to export into an S3 destina
 | `druid.export.storage.s3.maxRetry`           | No       | Defines the maximum number of times to attempt S3 API calls to avoid failures due to transient errors. | 10  |
 | `druid.export.storage.s3.chunkSize`          | No       | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to S3; however, it requires more disk space to store the temporary chunks. | 100MiB |
 
+
+##### GS
+
+Export results to GCS by passing the function `google()` as an argument to the `EXTERN` function. Note that this requires the `druid-google-extensions` extension to be loaded.
+The `google()` function is a Druid function that configures the connection. Arguments for `google()` should be passed as named parameters with the value in single quotes, as in the following example:
+
+```sql
+INSERT INTO
+  EXTERN(
+    google(bucket => 'your_bucket', prefix => 'prefix/to/files')
+  )
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+Supported arguments for the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket`  | Yes      | The GS bucket where the files are exported. The bucket and prefix combination should be whitelisted in `druid.export.storage.google.allowedExportPaths`. | n/a |
+| `prefix`  | Yes      | Path where the exported files will be created. The export query expects the destination to be empty. If the location includes other files, the query fails. The bucket and prefix combination should be whitelisted in `druid.export.storage.google.allowedExportPaths`. | n/a |
+
+The following runtime parameters must be configured to export into a GCS destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.google.tempLocalDir`       | Yes | Directory used on the local storage of the worker to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.google.allowedExportPaths` | Yes | An array of GS prefixes that are whitelisted as export destinations. Export queries fail if the export destination does not match any of the configured prefixes. Example: `[\"gs://bucket1/export/\", \"gs://bucket2/export/\"]` | n/a |
+| `druid.export.storage.google.maxRetry`           | No  | Defines the maximum number of times to attempt GS API calls to avoid failures due to transient errors. | 10  |
+| `druid.export.storage.gooel.chunkSize`           | No  | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to GS; however, it requires more disk space to store the temporary chunks. | 100MiB |

Review Comment:
   The default value of 100MiB has sometimes led to OOM issues. It should be lowered to 4MiB, matching org.apache.druid.storage.google.output.GoogleOutputConfig#DEFAULT_CHUNK_SIZE.
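For illustration only, a minimal sketch of the suggested 4 MiB default as a constant. The class and field names below are stand-ins, not the actual Druid source:

```java
// Hypothetical sketch (not the real GoogleOutputConfig): a 4 MiB default
// chunk size, in the spirit of GoogleOutputConfig#DEFAULT_CHUNK_SIZE.
public class ChunkSizeDefault
{
  // 4 MiB keeps per-chunk buffers small, avoiding the OOM issues the
  // reviewer observed with a 100 MiB default.
  static final long DEFAULT_CHUNK_SIZE_BYTES = 4L * 1024 * 1024;

  public static void main(String[] args)
  {
    // Print the default in bytes.
    System.out.println(DEFAULT_CHUNK_SIZE_BYTES);
  }
}
```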



##########
extensions-core/google-extensions/src/main/java/org/apache/druid/storage/google/output/GoogleExportStorageProvider.java:
##########
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.storage.google.output;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.druid.data.input.impl.CloudObjectLocation;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.storage.ExportStorageProvider;
+import org.apache.druid.storage.StorageConnector;
+import org.apache.druid.storage.google.GoogleInputDataConfig;
+import org.apache.druid.storage.google.GoogleStorage;
+import org.apache.druid.storage.google.GoogleStorageDruidModule;
+
+import javax.validation.constraints.NotNull;
+import java.io.File;
+import java.net.URI;
+import java.util.List;
+
+@JsonTypeName(GoogleExportStorageProvider.TYPE_NAME)
+public class GoogleExportStorageProvider implements ExportStorageProvider
+{
+  public static final String TYPE_NAME = GoogleStorageDruidModule.SCHEME;

Review Comment:
   This should ideally point to 
org.apache.druid.data.input.google.GoogleCloudStorageInputSource#TYPE_KEY so 
that the name is the same for any permission configurations, even if the value 
is the same currently.
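A sketch of the suggested change, using stand-in classes in place of the real Druid ones (`GoogleCloudStorageInputSource` and `GoogleExportStorageProvider`), with `"google"` assumed as the shared type value:

```java
// Hypothetical stand-in for org.apache.druid.data.input.google.GoogleCloudStorageInputSource.
class GoogleCloudStorageInputSource
{
  static final String TYPE_KEY = "google"; // assumed value for illustration
}

// Stand-in for GoogleExportStorageProvider: TYPE_NAME reuses the input
// source's TYPE_KEY, so permission configurations keyed on the type name
// stay consistent even if the underlying value ever changes.
public class GoogleExportStorageProvider
{
  public static final String TYPE_NAME = GoogleCloudStorageInputSource.TYPE_KEY;

  public static void main(String[] args)
  {
    System.out.println(TYPE_NAME);
  }
}
```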



##########
docs/multi-stage-query/reference.md:
##########
@@ -149,6 +149,39 @@ The following runtime parameters must be configured to export into an S3 destina
 | `druid.export.storage.s3.maxRetry`           | No       | Defines the maximum number of times to attempt S3 API calls to avoid failures due to transient errors. | 10  |
 | `druid.export.storage.s3.chunkSize`          | No       | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to S3; however, it requires more disk space to store the temporary chunks. | 100MiB |
 
+
+##### GS
+
+Export results to GCS by passing the function `google()` as an argument to the `EXTERN` function. Note that this requires the `druid-google-extensions` extension to be loaded.
+The `google()` function is a Druid function that configures the connection. Arguments for `google()` should be passed as named parameters with the value in single quotes, as in the following example:
+
+```sql
+INSERT INTO
+  EXTERN(
+    google(bucket => 'your_bucket', prefix => 'prefix/to/files')
+  )
+AS CSV
+SELECT
+  <column>
+FROM <table>
+```
+
+Supported arguments for the function:
+
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `bucket`  | Yes      | The GS bucket where the files are exported. The bucket and prefix combination should be whitelisted in `druid.export.storage.google.allowedExportPaths`. | n/a |
+| `prefix`  | Yes      | Path where the exported files will be created. The export query expects the destination to be empty. If the location includes other files, the query fails. The bucket and prefix combination should be whitelisted in `druid.export.storage.google.allowedExportPaths`. | n/a |
+
+The following runtime parameters must be configured to export into a GCS destination:
+
+| Runtime Parameter | Required | Description | Default |
+|-------------------|----------|-------------|---------|
+| `druid.export.storage.google.tempLocalDir`       | Yes | Directory used on the local storage of the worker to store temporary files required while uploading the data. | n/a |
+| `druid.export.storage.google.allowedExportPaths` | Yes | An array of GS prefixes that are whitelisted as export destinations. Export queries fail if the export destination does not match any of the configured prefixes. Example: `[\"gs://bucket1/export/\", \"gs://bucket2/export/\"]` | n/a |
+| `druid.export.storage.google.maxRetry`           | No  | Defines the maximum number of times to attempt GS API calls to avoid failures due to transient errors. | 10  |
+| `druid.export.storage.gooel.chunkSize`           | No  | Defines the size of each chunk to temporarily store in `tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls to GS; however, it requires more disk space to store the temporary chunks. | 100MiB |

Review Comment:
   nit: `google`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

