ektravel commented on code in PR #17501:
URL: https://github.com/apache/druid/pull/17501#discussion_r1858914245


##########
docs/tutorials/tutorial-extern.md:
##########
@@ -0,0 +1,208 @@
+---
+id: tutorial-extern
+title: Export query results
+sidebar_label: Export results
+description: How to use EXTERN to export query results.
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This tutorial demonstrates how to use the [EXTERN](../multi-stage-query/reference.md#extern-function) function in Apache Druid&circledR; to export data.
+
+## Prerequisites
+
+Before you follow the steps in this tutorial, download Druid as described in 
the [Local quickstart](index.md).
+Do not start Druid yet; you'll do that as part of the tutorial.
+
+You should be familiar with ingesting and querying data in Druid.
+If you haven't already, go through the [Query 
data](../tutorials/tutorial-query.md) tutorial first.
+
+## Export query results to the local file system
+
+This example demonstrates how to configure Druid to export to the local file 
system.
+This setup is suitable for learning the EXTERN syntax for exporting data, but not for production scenarios.
+
+### Configure the Druid local export directory
+
+The following commands set the base path for the Druid exports to 
`/tmp/druid/`.
+If the account running Druid does not have access to `/tmp/druid/`, change the 
path.
+For example: `/Users/Example/druid`.
+If you change the path in this step, use the updated path in all subsequent 
steps.
+
+From the root of the Druid distribution, run the following:
+
+```bash
+export export_path="/tmp/druid"
+sed -i -e $'$a\\\n\\\n\\\n#\\\n###Local export\\\n#\\\ndruid.export.storage.baseDir='$export_path conf/druid/auto/_common/common.runtime.properties
+```
+
+This adds the following section to the Druid quickstart `common.runtime.properties`:
+
+```
+#
+###Local export
+#
+druid.export.storage.baseDir=/tmp/druid/
+```
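
As an aside for reviewers: the sed one-liner above is hard to read. A sketch of an equivalent append using a heredoc is shown below. It is not part of the tutorial; `PROPS` would be `conf/druid/auto/_common/common.runtime.properties` in a real distribution, and a temp file is used here only so the snippet runs anywhere.

```shell
# Sketch (hypothetical): the same append the sed one-liner performs,
# written with a heredoc instead of sed.
export_path="/tmp/druid"
# In a real Druid distribution this would be:
# PROPS="conf/druid/auto/_common/common.runtime.properties"
PROPS="$(mktemp)"
cat >> "$PROPS" <<EOF


#
###Local export
#
druid.export.storage.baseDir=$export_path
EOF
grep 'baseDir' "$PROPS"   # confirm the property was appended
```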
+
+### Start Druid and load sample data
+
+From the root of the Druid distribution, launch Druid as follows:
+
+```bash
+./bin/start-druid
+```
+
+From the [Query view](http://localhost:8888/unified-console.html#workbench), run the following query to load the Wikipedia example data set:
+
+```sql
+REPLACE INTO "wikipedia" OVERWRITE ALL
+WITH "ext" AS (
+  SELECT *
+  FROM TABLE(
+    EXTERN(
+      '{"type":"http","uris":["https://druid.apache.org/data/wikipedia.json.gz"]}',
+      '{"type":"json"}'
+    )
+  ) EXTEND ("isRobot" VARCHAR, "channel" VARCHAR, "timestamp" VARCHAR, "flags" VARCHAR, "isUnpatrolled" VARCHAR, "page" VARCHAR, "diffUrl" VARCHAR, "added" BIGINT, "comment" VARCHAR, "commentLength" BIGINT, "isNew" VARCHAR, "isMinor" VARCHAR, "delta" BIGINT, "isAnonymous" VARCHAR, "user" VARCHAR, "deltaBucket" BIGINT, "deleted" BIGINT, "namespace" VARCHAR, "cityName" VARCHAR, "countryName" VARCHAR, "regionIsoCode" VARCHAR, "metroCode" BIGINT, "countryIsoCode" VARCHAR, "regionName" VARCHAR)
+)
+SELECT
+  TIME_PARSE("timestamp") AS "__time",
+  "isRobot",
+  "channel",
+  "flags",
+  "isUnpatrolled",
+  "page",
+  "diffUrl",
+  "added",
+  "comment",
+  "commentLength",
+  "isNew",
+  "isMinor",
+  "delta",
+  "isAnonymous",
+  "user",
+  "deltaBucket",
+  "deleted",
+  "namespace",
+  "cityName",
+  "countryName",
+  "regionIsoCode",
+  "metroCode",
+  "countryIsoCode",
+  "regionName"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+### Query to export data
+
+Run the following query to export the query results to the `/tmp/druid/wiki_example` directory.
+The path must be a subdirectory of `druid.export.storage.baseDir`.
+
+
+```sql
+INSERT INTO
+  EXTERN(
+    local(exportPath => '/tmp/druid/wiki_example')
+        )
+AS CSV
+SELECT "channel",
+  SUM("delta") AS "changes"
+FROM "wikipedia"
+GROUP BY 1
+LIMIT 10
+```
+
+Druid exports the results of the query to the `/tmp/druid/wiki_example` directory.
+Run the following command to list the contents of the directory:
+
+```bash
+ls '/tmp/druid/wiki_example'

Review Comment:
   ```suggestion
   ls /tmp/druid/wiki_example
   ```
   The command works without single quotes. If the quotes are optional, can we 
remove them?
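
   For context on whether the quotes are needed: shell quoting only matters when a path contains spaces or other shell-special characters, so they are indeed optional here. A quick illustration (hypothetical paths):

   ```shell
   # Quotes are optional when a path has no spaces or special characters,
   # and required when it does. Paths below are made up for the demo.
   mkdir -p /tmp/quote-demo/plain "/tmp/quote-demo/with space"
   ls /tmp/quote-demo/plain            # fine unquoted: no special characters
   ls "/tmp/quote-demo/with space"     # quotes needed: path contains a space
   rm -r /tmp/quote-demo               # clean up the demo directories
   ```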



##########
docs/multi-stage-query/reference.md:
##########
@@ -216,11 +223,11 @@ FROM <table>
 
 Supported arguments to the function:

Review Comment:
   ```suggestion
   Supported arguments for the function:
   ```



##########
docs/multi-stage-query/reference.md:
##########
@@ -146,25 +151,26 @@ FROM <table>
 
 Supported arguments for the function:
 
-| Parameter   | Required | Description                                         
                                                                                
                                                                                
                                                           | Default |
-|-------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
-| `bucket`    | Yes      | The S3 bucket to which the files are exported to. 
The bucket and prefix combination should be whitelisted in 
`druid.export.storage.s3.allowedExportPaths`.                                   
                                                                                
  | n/a     |
-| `prefix`    | Yes      | Path where the exported files would be created. The 
export query expects the destination to be empty. If the location includes 
other files, then the query will fail. The bucket and prefix combination should 
be whitelisted in `druid.export.storage.s3.allowedExportPaths`. | n/a     |
+| Parameter | Required | Description | Default |
+|---|---|---|---|
+| `bucket` | Yes  | S3 bucket destination for exported files. You must add the 
bucket and prefix combination to the 
`druid.export.storage.s3.allowedExportPaths`. | n/a |
+| `prefix` | Yes  | Destination path in the bucket to create exported files. 
The export query expects the destination path to be empty. If the location 
includes other files, the query will fail. You must add the bucket and prefix 
combination to the `druid.export.storage.s3.allowedExportPaths`. | n/a |
 
-The following runtime parameters must be configured to export into an S3 
destination:
+Configure the following runtime parameters to export to an S3 destination:
 
-| Runtime Parameter                            | Required | Description        
                                                                                
                                                                                
                                                  | Default |
-|----------------------------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
-| `druid.export.storage.s3.allowedExportPaths` | Yes      | An array of S3 
prefixes that are whitelisted as export destinations. Export queries fail if 
the export destination does not match any of the configured prefixes. Example: 
`[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]`    | n/a |
-| `druid.export.storage.s3.tempLocalDir`       | No       | Directory used on 
the local storage of the worker to store temporary files required while 
uploading the data. Uses the task temporary directory by default.               
                                                           | n/a |
-| `druid.export.storage.s3.maxRetry`           | No       | Defines the max 
number times to attempt S3 API calls to avoid failures due to transient errors. 
                                                                                
                                                     | 10  |
-| `druid.export.storage.s3.chunkSize`          | No       | Defines the size 
of each chunk to temporarily store in `tempDir`. The chunk size must be between 
5 MiB and 5 GiB. A large chunk size reduces the API calls to S3, however it 
requires more disk space to store the temporary chunks. | 100MiB |
+| Runtime Parameter | Required | Description | Default |
+|---|---|---|---|
+| `druid.export.storage.s3.allowedExportPaths` | Yes | Array of S3 prefixes 
allowed as export destinations. Export queries fail if the export destination 
does not match any of the configured prefixes. For eample: 
`[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |

Review Comment:
   ```suggestion
   | `druid.export.storage.s3.allowedExportPaths` | Yes | Array of S3 prefixes 
allowed as export destinations. Export queries fail if the export destination 
does not match any of the configured prefixes. For example: 
`[\"s3://bucket1/export/\", \"s3://bucket2/export/\"]` | n/a |
   ```
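
   A possible addition for this section: a sketch of how these S3 parameters might look together in `common.runtime.properties` (bucket name and temp directory are hypothetical):

   ```
   druid.export.storage.s3.allowedExportPaths=["s3://example-bucket/export/"]
   druid.export.storage.s3.tempLocalDir=/tmp/druid-export-tmp
   druid.export.storage.s3.maxRetry=10
   druid.export.storage.s3.chunkSize=100MiB
   ```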



##########
docs/multi-stage-query/reference.md:
##########
@@ -179,29 +185,30 @@ FROM <table>
 
 Supported arguments for the function:
 
-| Parameter   | Required | Description                                         
                                                                                
                                                                                
                                                               | Default |
-|-------------|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
-| `bucket`    | Yes      | The GS bucket to which the files are exported to. 
The bucket and prefix combination should be whitelisted in 
`druid.export.storage.google.allowedExportPaths`.                               
                                                                                
      | n/a     |
-| `prefix`    | Yes      | Path where the exported files would be created. The 
export query expects the destination to be empty. If the location includes 
other files, then the query will fail. The bucket and prefix combination should 
be whitelisted in `druid.export.storage.google.allowedExportPaths`. | n/a     |
+| Parameter   | Required | Description | Default |
+|---|---|---|---|
+| `bucket`    | Yes | GCS bucket destination for exported files. You must add 
the bucket and prefix combination to the 
`druid.export.storage.google.allowedExportPaths` allow list. | n/a |
+| `prefix` | Yes  | Destination path in the bucket to create exported files. 
The export query expects the destination path to be empty. If the location 
includes other files, the query will fail. You must add the bucket and prefix 
combination to the `druid.export.storage.google.allowedExportPaths` allow list. 
| n/a |
+
+Configure the following runtime parameters to export query results to a GCS 
destination:
 
-The following runtime parameters must be configured to export into a GCS 
destination:
+| Runtime Parameter | Required | Description | Default |
+|---|---|---|---|
+| `druid.export.storage.google.allowedExportPaths` | Yes | Array of GCS 
prefixes allowed as export destinations. Export queries fail if the export 
destination does not match any of the configured prefixes. For eample: 
`[\"gs://bucket1/export/\", \"gs://bucket2/export/\"]` | n/a     |

Review Comment:
   ```suggestion
   | `druid.export.storage.google.allowedExportPaths` | Yes | Array of GCS 
prefixes allowed as export destinations. Export queries fail if the export 
destination does not match any of the configured prefixes. For example: 
`[\"gs://bucket1/export/\", \"gs://bucket2/export/\"]` | n/a     |
   ```



##########
docs/multi-stage-query/reference.md:
##########
@@ -216,11 +223,11 @@ FROM <table>
 
 Supported arguments to the function:
 
-| Parameter   | Required | Description                                         
                                                                                
                                                                                
                                     | Default |
-|-------------|--------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 --|
-| `exportPath`  | Yes | Absolute path to a subdirectory of 
`druid.export.storage.baseDir` used as the destination to export the results 
to. The export query expects the destination to be empty. If the location 
includes other files or directories, then the query will fail. | n/a |
+| Parameter | Required | Description | Default |
+|---|---|---|---|---|

Review Comment:
   ```suggestion
   |---|---|---|---|
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

