morningman commented on a change in pull request #7497:
URL: https://github.com/apache/incubator-doris/pull/7497#discussion_r775819294
##########
File path: docs/zh-CN/administrator-guide/outfile.md
##########
@@ -138,167 +138,8 @@ explain select xxx from xxx where xxx into outfile "s3://xxx" format as csv pro
## 使用示例
-1. 示例1
+具体参阅[OUTFILE 文档](../sql-reference/sql-statements/Data Manipulation/OUTFILE.html)。
Review comment:
```suggestion
具体参阅[OUTFILE 文档](../sql-reference/sql-statements/Data%20Manipulation/OUTFILE.md)。
```
##########
File path: docs/en/administrator-guide/outfile.md
##########
@@ -139,166 +139,7 @@ Planning example for concurrent export:
## Usage example
-1. Example 1
-
- Export simple query results to the file `hdfs://path/to/result.txt`. Specify the export format as CSV. Use `my_broker` and set kerberos authentication information. Specify the column separator as `,` and the line delimiter as `\n`.
-
- ```
- SELECT * FROM tbl
- INTO OUTFILE "hdfs://path/to/result_"
- FORMAT AS CSV
- PROPERTIES
- (
- "broker.name" = "my_broker",
- "broker.hadoop.security.authentication" = "kerberos",
- "broker.kerberos_principal" = "[email protected]",
- "broker.kerberos_keytab" = "/home/doris/my.keytab",
- "column_separator" = ",",
- "line_delimiter" = "\n",
- "max_file_size" = "100MB"
- );
- ```
-
- If the result is less than 100MB, the file will be: `result_0.csv`.
-
- If larger than 100MB, the files may be: `result_0.csv, result_1.csv, ...`.
-
-2. Example 2
-
- Export simple query results to the file `hdfs://path/to/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set kerberos authentication information.
-
- ```
- SELECT c1, c2, c3 FROM tbl
- INTO OUTFILE "hdfs://path/to/result_"
- FORMAT AS PARQUET
- PROPERTIES
- (
- "broker.name" = "my_broker",
- "broker.hadoop.security.authentication" = "kerberos",
- "broker.kerberos_principal" = "[email protected]",
- "broker.kerberos_keytab" = "/home/doris/my.keytab",
- "schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c3"
- );
- ```
-
- If the exported file format is PARQUET, `schema` must be specified.
-
-3. Example 3
-
- Export the query result of the CTE statement to the file `hdfs://path/to/result.txt`. The default export format is CSV. Use `my_broker` and set HDFS high availability information. Use the default column separator and line delimiter.
-
- ```
- WITH
- x1 AS
- (SELECT k1, k2 FROM tbl1),
- x2 AS
- (SELECT k3 FROM tbl2)
- SELECT k1 FROM x1 UNION SELECT k3 FROM x2
- INTO OUTFILE "hdfs://path/to/result_"
- PROPERTIES
- (
- "broker.name" = "my_broker",
- "broker.username"="user",
- "broker.password"="passwd",
- "broker.dfs.nameservices" = "my_ha",
- "broker.dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
- "broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
- "broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
- "broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
- );
- ```
-
- If the result is less than 1GB, the file will be: `result_0.csv`.
-
- If larger than 1GB, the files may be: `result_0.csv, result_1.csv, ...`.
-
-4. Example 4
-
- Export the query results of the UNION statement to the file `bos://bucket/result.parquet`. Specify the export format as PARQUET. Use `my_broker` and set HDFS high availability information. The PARQUET format does not require a column separator or line delimiter.
-
- ```
- SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
- INTO OUTFILE "bos://bucket/result_"
- FORMAT AS PARQUET
- PROPERTIES
- (
- "broker.name" = "my_broker",
- "broker.bos_endpoint" = "http://bj.bcebos.com",
- "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
- "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
- "schema"="required,int32,k1;required,byte_array,k2"
- );
- ```
-
-5. Example 5
-
- Export simple query results to the file `cos://${bucket_name}/path/result.txt`. Specify the export format as CSV.
- And create a marker file after the export finishes.
-
- ```
- select k1,k2,v1 from tbl1 limit 100000
- into outfile "s3a://my_bucket/export/my_file_"
- FORMAT AS CSV
- PROPERTIES
- (
- "broker.name" = "hdfs_broker",
- "broker.fs.s3a.access.key" = "xxx",
- "broker.fs.s3a.secret.key" = "xxxx",
- "broker.fs.s3a.endpoint" = "https://cos.xxxxxx.myqcloud.com/",
- "column_separator" = ",",
- "line_delimiter" = "\n",
- "max_file_size" = "1024MB",
- "success_file_name" = "SUCCESS"
- )
- ```
-
- If the result is less than 1GB, the file will be: `my_file_0.csv`.
-
- If larger than 1GB, the files may be: `my_file_0.csv, my_file_1.csv, ...`.
-
- Please note:
- 1. Paths that do not exist are created automatically.
- 2. These parameters (access.key / secret.key / endpoint) need to be confirmed with `Tencent Cloud COS`. In particular, the endpoint value should not include the bucket_name.
-
-6. Example 6
-
- Use the S3 protocol to export to BOS, with concurrent export enabled.
-
- ```
- set enable_parallel_outfile = true;
- select k1 from tb1 limit 1000
- into outfile "s3://my_bucket/export/my_file_"
- format as csv
- properties
- (
- "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
- "AWS_ACCESS_KEY" = "xxxx",
- "AWS_SECRET_KEY" = "xxx",
- "AWS_REGION" = "bd"
- )
- ```
-
- The final generated file prefix is `my_file_{fragment_instance_id}_`.
-
-7. Example 7
-
- Use the S3 protocol to export to BOS, with the concurrent-export session variable enabled.
-
- ```
- set enable_parallel_outfile = true;
- select k1 from tb1 order by k1 limit 1000
- into outfile "s3://my_bucket/export/my_file_"
- format as csv
- properties
- (
- "AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
- "AWS_ACCESS_KEY" = "xxxx",
- "AWS_SECRET_KEY" = "xxx",
- "AWS_REGION" = "bd"
- )
- ```
-
- **However, because the query statement has a top-level sorting node, it cannot be exported concurrently even though the concurrent-export session variable is enabled.**
+For details, please refer to [OUTFILE Document](../sql-reference/sql-statements/Data Manipulation/OUTFILE.html).
Review comment:
```suggestion
For details, please refer to [OUTFILE Document](../sql-reference/sql-statements/Data%20Manipulation/OUTFILE.md).
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]