This is an automated email from the ASF dual-hosted git repository.

kirs pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git


The following commit(s) were added to refs/heads/dev by this push:
     new 04b586f2b optimize file sink doc (#2363)
04b586f2b is described below

commit 04b586f2b81648ac9cc88b70a09f979280555606
Author: Eric <[email protected]>
AuthorDate: Thu Aug 4 15:08:13 2022 +0800

    optimize file sink doc (#2363)
---
 docs/en/connector-v2/sink/File.mdx     | 266 ---------------------------------
 docs/en/connector-v2/sink/HdfsFile.md  | 141 +++++++++++++++++
 docs/en/connector-v2/sink/LocalFile.md | 139 +++++++++++++++++
 3 files changed, 280 insertions(+), 266 deletions(-)

diff --git a/docs/en/connector-v2/sink/File.mdx 
b/docs/en/connector-v2/sink/File.mdx
deleted file mode 100644
index 6497ef554..000000000
--- a/docs/en/connector-v2/sink/File.mdx
+++ /dev/null
@@ -1,266 +0,0 @@
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-# File
-
-## Description
-
-Output data to local or hdfs or s3 file.
-
-## Options
-
-<Tabs
-    groupId="engine-type"
-    defaultValue="LocalFile"
-    values={[
-        {label: 'LocalFile', value: 'LocalFile'},
-        {label: 'HdfsFile', value: 'HdfsFile'},
-    ]}>
-    <TabItem value="LocalFile">
-
-| name                              | type   | required | default value        
                                         |
-| --------------------------------- | ------ | -------- | 
------------------------------------------------------------- |
-| path                              | string | yes      | -                    
                                         |
-| file_name_expression              | string | no       | "${transactionId}"   
                                         |
-| file_format                       | string | no       | "text"               
                                         |
-| filename_time_format              | string | no       | "yyyy.MM.dd"         
                                         |
-| field_delimiter                   | string | no       | '\001'               
                                         |
-| row_delimiter                     | string | no       | "\n"                 
                                         |
-| partition_by                      | array  | no       | -                    
                                         |
-| partition_dir_expression          | string | no       | 
"\${k0}=\${v0}\/\${k1}=\${v1}\/...\/\${kn}=\${vn}\/"          |
-| is_partition_field_write_in_file  | boolean| no       | false                
                                         |
-| sink_columns                      | array  | no       | When this parameter 
is empty, all fields are sink columns     |
-| is_enable_transaction             | boolean| no       | true                 
                                         |
-| save_mode                         | string | no       | "error"              
                                         |
-
-### path [string]
-
-The target dir path is required. The `hdfs file` starts with `hdfs://` , and 
the `local file` starts with `file://`,
-
-### file_name_expression [string]
-
-`file_name_expression` describes the file expression which will be created 
into the `path`. We can add the variable `${now}` or `${uuid}` in the 
`file_name_expression`, like `test_${uuid}_${now}`,
-`${now}` represents the current time, and its format can be defined by 
specifying the option `filename_time_format`.
-
-Please note that, If `is_enable_transaction` is `true`, we will auto add 
`${transactionId}_` in the head of the file.
-
-### file_format [string]
-
-We supported as the following file types:
-
-`text` `csv` `parquet`
-
-Please note that, The final file name will ends with the file_format's suffix, 
the suffix of the text file is `txt`.
-
-### filename_time_format [string]
-
-When the format in the `file_name_expression` parameter is `xxxx-${now}` , 
`filename_time_format` can specify the time format of the path, and the default 
value is `yyyy.MM.dd` . The commonly used time formats are listed as follows:
-
-| Symbol | Description        |
-| ------ | ------------------ |
-| y      | Year               |
-| M      | Month              |
-| d      | Day of month       |
-| H      | Hour in day (0-23) |
-| m      | Minute in hour     |
-| s      | Second in minute   |
-
-See [Java 
SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)
 for detailed time format syntax.
-
-### field_delimiter [string]
-
-The separator between columns in a row of data.
-
-### row_delimiter [string]
-
-The separator between rows in a file.
-
-### partition_by [array]
-
-Partition data based on selected fields
-
-### partition_dir_expression [string]
-
-If the `partition_by` is specified, we will generate the corresponding 
partition directory based on the partition information, and the final file will 
be placed in the partition directory.
-
-Default `partition_dir_expression` is 
`${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`. `k0` is the first partition field 
and `v0` is the value of the first partition field.
-
-### is_partition_field_write_in_file [boolean]
-
-If `is_partition_field_write_in_file` is `true`, the partition field and the 
value of it will be write into data file.
-
-For example, if you want to write a Hive Data File, Its value should be 
`false`.
-
-### sink_columns [array]
-
-Which columns need be write to file, default value is all of the columns get 
from `Transform` or `Source`.
-The order of the fields determines the order in which the file is actually 
written.
-
-### is_enable_transaction [boolean]
-
-If `is_enable_transaction` is true, we will ensure that data will not be lost 
or duplicated when it is written to the target directory.
-
-Please note that, If `is_enable_transaction` is `true`, we will auto add 
`${transactionId}_` in the head of the file.
-
-Only support `true` now.
-
-### save_mode [string]
-
-Storage mode, currently supports `overwrite` , `append` , `ignore` and `error` 
. For the specific meaning of each mode, see 
[save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes)
-
-Streaming Job not support `overwrite`.
-
-</TabItem>
-<TabItem value="HdfsFile">
-
-In order to use this connector, You must ensure your spark/flink cluster 
already integrated hadoop. The tested hadoop version is 2.x.
-
-| name                              | type   | required | default value        
                                         |
-| --------------------------------- | ------ | -------- | 
------------------------------------------------------------- |
-| path                              | string | yes      | -                    
                                         |
-| file_name_expression              | string | no       | "${transactionId}"   
                                         |
-| file_format                       | string | no       | "text"               
                                         |
-| filename_time_format              | string | no       | "yyyy.MM.dd"         
                                         |
-| field_delimiter                   | string | no       | '\001'               
                                         |
-| row_delimiter                     | string | no       | "\n"                 
                                         |
-| partition_by                      | array  | no       | -                    
                                         |
-| partition_dir_expression          | string | no       | 
"\${k0}=\${v0}\/\${k1}=\${v1}\/...\/\${kn}=\${vn}\/"          |
-| is_partition_field_write_in_file  | boolean| no       | false                
                                         |
-| sink_columns                      | array  | no       | When this parameter 
is empty, all fields are sink columns     |
-| is_enable_transaction             | boolean| no       | true                 
                                         |
-| save_mode                         | string | no       | "error"              
                                         |
-
-### path [string]
-
-The target dir path is required. The `hdfs file` starts with `hdfs://` , and 
the `local file` starts with `file://`,
-
-### file_name_expression [string]
-
-`file_name_expression` describes the file expression which will be created 
into the `path`. We can add the variable `${now}` or `${uuid}` in the 
`file_name_expression`, like `test_${uuid}_${now}`,
-`${now}` represents the current time, and its format can be defined by 
specifying the option `filename_time_format`.
-
-Please note that, If `is_enable_transaction` is `true`, we will auto add 
`${transactionId}_` in the head of the file.
-
-### file_format [string]
-
-We supported as the following file types:
-
-`text` `csv` `parquet`
-
-Please note that, The final file name will ends with the file_format's suffix, 
the suffix of the text file is `txt`.
-
-### filename_time_format [string]
-
-When the format in the `file_name_expression` parameter is `xxxx-${now}` , 
`filename_time_format` can specify the time format of the path, and the default 
value is `yyyy.MM.dd` . The commonly used time formats are listed as follows:
-
-| Symbol | Description        |
-| ------ | ------------------ |
-| y      | Year               |
-| M      | Month              |
-| d      | Day of month       |
-| H      | Hour in day (0-23) |
-| m      | Minute in hour     |
-| s      | Second in minute   |
-
-See [Java 
SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html)
 for detailed time format syntax.
-
-### field_delimiter [string]
-
-The separator between columns in a row of data.
-
-### row_delimiter [string]
-
-The separator between rows in a file.
-
-### partition_by [array]
-
-Partition data based on selected fields
-
-### partition_dir_expression [string]
-
-If the `partition_by` is specified, we will generate the corresponding 
partition directory based on the partition information, and the final file will 
be placed in the partition directory.
-
-Default `partition_dir_expression` is 
`${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`. `k0` is the first partition field 
and `v0` is the value of the first partition field.
-
-### is_partition_field_write_in_file [boolean]
-
-If `is_partition_field_write_in_file` is `true`, the partition field and the 
value of it will be write into data file.
-
-For example, if you want to write a Hive Data File, Its value should be 
`false`.
-
-### sink_columns [array]
-
-Which columns need be write to file, default value is all of the columns get 
from `Transform` or `Source`.
-The order of the fields determines the order in which the file is actually 
written.
-
-### is_enable_transaction [boolean]
-
-If `is_enable_transaction` is true, we will ensure that data will not be lost 
or duplicated when it is written to the target directory.
-
-Please note that, If `is_enable_transaction` is `true`, we will auto add 
`${transactionId}_` in the head of the file.
-
-Only support `true` now.
-
-### save_mode [string]
-
-Storage mode, currently supports `overwrite` , `append` , `ignore` and `error` 
. For the specific meaning of each mode, see 
[save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes)
-
-Streaming Job not support `overwrite`.
-</TabItem>
-</Tabs>
-
-## Example
-
-<Tabs
-    groupId="engine-type"
-    defaultValue="LocalFile"
-    values={[
-        {label: 'LocalFile', value: 'LocalFile'},
-        {label: 'HdfsFile', value: 'HdfsFile'},
-    ]}>
-<TabItem value="LocalFile">
-
-```bash
-
-LocalFile {
-    path="file:///tmp/hive/warehouse/test2"
-    field_delimiter="\t"
-    row_delimiter="\n"
-    partition_by=["age"]
-    partition_dir_expression="${k0}=${v0}"
-    is_partition_field_write_in_file=true
-    file_name_expression="${transactionId}_${now}"
-    file_format="text"
-    sink_columns=["name","age"]
-    filename_time_format="yyyy.MM.dd"
-    is_enable_transaction=true
-    save_mode="error"
-}
-
-```
-
-</TabItem>
-
-<TabItem value="LocalFile">
-
-```bash
-
-HdfsFile {
-    path="file:///tmp/hive/warehouse/test2"
-    field_delimiter="\t"
-    row_delimiter="\n"
-    partition_by=["age"]
-    partition_dir_expression="${k0}=${v0}"
-    is_partition_field_write_in_file=true
-    file_name_expression="${transactionId}_${now}"
-    file_format="text"
-    sink_columns=["name","age"]
-    filename_time_format="yyyy.MM.dd"
-    is_enable_transaction=true
-    save_mode="error"
-}
-
-```
-
-</TabItem>
-</Tabs>
diff --git a/docs/en/connector-v2/sink/HdfsFile.md 
b/docs/en/connector-v2/sink/HdfsFile.md
new file mode 100644
index 000000000..36d9f6b35
--- /dev/null
+++ b/docs/en/connector-v2/sink/HdfsFile.md
@@ -0,0 +1,141 @@
+# HdfsFile
+
+## Description
+
+Output data to hdfs file. Support bounded and unbounded job.
+
+## Options
+
+To use this connector, you must ensure your Spark/Flink cluster already has Hadoop integrated. The tested Hadoop version is 2.x.
+
+| name                             | type    | required | default value                                             |
+| -------------------------------- | ------- | -------- | --------------------------------------------------------- |
+| path                             | string  | yes      | -                                                         |
+| file_name_expression             | string  | no       | "${transactionId}"                                        |
+| file_format                      | string  | no       | "text"                                                    |
+| filename_time_format             | string  | no       | "yyyy.MM.dd"                                              |
+| field_delimiter                  | string  | no       | '\001'                                                    |
+| row_delimiter                    | string  | no       | "\n"                                                      |
+| partition_by                     | array   | no       | -                                                         |
+| partition_dir_expression         | string  | no       | "\${k0}=\${v0}\/\${k1}=\${v1}\/...\/\${kn}=\${vn}\/"      |
+| is_partition_field_write_in_file | boolean | no       | false                                                     |
+| sink_columns                     | array   | no       | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction            | boolean | no       | true                                                      |
+| save_mode                        | string  | no       | "error"                                                   |
+
+### path [string]
+
+The target directory path is required. An HDFS file path starts with `hdfs://`.
+
+### file_name_expression [string]
+
+`file_name_expression` describes the file name expression for the files created under the `path`. We can use the variable `${now}` or `${uuid}` in `file_name_expression`, like `test_${uuid}_${now}`. `${now}` represents the current time, and its format can be defined by the option `filename_time_format`.
+
+Please note that if `is_enable_transaction` is `true`, we will automatically add `${transactionId}_` to the head of the file name.
+
+### file_format [string]
+
+The following file types are supported:
+
+`text` `csv` `parquet`
+
+Please note that the final file name will end with the file format's suffix; the suffix of a text file is `txt`.
+
+### filename_time_format [string]
+
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format of the path. The default value is `yyyy.MM.dd`. Commonly used time format symbols are listed below:
+
+| Symbol | Description        |
+| ------ | ------------------ |
+| y      | Year               |
+| M      | Month              |
+| d      | Day of month       |
+| H      | Hour in day (0-23) |
+| m      | Minute in hour     |
+| s      | Second in minute   |
+
+See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax.
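
Since `filename_time_format` patterns are interpreted with Java's `SimpleDateFormat` syntax, the default `yyyy.MM.dd` pattern can be sketched as follows (a standalone illustration, not connector code; the class and method names are hypothetical):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FilenameTimeFormatDemo {
    // Format an epoch timestamp with the default filename_time_format pattern
    static String formatUtc(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // The Unix epoch renders deterministically in UTC
        System.out.println(formatUtc(0L)); // prints 1970.01.01
    }
}
```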
+
+### field_delimiter [string]
+
+The separator between columns in a row of data. Only needed by the `text` and `csv` file formats.
+
+### row_delimiter [string]
+
+The separator between rows in a file. Only needed by the `text` and `csv` file formats.
+
+### partition_by [array]
+
+Partition the data based on the selected fields.
+
+### partition_dir_expression [string]
+
+If `partition_by` is specified, we will generate the corresponding partition directory based on the partition information, and the final file will be placed in that partition directory.
+
+The default `partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`, where `k0` is the first partition field and `v0` is its value.
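
The expansion is simple key=value path building. A minimal sketch under that assumption (illustration only, not the connector's implementation; `partitionDir` is a hypothetical helper):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionDirDemo {
    // Expand partition fields into a ${k0}=${v0}/${k1}=${v1}/... directory path
    static String partitionDir(Map<String, String> partitions) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partitions.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('/');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Insertion order is preserved, matching the partition field order
        Map<String, String> p = new LinkedHashMap<>();
        p.put("age", "20");
        System.out.println(partitionDir(p)); // prints age=20/
    }
}
```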
+
+### is_partition_field_write_in_file [boolean]
+
+If `is_partition_field_write_in_file` is `true`, the partition fields and their values will be written into the data file.
+
+For example, if you want to write a Hive data file, its value should be `false`.
+
+### sink_columns [array]
+
+Which columns to write to the file; the default is all columns obtained from the `Transform` or `Source`.
+The order of the fields determines the order in which they are actually written to the file.
+
+### is_enable_transaction [boolean]
+
+If `is_enable_transaction` is `true`, we will ensure that data is not lost or duplicated when it is written to the target directory.
+
+Please note that if `is_enable_transaction` is `true`, we will automatically add `${transactionId}_` to the head of the file name.
+
+Only `true` is supported for now.
+
+### save_mode [string]
+
+Storage mode; currently `overwrite` is supported, which means the old file will be deleted when a new file has the same name.
+
+If `is_enable_transaction` is `true`, we basically won't encounter identical file names, because the transaction id is added to each file name.
+
+For the specific meaning of each mode, see [save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes)
+
+## Example
+
+```bash
+
+HdfsFile {
+    path="hdfs://mycluster/tmp/hive/warehouse/test2"
+    field_delimiter="\t"
+    row_delimiter="\n"
+    partition_by=["age"]
+    partition_dir_expression="${k0}=${v0}"
+    is_partition_field_write_in_file=true
+    file_name_expression="${transactionId}_${now}"
+    file_format="text"
+    sink_columns=["name","age"]
+    filename_time_format="yyyy.MM.dd"
+    is_enable_transaction=true
+}
+
+```
+
+For the parquet file format:
+
+```bash
+
+HdfsFile {
+    path="hdfs://mycluster/tmp/hive/warehouse/test2"
+    partition_by=["age"]
+    partition_dir_expression="${k0}=${v0}"
+    is_partition_field_write_in_file=true
+    file_name_expression="${transactionId}_${now}"
+    file_format="parquet"
+    sink_columns=["name","age"]
+    filename_time_format="yyyy.MM.dd"
+    is_enable_transaction=true
+}
+
+```
diff --git a/docs/en/connector-v2/sink/LocalFile.md 
b/docs/en/connector-v2/sink/LocalFile.md
new file mode 100644
index 000000000..ffb0efb8c
--- /dev/null
+++ b/docs/en/connector-v2/sink/LocalFile.md
@@ -0,0 +1,139 @@
+# LocalFile
+
+## Description
+
+Output data to local file. Support bounded and unbounded job.
+
+## Options
+
+| name                             | type    | required | default value                                             |
+| -------------------------------- | ------- | -------- | --------------------------------------------------------- |
+| path                             | string  | yes      | -                                                         |
+| file_name_expression             | string  | no       | "${transactionId}"                                        |
+| file_format                      | string  | no       | "text"                                                    |
+| filename_time_format             | string  | no       | "yyyy.MM.dd"                                              |
+| field_delimiter                  | string  | no       | '\001'                                                    |
+| row_delimiter                    | string  | no       | "\n"                                                      |
+| partition_by                     | array   | no       | -                                                         |
+| partition_dir_expression         | string  | no       | "\${k0}=\${v0}\/\${k1}=\${v1}\/...\/\${kn}=\${vn}\/"      |
+| is_partition_field_write_in_file | boolean | no       | false                                                     |
+| sink_columns                     | array   | no       | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction            | boolean | no       | true                                                      |
+| save_mode                        | string  | no       | "error"                                                   |
+
+### path [string]
+
+The target directory path is required. A local file path starts with `file://`.
+
+### file_name_expression [string]
+
+`file_name_expression` describes the file name expression for the files created under the `path`. We can use the variable `${now}` or `${uuid}` in `file_name_expression`, like `test_${uuid}_${now}`. `${now}` represents the current time, and its format can be defined by the option `filename_time_format`.
+
+Please note that if `is_enable_transaction` is `true`, we will automatically add `${transactionId}_` to the head of the file name.
+
+### file_format [string]
+
+The following file types are supported:
+
+`text` `csv` `parquet`
+
+Please note that the final file name will end with the file format's suffix; the suffix of a text file is `txt`.
+
+### filename_time_format [string]
+
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format of the path. The default value is `yyyy.MM.dd`. Commonly used time format symbols are listed below:
+
+| Symbol | Description        |
+| ------ | ------------------ |
+| y      | Year               |
+| M      | Month              |
+| d      | Day of month       |
+| H      | Hour in day (0-23) |
+| m      | Minute in hour     |
+| s      | Second in minute   |
+
+See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax.
+
+### field_delimiter [string]
+
+The separator between columns in a row of data. Only needed by the `text` and `csv` file formats.
+
+### row_delimiter [string]
+
+The separator between rows in a file. Only needed by the `text` and `csv` file formats.
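
With the defaults above, a `text` row is simply the field values joined by `'\001'` (Unicode `\u0001`), with rows joined by `"\n"`. A minimal sketch (illustration only; the class and method names are hypothetical):

```java
public class DelimiterDemo {
    // Join one row's fields with the default field_delimiter '\001'
    static String toTextRow(String[] fields) {
        return String.join("\u0001", fields);
    }

    public static void main(String[] args) {
        String row = toTextRow(new String[]{"Eric", "26"});
        // \u0001 is invisible in most terminals, so print the row length instead
        System.out.println(row.length()); // prints 7 ("Eric" + 1 delimiter + "26")
    }
}
```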
+
+### partition_by [array]
+
+Partition the data based on the selected fields.
+
+### partition_dir_expression [string]
+
+If `partition_by` is specified, we will generate the corresponding partition directory based on the partition information, and the final file will be placed in that partition directory.
+
+The default `partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`, where `k0` is the first partition field and `v0` is its value.
+
+### is_partition_field_write_in_file [boolean]
+
+If `is_partition_field_write_in_file` is `true`, the partition fields and their values will be written into the data file.
+
+For example, if you want to write a Hive data file, its value should be `false`.
+
+### sink_columns [array]
+
+Which columns to write to the file; the default is all columns obtained from the `Transform` or `Source`.
+The order of the fields determines the order in which they are actually written to the file.
+
+### is_enable_transaction [boolean]
+
+If `is_enable_transaction` is `true`, we will ensure that data is not lost or duplicated when it is written to the target directory.
+
+Please note that if `is_enable_transaction` is `true`, we will automatically add `${transactionId}_` to the head of the file name.
+
+Only `true` is supported for now.
+
+### save_mode [string]
+
+Storage mode; currently `overwrite` is supported, which means the old file will be deleted when a new file has the same name.
+
+If `is_enable_transaction` is `true`, we basically won't encounter identical file names, because the transaction id is added to each file name.
+
+For the specific meaning of each mode, see [save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes)
+
+## Example
+
+```bash
+
+LocalFile {
+    path="file:///tmp/hive/warehouse/test2"
+    field_delimiter="\t"
+    row_delimiter="\n"
+    partition_by=["age"]
+    partition_dir_expression="${k0}=${v0}"
+    is_partition_field_write_in_file=true
+    file_name_expression="${transactionId}_${now}"
+    file_format="text"
+    sink_columns=["name","age"]
+    filename_time_format="yyyy.MM.dd"
+    is_enable_transaction=true
+}
+
+```
+
+For the parquet file format:
+
+```bash
+
+LocalFile {
+    path="file:///tmp/hive/warehouse/test2"
+    partition_by=["age"]
+    partition_dir_expression="${k0}=${v0}"
+    is_partition_field_write_in_file=true
+    file_name_expression="${transactionId}_${now}"
+    file_format="parquet"
+    sink_columns=["name","age"]
+    filename_time_format="yyyy.MM.dd"
+    is_enable_transaction=true
+}
+
+```
