This is an automated email from the ASF dual-hosted git repository.
fanjia pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new e7cfa2bc7 [Connector-V2][Doc] Add File Sink Connector V2 document
(#2120)
e7cfa2bc7 is described below
commit e7cfa2bc737cd15081807d0f6388f447ff6fbcaf
Author: Eric <[email protected]>
AuthorDate: Tue Jul 5 18:32:11 2022 +0800
[Connector-V2][Doc] Add File Sink Connector V2 document (#2120)
* add file sink connector v2
* Update docs/en/connector-v2/sink/File.mdx
Co-authored-by: Hisoka <[email protected]>
* Update docs/en/connector-v2/sink/File.mdx
Co-authored-by: Hisoka <[email protected]>
---
docs/en/connector-v2/sink/File.mdx | 269 +++++++++++++++++++++++++++++++++++++
1 file changed, 269 insertions(+)
diff --git a/docs/en/connector-v2/sink/File.mdx
b/docs/en/connector-v2/sink/File.mdx
new file mode 100644
index 000000000..71e7631c6
--- /dev/null
+++ b/docs/en/connector-v2/sink/File.mdx
@@ -0,0 +1,269 @@
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# File
+
+## Description
+
+Output data to local or hdfs or s3 file.
+
+:::tip
+
+Used to write data to file. Supports Batch and Streaming mode.
+
+* [x] Batch
+* [x] Streaming
+
+:::
+
+## Options
+
+<Tabs
+ groupId="engine-type"
+ defaultValue="LocalFile"
+ values={[
+ {label: 'LocalFile', value: 'LocalFile'},
+ {label: 'HdfsFile', value: 'HdfsFile'},
+ ]}>
+ <TabItem value="LocalFile">
+
+| name                             | type    | required | default value                                             |
+| -------------------------------- | ------- | -------- | --------------------------------------------------------- |
+| path                             | string  | yes      | -                                                         |
+| file_name_expression             | string  | no       | "${transactionId}"                                        |
+| file_format                      | string  | no       | "text"                                                    |
+| filename_time_format             | string  | no       | "yyyy.MM.dd"                                              |
+| field_delimiter                  | string  | no       | '\001'                                                    |
+| row_delimiter                    | string  | no       | "\n"                                                      |
+| partition_by                     | array   | no       | -                                                         |
+| partition_dir_expression         | string  | no       | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/"                |
+| is_partition_field_write_in_file | boolean | no       | false                                                     |
+| sink_columns                     | array   | no       | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction            | boolean | no       | true                                                      |
+| save_mode                        | string  | no       | "error"                                                   |
+
+### path [string]
+
+The target directory path is required. An HDFS file path starts with `hdfs://`, and a local file path starts with `file://`.
+
+### file_name_expression [string]
+
+`file_name_expression` describes the file name expression for the files created under `path`. You can add the variables `${now}` or `${uuid}` to `file_name_expression`, for example `test_${uuid}_${now}`.
+`${now}` represents the current time, and its format can be defined with the `filename_time_format` option.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+### file_format [string]
+
+Currently only `text` is supported as `file_format`.
+
+Please note that the final file name ends with the file format's suffix; for a text file the suffix is `txt`.
+
+### filename_time_format [string]
+
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format used in the file name, and the default value is `yyyy.MM.dd`. The commonly used time formats are listed as follows:
+
+| Symbol | Description |
+| ------ | ------------------ |
+| y | Year |
+| M | Month |
+| d | Day of month |
+| H | Hour in day (0-23) |
+| m | Minute in hour |
+| s | Second in minute |
+
+See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax.
+
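As a minimal illustrative sketch (not part of the connector itself), this is how a `filename_time_format` pattern such as the default `yyyy.MM.dd` is rendered by Java's `SimpleDateFormat`, which is what the substituted `${now}` value looks like in a file name:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FilenameTimeFormatDemo {
    public static void main(String[] args) {
        // The default filename_time_format pattern from the table above.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd");
        // Render the current time the way ${now} would appear in a file name.
        System.out.println(fmt.format(new Date()));
    }
}
```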
+### field_delimiter [string]
+
+The separator between columns in a row of data.
+
+### row_delimiter [string]
+
+The separator between rows in a file.
+
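As an illustrative sketch (the field names and values here are made up, and this is not the connector's actual implementation), this shows how rows of text output are laid out with the default delimiters, `'\001'` between fields and `"\n"` between rows:

```java
import java.util.List;

public class RowLayoutDemo {
    public static void main(String[] args) {
        String fieldDelimiter = "\u0001"; // default field_delimiter '\001'
        String rowDelimiter = "\n";       // default row_delimiter "\n"
        // Hypothetical rows; real data comes from the Source/Transform.
        List<List<String>> rows = List.of(
                List.of("Alice", "20"),
                List.of("Bob", "32"));
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(String.join(fieldDelimiter, row)).append(rowDelimiter);
        }
        System.out.print(out);
    }
}
```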
+### partition_by [array]
+
+Partition the data based on the selected fields.
+
+### partition_dir_expression [string]
+
+If `partition_by` is specified, the corresponding partition directories are generated from the partition information, and the final files are placed in those partition directories.
+
+The default `partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`, where `k0` is the first partition field and `v0` is its value.
+
+### is_partition_field_write_in_file [boolean]
+
+If `is_partition_field_write_in_file` is `true`, the partition fields and their values are written into the data file.
+
+For example, when writing a Hive data file, its value should be `false`.
+
+### sink_columns [array]
+
+The columns to be written to the file; the default is all of the columns obtained from the `Transform` or `Source`.
+The order of the fields determines the order in which they are written to the file.
+
+### is_enable_transaction [boolean]
+
+If `is_enable_transaction` is `true`, we ensure that data is not lost or duplicated when it is written to the target directory.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+Only `true` is supported at the moment.
+
+### save_mode [string]
+
+Storage mode, currently supporting `overwrite`, `append`, `ignore` and `error`. For the specific meaning of each mode, see [save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes).
+
+Streaming jobs do not support `overwrite`.
+
+</TabItem>
+<TabItem value="HdfsFile">
+
+| name                             | type    | required | default value                                             |
+| -------------------------------- | ------- | -------- | --------------------------------------------------------- |
+| path                             | string  | yes      | -                                                         |
+| file_name_expression             | string  | no       | "${transactionId}"                                        |
+| file_format                      | string  | no       | "text"                                                    |
+| filename_time_format             | string  | no       | "yyyy.MM.dd"                                              |
+| field_delimiter                  | string  | no       | '\001'                                                    |
+| row_delimiter                    | string  | no       | "\n"                                                      |
+| partition_by                     | array   | no       | -                                                         |
+| partition_dir_expression         | string  | no       | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/"                |
+| is_partition_field_write_in_file | boolean | no       | false                                                     |
+| sink_columns                     | array   | no       | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction            | boolean | no       | true                                                      |
+| save_mode                        | string  | no       | "error"                                                   |
+
+### path [string]
+
+The target directory path is required. An HDFS file path starts with `hdfs://`, and a local file path starts with `file://`.
+
+### file_name_expression [string]
+
+`file_name_expression` describes the file name expression for the files created under `path`. You can add the variables `${now}` or `${uuid}` to `file_name_expression`, for example `test_${uuid}_${now}`.
+`${now}` represents the current time, and its format can be defined with the `filename_time_format` option.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+### file_format [string]
+
+Currently only `text` is supported as `file_format`.
+
+Please note that the final file name ends with the file format's suffix; for a text file the suffix is `txt`.
+
+### filename_time_format [string]
+
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format used in the file name, and the default value is `yyyy.MM.dd`. The commonly used time formats are listed as follows:
+
+| Symbol | Description |
+| ------ | ------------------ |
+| y | Year |
+| M | Month |
+| d | Day of month |
+| H | Hour in day (0-23) |
+| m | Minute in hour |
+| s | Second in minute |
+
+See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax.
+
+### field_delimiter [string]
+
+The separator between columns in a row of data.
+
+### row_delimiter [string]
+
+The separator between rows in a file.
+
+### partition_by [array]
+
+Partition the data based on the selected fields.
+
+### partition_dir_expression [string]
+
+If `partition_by` is specified, the corresponding partition directories are generated from the partition information, and the final files are placed in those partition directories.
+
+The default `partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`, where `k0` is the first partition field and `v0` is its value.
+
+### is_partition_field_write_in_file [boolean]
+
+If `is_partition_field_write_in_file` is `true`, the partition fields and their values are written into the data file.
+
+For example, when writing a Hive data file, its value should be `false`.
+
+### sink_columns [array]
+
+The columns to be written to the file; the default is all of the columns obtained from the `Transform` or `Source`.
+The order of the fields determines the order in which they are written to the file.
+
+### is_enable_transaction [boolean]
+
+If `is_enable_transaction` is `true`, we ensure that data is not lost or duplicated when it is written to the target directory.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+Only `true` is supported at the moment.
+
+### save_mode [string]
+
+Storage mode, currently supporting `overwrite`, `append`, `ignore` and `error`. For the specific meaning of each mode, see [save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes).
+
+Streaming jobs do not support `overwrite`.
+</TabItem>
+</Tabs>
+
+## Example
+
+<Tabs
+ groupId="engine-type"
+ defaultValue="LocalFile"
+ values={[
+ {label: 'LocalFile', value: 'LocalFile'},
+ {label: 'HdfsFile', value: 'HdfsFile'},
+ ]}>
+<TabItem value="LocalFile">
+
+```bash
+
+LocalFile {
+ path="file:///tmp/hive/warehouse/test2"
+ field_delimiter="\t"
+ row_delimiter="\n"
+ partition_by=["age"]
+ partition_dir_expression="${k0}=${v0}"
+ is_partition_field_write_in_file=true
+ file_name_expression="${transactionId}_${now}"
+ file_format="text"
+ sink_columns=["name","age"]
+ filename_time_format="yyyy.MM.dd"
+ is_enable_transaction=true
+ save_mode="error"
+}
+
+```
+
+</TabItem>
+
+<TabItem value="HdfsFile">
+
+```bash
+
+HdfsFile {
+ path="hdfs:///tmp/hive/warehouse/test2"
+ field_delimiter="\t"
+ row_delimiter="\n"
+ partition_by=["age"]
+ partition_dir_expression="${k0}=${v0}"
+ is_partition_field_write_in_file=true
+ file_name_expression="${transactionId}_${now}"
+ file_format="text"
+ sink_columns=["name","age"]
+ filename_time_format="yyyy.MM.dd"
+ is_enable_transaction=true
+ save_mode="error"
+}
+
+```
+
+</TabItem>
+</Tabs>