This is an automated email from the ASF dual-hosted git repository.
fanjia pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new e7cfa2bc7 [Connector-V2][Doc] Add File Sink Connector V2 document
(#2120)
e7cfa2bc7 is described below
commit e7cfa2bc737cd15081807d0f6388f447ff6fbcaf
Author: Eric <[email protected]>
AuthorDate: Tue Jul 5 18:32:11 2022 +0800
[Connector-V2][Doc] Add File Sink Connector V2 document (#2120)
* add file sink connector v2
* Update docs/en/connector-v2/sink/File.mdx
Co-authored-by: Hisoka <[email protected]>
* Update docs/en/connector-v2/sink/File.mdx
Co-authored-by: Hisoka <[email protected]>
---
docs/en/connector-v2/sink/File.mdx | 269 +++++++++++++++++++++++++++++++++++++
1 file changed, 269 insertions(+)
diff --git a/docs/en/connector-v2/sink/File.mdx
b/docs/en/connector-v2/sink/File.mdx
new file mode 100644
index 000000000..71e7631c6
--- /dev/null
+++ b/docs/en/connector-v2/sink/File.mdx
@@ -0,0 +1,269 @@
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# File
+
+## Description
+
+Output data to local or hdfs or s3 file.
+
+:::tip
+
+Used to write data to file. Supports Batch and Streaming mode.
+
+* [x] Batch
+* [x] Streaming
+
+:::
+
+## Options
+
+<Tabs
+ groupId="engine-type"
+ defaultValue="LocalFile"
+ values={[
+ {label: 'LocalFile', value: 'LocalFile'},
+ {label: 'HdfsFile', value: 'HdfsFile'},
+ ]}>
+ <TabItem value="LocalFile">
+
+| name                             | type    | required | default value                                             |
+| -------------------------------- | ------- | -------- | --------------------------------------------------------- |
+| path                             | string  | yes      | -                                                         |
+| file_name_expression             | string  | no       | "${transactionId}"                                        |
+| file_format                      | string  | no       | "text"                                                    |
+| filename_time_format             | string  | no       | "yyyy.MM.dd"                                              |
+| field_delimiter                  | string  | no       | '\001'                                                    |
+| row_delimiter                    | string  | no       | "\n"                                                      |
+| partition_by                     | array   | no       | -                                                         |
+| partition_dir_expression         | string  | no       | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/"                |
+| is_partition_field_write_in_file | boolean | no       | false                                                     |
+| sink_columns                     | array   | no       | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction            | boolean | no       | true                                                      |
+| save_mode                        | string  | no       | "error"                                                   |
+
+### path [string]
+
+The target directory path is required. An HDFS file path starts with `hdfs://`, and a local file path starts with `file://`.
+
+### file_name_expression [string]
+
+`file_name_expression` describes the file name expression for the files created under `path`. You can add the variables `${now}` or `${uuid}` to `file_name_expression`, for example `test_${uuid}_${now}`.
+`${now}` represents the current time, and its format can be defined with the `filename_time_format` option.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+### file_format [string]
+
+Currently only `text` is supported as `file_format`.
+
+Please note that the final file name ends with the file format's suffix; for a text file the suffix is `txt`.
+
+### filename_time_format [string]
+
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format used in the file name, and the default value is `yyyy.MM.dd`. The commonly used time formats are listed as follows:
+
+| Symbol | Description |
+| ------ | ------------------ |
+| y | Year |
+| M | Month |
+| d | Day of month |
+| H | Hour in day (0-23) |
+| m | Minute in hour |
+| s | Second in minute |
+
+See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax.
+
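As a minimal illustrative sketch (not part of the connector itself), this is how a `filename_time_format` pattern such as the default `yyyy.MM.dd` is rendered by Java's `SimpleDateFormat`, which is what the substituted `${now}` value looks like in a file name:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FilenameTimeFormatDemo {
    public static void main(String[] args) {
        // The default filename_time_format pattern from the table above.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd");
        // Render the current time the way ${now} would appear in a file name.
        System.out.println(fmt.format(new Date()));
    }
}
```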
+### field_delimiter [string]
+
+The separator between columns in a row of data.
+
+### row_delimiter [string]
+
+The separator between rows in a file.
+
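As an illustrative sketch (the field names and values here are made up, and this is not the connector's actual implementation), this shows how rows of text output are laid out with the default delimiters, `'\001'` between fields and `"\n"` between rows:

```java
import java.util.List;

public class RowLayoutDemo {
    public static void main(String[] args) {
        String fieldDelimiter = "\u0001"; // default field_delimiter '\001'
        String rowDelimiter = "\n";       // default row_delimiter "\n"
        // Hypothetical rows; real data comes from the Source/Transform.
        List<List<String>> rows = List.of(
                List.of("Alice", "20"),
                List.of("Bob", "32"));
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(String.join(fieldDelimiter, row)).append(rowDelimiter);
        }
        System.out.print(out);
    }
}
```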
+### partition_by [array]
+
+Partition the data based on the selected fields.
+
+### partition_dir_expression [string]
+
+If `partition_by` is specified, the corresponding partition directories are generated from the partition information, and the final files are placed in those partition directories.
+
+The default `partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`, where `k0` is the first partition field and `v0` is its value.
+
+### is_partition_field_write_in_file [boolean]
+
+If `is_partition_field_write_in_file` is `true`, the partition fields and their values are written into the data file.
+
+For example, when writing a Hive data file, its value should be `false`.
+
+### sink_columns [array]
+
+The columns to be written to the file; the default is all of the columns obtained from the `Transform` or `Source`.
+The order of the fields determines the order in which they are written to the file.
+
+### is_enable_transaction [boolean]
+
+If `is_enable_transaction` is `true`, we ensure that data is not lost or duplicated when it is written to the target directory.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+Only `true` is supported at the moment.
+
+### save_mode [string]
+
+Storage mode, currently supporting `overwrite`, `append`, `ignore` and `error`. For the specific meaning of each mode, see [save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes).
+
+Streaming jobs do not support `overwrite`.
+
+</TabItem>
+<TabItem value="HdfsFile">
+
+| name                             | type    | required | default value                                             |
+| -------------------------------- | ------- | -------- | --------------------------------------------------------- |
+| path                             | string  | yes      | -                                                         |
+| file_name_expression             | string  | no       | "${transactionId}"                                        |
+| file_format                      | string  | no       | "text"                                                    |
+| filename_time_format             | string  | no       | "yyyy.MM.dd"                                              |
+| field_delimiter                  | string  | no       | '\001'                                                    |
+| row_delimiter                    | string  | no       | "\n"                                                      |
+| partition_by                     | array   | no       | -                                                         |
+| partition_dir_expression         | string  | no       | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/"                |
+| is_partition_field_write_in_file | boolean | no       | false                                                     |
+| sink_columns                     | array   | no       | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction            | boolean | no       | true                                                      |
+| save_mode                        | string  | no       | "error"                                                   |
+
+### path [string]
+
+The target directory path is required. An HDFS file path starts with `hdfs://`, and a local file path starts with `file://`.
+
+### file_name_expression [string]
+
+`file_name_expression` describes the file name expression for the files created under `path`. You can add the variables `${now}` or `${uuid}` to `file_name_expression`, for example `test_${uuid}_${now}`.
+`${now}` represents the current time, and its format can be defined with the `filename_time_format` option.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+### file_format [string]
+
+Currently only `text` is supported as `file_format`.
+
+Please note that the final file name ends with the file format's suffix; for a text file the suffix is `txt`.
+
+### filename_time_format [string]
+
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format used in the file name, and the default value is `yyyy.MM.dd`. The commonly used time formats are listed as follows:
+
+| Symbol | Description |
+| ------ | ------------------ |
+| y | Year |
+| M | Month |
+| d | Day of month |
+| H | Hour in day (0-23) |
+| m | Minute in hour |
+| s | Second in minute |
+
+See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax.
+
+### field_delimiter [string]
+
+The separator between columns in a row of data.
+
+### row_delimiter [string]
+
+The separator between rows in a file.
+
+### partition_by [array]
+
+Partition the data based on the selected fields.
+
+### partition_dir_expression [string]
+
+If `partition_by` is specified, the corresponding partition directories are generated from the partition information, and the final files are placed in those partition directories.
+
+The default `partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`, where `k0` is the first partition field and `v0` is its value.
+
+### is_partition_field_write_in_file [boolean]
+
+If `is_partition_field_write_in_file` is `true`, the partition fields and their values are written into the data file.
+
+For example, when writing a Hive data file, its value should be `false`.
+
+### sink_columns [array]
+
+The columns to be written to the file; the default is all of the columns obtained from the `Transform` or `Source`.
+The order of the fields determines the order in which they are written to the file.
+
+### is_enable_transaction [boolean]
+
+If `is_enable_transaction` is `true`, we ensure that data is not lost or duplicated when it is written to the target directory.
+
+Please note that if `is_enable_transaction` is `true`, `${transactionId}_` is automatically prepended to the file name.
+
+Only `true` is supported at the moment.
+
+### save_mode [string]
+
+Storage mode, currently supporting `overwrite`, `append`, `ignore` and `error`. For the specific meaning of each mode, see [save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes).
+
+Streaming jobs do not support `overwrite`.
+</TabItem>
+</Tabs>
+
+## Example
+
+<Tabs
+ groupId="engine-type"
+ defaultValue="LocalFile"
+ values={[
+ {label: 'LocalFile', value: 'LocalFile'},
+ {label: 'HdfsFile', value: 'HdfsFile'},
+ ]}>
+<TabItem value="LocalFile">
+
+```bash
+
+LocalFile {
+ path="file:///tmp/hive/warehouse/test2"
+ field_delimiter="\t"
+ row_delimiter="\n"
+ partition_by=["age"]
+ partition_dir_expression="${k0}=${v0}"
+ is_partition_field_write_in_file=true
+ file_name_expression="${transactionId}_${now}"
+ file_format="text"
+ sink_columns=["name","age"]
+ filename_time_format="yyyy.MM.dd"
+ is_enable_transaction=true
+ save_mode="error"
+}
+
+```
+
+</TabItem>
+
+<TabItem value="HdfsFile">
+
+```bash
+
+HdfsFile {
+ path="hdfs:///tmp/hive/warehouse/test2"
+ field_delimiter="\t"
+ row_delimiter="\n"
+ partition_by=["age"]
+ partition_dir_expression="${k0}=${v0}"
+ is_partition_field_write_in_file=true
+ file_name_expression="${transactionId}_${now}"
+ file_format="text"
+ sink_columns=["name","age"]
+ filename_time_format="yyyy.MM.dd"
+ is_enable_transaction=true
+ save_mode="error"
+}
+
+```
+
+</TabItem>
+</Tabs>