This is an automated email from the ASF dual-hosted git repository.
wanghailin pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new f4ea676a20 [Docs][Connector-V2][Oss]Reconstruct the OssFile connector
document (#5233)
f4ea676a20 is described below
commit f4ea676a20c6e20d5bc3cbd9e0335e4d9779d8fc
Author: Jia Fan <[email protected]>
AuthorDate: Tue Dec 5 11:14:25 2023 +0800
[Docs][Connector-V2][Oss]Reconstruct the OssFile connector document (#5233)
---
docs/en/connector-v2/sink/OssFile.md | 285 +++++++++++++--------------------
docs/en/connector-v2/source/OssFile.md | 225 ++++++++++----------------
2 files changed, 200 insertions(+), 310 deletions(-)
diff --git a/docs/en/connector-v2/sink/OssFile.md
b/docs/en/connector-v2/sink/OssFile.md
index c723d4a836..3604748477 100644
--- a/docs/en/connector-v2/sink/OssFile.md
+++ b/docs/en/connector-v2/sink/OssFile.md
@@ -2,20 +2,11 @@
> Oss file sink connector
-## Description
-
-Output data to oss file system.
-
-:::tip
-
-If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when
you download and install SeaTunnel Engine. You can check the jar package under
${SEATUNNEL_HOME}/lib to confirm this.
-
-We made some trade-offs in order to support more file types, so we used the
HDFS protocol for internal access to OSS and this connector need some hadoop
dependencies.
-It only supports hadoop version **2.9.X+**.
+## Support These Engines
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -31,72 +22,67 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] json
- [x] excel
-## Options
-
-| name | type | required |
default value |
remarks |
-|----------------------------------|---------|----------|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| path | string | yes | -
|
|
-| tmp_path | string | no | /tmp/seatunnel
| The result file will write to a tmp path first and then
use `mv` to submit tmp dir to target dir. Need a OSS dir. |
-| bucket | string | yes | -
|
|
-| access_key | string | yes | -
|
|
-| access_secret | string | yes | -
|
|
-| endpoint | string | yes | -
|
|
-| custom_filename | boolean | no | false
| Whether you need custom the filename
|
-| file_name_expression | string | no | "${transactionId}"
| Only used when custom_filename is true
|
-| filename_time_format | string | no | "yyyy.MM.dd"
| Only used when custom_filename is true
|
-| file_format_type | string | no | "csv"
|
|
-| field_delimiter | string | no | '\001'
| Only used when file_format_type is text
|
-| row_delimiter | string | no | "\n"
| Only used when file_format_type is text
|
-| have_partition | boolean | no | false
| Whether you need processing partitions.
|
-| partition_by | array | no | -
| Only used then have_partition is true
|
-| partition_dir_expression | string | no |
"${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is
true
|
-| is_partition_field_write_in_file | boolean | no | false
| Only used then have_partition is true
|
-| sink_columns | array | no |
| When this parameter is empty, all fields are sink
columns |
-| is_enable_transaction | boolean | no | true
|
|
-| batch_size | int | no | 1000000
|
|
-| compress_codec | string | no | none
|
|
-| common-options | object | no | -
|
|
-| max_rows_in_memory | int | no | -
| Only used when file_format_type is excel.
|
-| sheet_name | string | no | Sheet${Random
number} | Only used when file_format_type is excel.
|
-
-### path [string]
-
-The target dir path is required.
-
-### bucket [string]
-
-The bucket address of oss file system, for example:
`oss://tyrantlucifer-image-bed`
+## Description
-### access_key [string]
+Output data to oss file system.
-The access key of oss file system.
+## Supported DataSource Info
-### access_secret [string]
+In order to use the OssFile connector, the following dependencies are required.
+They can be downloaded via install-plugin.sh or from the Maven central
repository.
-The access secret of oss file system.
+| Datasource | Supported Versions |
Dependency |
+|------------|--------------------|----------------------------------------------------------------------------------------|
+| OssFile | universal |
[Download](https://mvnrepository.com/artifact/org.apache.seatunnel/connector-file-oss)
|
-### endpoint [string]
+:::tip
-The endpoint of oss file system.
+If you use Spark/Flink, you must ensure your Spark/Flink cluster has already integrated Hadoop before using this connector. The tested Hadoop version is 2.x.
-### custom_filename [boolean]
+If you use SeaTunnel Engine, it automatically integrates the Hadoop jar when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
-Whether custom the filename
+We made some trade-offs in order to support more file types, so we use the HDFS protocol for internal access to OSS, and this connector needs some Hadoop dependencies.
+It only supports Hadoop version **2.9.X+**.
-### file_name_expression [string]
+:::
-Only used when `custom_filename` is `true`
+## Data Type Mapping
-`file_name_expression` describes the file expression which will be created
into the `path`. We can add the variable `${now}` or `${uuid}` in the
`file_name_expression`, like `test_${uuid}_${now}`,
-`${now}` represents the current time, and its format can be defined by
specifying the option `filename_time_format`.
+SeaTunnel will write the data into the file in String format according to the
SeaTunnel data type and file_format_type.
-Please note that, If `is_enable_transaction` is `true`, we will auto add
`${transactionId}_` in the head of the file.
+## Options
-### filename_time_format [string]
+| Name | Type | Required |
Default value |
Description
[...]
+|----------------------------------|---------|----------|--------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| path                             | String  | Yes      | -                                          | The target dir path.
[...]
+| tmp_path                         | String  | No       | /tmp/seatunnel                             | The result file will be written to a tmp path first and then moved with `mv` to the target dir. Needs an OSS dir.
[...]
+| bucket | String | Yes | -
| The bucket address of oss file system, for example:
`oss://tyrantlucifer-image-bed`
[...]
+| access_key | String | No | -
| The access key of oss file system.
[...]
+| access_secret | String | No | -
| The access secret of oss file system.
[...]
+| endpoint | String | Yes | -
| The endpoint of oss file system.
[...]
+| custom_filename                  | Boolean | No       | false                                      | Whether you need to customize the filename
[...]
+| file_name_expression             | String  | No       | "${transactionId}"                         | Only used when `custom_filename` is `true`. <br/> `file_name_expression` describes the file expression which will be created into the `path`. We can add the variable `${now}` or `${uuid}` in the `file_name_expression`, like `test_${uuid}_${now}`. `${now}` represents the current time, and its format can be defined by specifying the option `filename_time_format`. <br/> Please note that, if [...]
+| filename_time_format | String | No | "yyyy.MM.dd"
| Please check #filename_time_format below
[...]
+| file_format_type                 | String  | No       | "csv"                                      | We support the following file types: <br/> `text` `json` `csv` `orc` `parquet` `excel` <br/> Please note that the final file name will end with the file_format_type's suffix; the suffix of the text file is `txt`.
[...]
+| field_delimiter | String | No | '\001'
| The separator between columns in a row of data. Only
needed by `text` file format.
[...]
+| row_delimiter | String | No | "\n"
| The separator between rows in a file. Only needed by
`text` file format.
[...]
+| have_partition | Boolean | No | false
| Whether you need processing partitions.
[...]
+| partition_by | Array | No | -
| Only used when `have_partition` is `true`. <br/>
Partition data based on selected fields.
[...]
+| partition_dir_expression | String | No |
"${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used when `have_partition` is
`true`. <br/> If the `partition_by` is specified, we will generate the
corresponding partition directory based on the partition information, and the
final file will be placed in the partition directory. <br/> Default
`partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`. `k0`
is the first partition field and `v0` is the value of the [...]
+| is_partition_field_write_in_file | Boolean | No       | false                                      | Only used when `have_partition` is `true`. <br/> If `is_partition_field_write_in_file` is `true`, the partition field and its value will be written into the data file. <br/> For example, if you want to write a Hive Data File, its value should be `false`.
[...]
+| sink_columns                     | Array   | No       |                                            | Which columns need to be written to the file; the default value is all the columns obtained from `Transform` or `Source`. <br/> The order of the fields determines the order in which the file is actually written.
[...]
+| is_enable_transaction            | Boolean | No       | true                                       | If `is_enable_transaction` is true, we will ensure that data will not be lost or duplicated when it is written to the target directory. <br/> Please note that, if `is_enable_transaction` is `true`, we will auto add `${transactionId}_` in the head of the file. <br/> Only support `true` now.
[...]
+| batch_size                       | Int     | No       | 1000000                                    | The maximum number of rows in a file. For SeaTunnel Engine, the number of lines in the file is jointly decided by `batch_size` and `checkpoint.interval`. If the value of `checkpoint.interval` is large enough, the sink writer will keep writing rows into a file until the rows in the file exceed `batch_size`. If `checkpoint.interval` is small, the sink writer will create a new file when [...]
+| compress_codec                   | String  | No       | none                                       | The compress codec of files, supported as the following shown: <br/> - txt: `lzo` `none` <br/> - json: `lzo` `none` <br/> - csv: `lzo` `none` <br/> - orc: `lzo` `snappy` `lz4` `zlib` `none` <br/> - parquet: `lzo` `snappy` `lz4` `gzip` `brotli` `zstd` `none` <br/> Tips: excel type does not support any compression format
[...]
+| max_rows_in_memory               | Int     | No       | -                                          | Only used when file_format_type is excel. The maximum number of data items that can be cached in the memory.
[...]
+| sheet_name                       | String  | No       | Sheet${Random number}                      | Write the sheet of the workbook
[...]
+| common-options | Config | No | -
| Sink plugin common parameters, please refer to [Sink
Common Options](common-options.md) for details.
[...]
+
+### filename_time_format [String]
Only used when `custom_filename` is `true`
-When the format in the `file_name_expression` parameter is `xxxx-${now}` ,
`filename_time_format` can specify the time format of the path, and the default
value is `yyyy.MM.dd` . The commonly used time formats are listed as follows:
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format of the path, and the default value is `yyyy.MM.dd`. The commonly used time formats are listed as follows:
| Symbol | Description |
|--------|--------------------|
@@ -107,95 +93,33 @@ When the format in the `file_name_expression` parameter is
`xxxx-${now}` , `file
| m | Minute in hour |
| s | Second in minute |
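The symbols above combine into a pattern string. As an illustrative sketch (the option values here are hypothetical, not taken from this document), a sink using a custom filename could set:

```hocon
OssFile {
    # Hypothetical values, for illustration only
    custom_filename = true
    file_name_expression = "test_${uuid}_${now}"
    # ${now} is rendered with this pattern, e.g. 2023.12.05
    filename_time_format = "yyyy.MM.dd"
}
```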
-### file_format_type [string]
-
-We supported as the following file types:
-
-`text` `json` `csv` `orc` `parquet` `excel`
-
-Please note that, The final file name will end with the file_format_type's
suffix, the suffix of the text file is `txt`.
-
-### field_delimiter [string]
-
-The separator between columns in a row of data. Only needed by `text` file
format.
-
-### row_delimiter [string]
-
-The separator between rows in a file. Only needed by `text` file format.
-
-### have_partition [boolean]
-
-Whether you need processing partitions.
-
-### partition_by [array]
-
-Only used when `have_partition` is `true`.
+## How to Create an Oss Data Synchronization Job
-Partition data based on selected fields.
-
-### partition_dir_expression [string]
-
-Only used when `have_partition` is `true`.
-
-If the `partition_by` is specified, we will generate the corresponding
partition directory based on the partition information, and the final file will
be placed in the partition directory.
-
-Default `partition_dir_expression` is
`${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`. `k0` is the first partition field
and `v0` is the value of the first partition field.
-
-### is_partition_field_write_in_file [boolean]
-
-Only used when `have_partition` is `true`.
-
-If `is_partition_field_write_in_file` is `true`, the partition field and the
value of it will be write into data file.
-
-For example, if you want to write a Hive Data File, Its value should be
`false`.
-
-### sink_columns [array]
-
-Which columns need be written to file, default value is all the columns get
from `Transform` or `Source`.
-The order of the fields determines the order in which the file is actually
written.
-
-### is_enable_transaction [boolean]
-
-If `is_enable_transaction` is true, we will ensure that data will not be lost
or duplicated when it is written to the target directory.
-
-Please note that, If `is_enable_transaction` is `true`, we will auto add
`${transactionId}_` in the head of the file.
-
-Only support `true` now.
-
-### batch_size [int]
-
-The maximum number of rows in a file. For SeaTunnel Engine, the number of
lines in the file is determined by `batch_size` and `checkpoint.interval`
jointly decide. If the value of `checkpoint.interval` is large enough, sink
writer will write rows in a file until the rows in the file larger than
`batch_size`. If `checkpoint.interval` is small, the sink writer will create a
new file when a new checkpoint trigger.
-
-### compress_codec [string]
-
-The compress codec of files and the details that supported as the following
shown:
-
-- txt: `lzo` `none`
-- json: `lzo` `none`
-- csv: `lzo` `none`
-- orc: `lzo` `snappy` `lz4` `zlib` `none`
-- parquet: `lzo` `snappy` `lz4` `gzip` `brotli` `zstd` `none`
-
-Tips: excel type does not support any compression format
-
-### common options
-
-Sink plugin common parameters, please refer to [Sink Common
Options](common-options.md) for details.
-
-### max_rows_in_memory [int]
-
-When File Format is Excel,The maximum number of data items that can be cached
in the memory.
-
-### sheet_name [string]
-
-Writer the sheet of the workbook
-
-## Example
+The following example demonstrates how to create a data synchronization job that reads data from FakeSource and writes it to Oss:
For text file format with `have_partition` and `custom_filename` and
`sink_columns`
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+
+# Create a source to produce data
+source {
+ FakeSource {
+ schema = {
+ fields {
+ name = string
+ age = int
+ }
+ }
+ }
+}
+# Write data to Oss
+sink {
OssFile {
path="/seatunnel/sink"
bucket = "oss://tyrantlucifer-image-bed"
@@ -215,13 +139,32 @@ For text file format with `have_partition` and
`custom_filename` and `sink_colum
sink_columns = ["name","age"]
is_enable_transaction = true
}
-
+}
```
For parquet file format with `have_partition` and `sink_columns`
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+
+# Create a source to produce data
+source {
+ FakeSource {
+ schema = {
+ fields {
+ name = string
+ age = int
+ }
+ }
+ }
+}
+# Write data to Oss
+sink {
OssFile {
path = "/seatunnel/sink"
bucket = "oss://tyrantlucifer-image-bed"
@@ -235,13 +178,32 @@ For parquet file format with `have_partition` and
`sink_columns`
file_format_type = "parquet"
sink_columns = ["name","age"]
}
-
+}
```
For orc file format simple config
```bash
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+
+# Create a source to produce data
+source {
+ FakeSource {
+ schema = {
+ fields {
+ name = string
+ age = int
+ }
+ }
+ }
+}
+# Write data to Oss
+sink {
OssFile {
path="/seatunnel/sink"
bucket = "oss://tyrantlucifer-image-bed"
@@ -250,27 +212,10 @@ For orc file format simple config
endpoint = "oss-cn-beijing.aliyuncs.com"
file_format_type = "orc"
}
-
+}
```
-## Changelog
-
-### 2.2.0-beta 2022-09-26
-
-- Add OSS Sink Connector
-
-### 2.3.0-beta 2022-10-20
-
-- [BugFix] Fix the bug of incorrect path in windows environment
([2980](https://github.com/apache/seatunnel/pull/2980))
-- [BugFix] Fix filesystem get error
([3117](https://github.com/apache/seatunnel/pull/3117))
-- [BugFix] Solved the bug of can not parse '\t' as delimiter from config file
([3083](https://github.com/apache/seatunnel/pull/3083))
-
-### Next version
+### Tips
-- [BugFix] Fixed the following bugs that failed to write data to files
([3258](https://github.com/apache/seatunnel/pull/3258))
- - When field from upstream is null it will throw NullPointerException
- - Sink columns mapping failed
- - When restore writer from states getting transaction directly failed
-- [Improve] Support setting batch size for every file
([3625](https://github.com/apache/seatunnel/pull/3625))
-- [Improve] Support file compress
([3899](https://github.com/apache/seatunnel/pull/3899))
+> 1. [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md).
diff --git a/docs/en/connector-v2/source/OssFile.md
b/docs/en/connector-v2/source/OssFile.md
index 2f51024b67..87e7e0180f 100644
--- a/docs/en/connector-v2/source/OssFile.md
+++ b/docs/en/connector-v2/source/OssFile.md
@@ -31,6 +31,15 @@ Read all the data in a split in a pollNext call. What splits
are read will be sa
Read data from aliyun oss file system.
+## Supported DataSource Info
+
+In order to use the OssFile connector, the following dependencies are required.
+They can be downloaded via install-plugin.sh or from the Maven central
repository.
+
+| Datasource | Supported Versions |
Dependency |
+|------------|--------------------|----------------------------------------------------------------------------------------|
+| OssFile | universal |
[Download](https://mvnrepository.com/artifact/org.apache.seatunnel/connector-file-oss)
|
+
:::tip
If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
@@ -42,32 +51,50 @@ It only supports hadoop version **2.9.X+**.
:::
-## Options
-
-| name | type | required | default value |
-|---------------------------|---------|----------|---------------------|
-| path | string | yes | - |
-| file_format_type | string | yes | - |
-| bucket | string | yes | - |
-| access_key | string | yes | - |
-| access_secret | string | yes | - |
-| endpoint | string | yes | - |
-| read_columns | list | yes | - |
-| delimiter/field_delimiter | string | no | \001 |
-| parse_partition_from_path | boolean | no | true |
-| skip_header_row_number | long | no | 0 |
-| date_format | string | no | yyyy-MM-dd |
-| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
-| time_format | string | no | HH:mm:ss |
-| schema | config | no | - |
-| sheet_name | string | no | - |
-| file_filter_pattern | string | no | - |
-| compress_codec | string | no | none |
-| common-options | | no | - |
-
-### path [string]
-
-The source file path.
+## Data Type Mapping
+
+The file formats do not have a specific type list; you can indicate which SeaTunnel data type the corresponding data needs to be converted to by specifying the schema in the config.
+
+| SeaTunnel Data type |
+|---------------------|
+| STRING |
+| SHORT |
+| INT |
+| BIGINT |
+| BOOLEAN |
+| DOUBLE |
+| DECIMAL |
+| FLOAT |
+| DATE |
+| TIME |
+| TIMESTAMP |
+| BYTES |
+| ARRAY |
+| MAP |
+
+## Source Options
+
+| Name | Type | Required | default value |
Description
|
+|---------------------------|---------|----------|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| path | String | Yes | - | The
source file path.
|
+| file_format_type | String | Yes | - |
Please check #file_format_type below
|
+| bucket | String | Yes | - | The
bucket address of oss file system, for example: `oss://tyrantlucifer-image-bed`
|
+| endpoint | String | Yes | - | The
endpoint of oss file system.
|
+| read_columns              | List    | No       | -                   | The read column list of the data source; users can use it to implement field projection. <br/> The file types that support column projection are as follows: <br/> - text <br/> - json <br/> - csv <br/> - orc <br/> - parquet <br/> - excel <br/> **Tips: If you want to use this feature when reading `text` `json` `csv` files, the schema option must be configured** |
+| access_key | String | No | - | The
access key of oss file system.
|
+| access_secret | String | No | - | The
access secret of oss file system.
|
+| file_filter_pattern       | String  | No       | -                   | Filter pattern, used for filtering files. |
+| delimiter/field_delimiter | String  | No       | \001                | The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead. <br/> Field delimiter, used to tell the connector how to slice and dice fields when reading text files. <br/> Default `\001`, the same as Hive's default delimiter |
+| parse_partition_from_path | Boolean | No       | true                | Control whether to parse the partition keys and values from the file path. <br/> For example, if you read a file from path `oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26` <br/> every record read from the file will have these two fields added: <br/> name      age <br/> tyrantlucifer      26 <br/> Tips: **Do not define partition fields in schema option** |
+| date_format | String | No | yyyy-MM-dd | Date
type format, used to tell connector how to convert string to date, supported as
the following formats: <br/> `yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd` <br/>
default `yyyy-MM-dd`
|
+| datetime_format | String | No | yyyy-MM-dd HH:mm:ss |
Datetime type format, used to tell connector how to convert string to datetime,
supported as the following formats: <br/> `yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd
HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss` <br/> default `yyyy-MM-dd
HH:mm:ss`
|
+| time_format | String | No | HH:mm:ss | Time
type format, used to tell connector how to convert string to time, supported as
the following formats: <br/> `HH:mm:ss` `HH:mm:ss.SSS` <br/> default `HH:mm:ss`
|
+| skip_header_row_number | Long | No | 0 | Skip
the first few lines, but only for the txt and csv. <br/> For example, set like
following: <br/> `skip_header_row_number = 2` <br/> then SeaTunnel will skip
the first 2 lines from source files
|
+| sheet_name                | String  | No       | -                   | Read the sheet of the workbook. Only used when file_format is excel. |
+| schema | Config | No | - |
Please check #schema below
|
+| compress_codec            | String  | No       | none                | The compress codec of files, supported as the following shown: <br/> - txt: `lzo` `none` <br/> - json: `lzo` `none` <br/> - csv: `lzo` `none` <br/> - orc/parquet: automatically recognizes the compression type, no additional settings required. |
+| common-options | | No | - |
Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
|
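To tie the options above together, here is a hedged sketch of a text-format source using field projection (the bucket, endpoint, credentials, and field names are illustrative, not from this document):

```hocon
source {
  OssFile {
    path = "/seatunnel/text"
    bucket = "oss://example-bucket"            # illustrative bucket
    endpoint = "oss-cn-beijing.aliyuncs.com"
    access_key = "xxxxxxxxxxxxxxxxx"
    access_secret = "xxxxxxxxxxxxxxxxxxxxxx"
    file_format_type = "text"
    field_delimiter = "\t"
    # The schema option is required for column projection on text/json/csv files
    schema = {
      fields {
        name = string
        age = int
      }
    }
    read_columns = ["name"]
  }
}
```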
### file_format_type [string]
@@ -157,84 +184,6 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |
-### bucket [string]
-
-The bucket address of oss file system, for example:
`oss://tyrantlucifer-image-bed`
-
-### access_key [string]
-
-The access key of oss file system.
-
-### access_secret [string]
-
-The access secret of oss file system.
-
-### endpoint [string]
-
-The endpoint of oss file system.
-
-### read_columns [list]
-
-The read column list of the data source, user can use it to implement field
projection.
-
-### delimiter/field_delimiter [string]
-
-**delimiter** parameter will deprecate after version 2.3.5, please use
**field_delimiter** instead.
-
-Only need to be configured when file_format is text.
-
-Field delimiter, used to tell connector how to slice and dice fields.
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path
`oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date,
supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to
datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss`
`yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time,
supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
### schema [config]
Only needs to be configured when the file_format_type is text, json, excel or csv (or other formats where we cannot read the schema from metadata).
@@ -243,34 +192,19 @@ Only need to be configured when the file_format_type are
text, json, excel or cs
The schema of upstream data.
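As a minimal sketch (the field names here are hypothetical), a schema declaration looks like:

```hocon
schema = {
  fields {
    name = string
    age = int
    score = double
  }
}
```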
-### sheet_name [string]
-
-Only need to be configured when file_format is excel.
-
-Reader the sheet of the workbook.
-
-### file_filter_pattern [string]
-
-Filter pattern, which used for filtering files.
-
-### compress_codec [string]
-
-The compress codec of files and the details that supported as the following
shown:
-
-- txt: `lzo` `none`
-- json: `lzo` `none`
-- csv: `lzo` `none`
-- orc/parquet:
- automatically recognizes the compression type, no additional settings
required.
+## How to Create an Oss Data Synchronization Job
-### common options
+The following example demonstrates how to create a data synchronization job
that reads data from Oss and prints it on the local client:
-Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
-
-## Example
-
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+# Create a source to connect to Oss
+source {
OssFile {
path = "/seatunnel/orc"
bucket = "oss://tyrantlucifer-image-bed"
@@ -279,11 +213,24 @@ Source plugin common parameters, please refer to [Source
Common Options](common-
endpoint = "oss-cn-beijing.aliyuncs.com"
file_format_type = "orc"
}
+}
+# Console printing of the read Oss data
+sink {
+ Console {
+ }
+}
```
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+# Create a source to connect to Oss
+source {
OssFile {
path = "/seatunnel/json"
bucket = "oss://tyrantlucifer-image-bed"
@@ -298,18 +245,16 @@ Source plugin common parameters, please refer to [Source
Common Options](common-
}
}
}
+}
+# Console printing of the read Oss data
+sink {
+ Console {
+ }
+}
```
-## Changelog
-
-### 2.2.0-beta 2022-09-26
-
-- Add OSS File Source Connector
-
-### 2.3.0-beta 2022-10-20
+### Tips
-- [BugFix] Fix the bug of incorrect path in windows environment
([2980](https://github.com/apache/seatunnel/pull/2980))
-- [Improve] Support extract partition from SeaTunnelRow fields
([3085](https://github.com/apache/seatunnel/pull/3085))
-- [Improve] Support parse field from file path
([2985](https://github.com/apache/seatunnel/pull/2985))
+> 1. [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md).