This is an automated email from the ASF dual-hosted git repository.
gaojun2048 pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new 8d87cf8fc4 [Improve][connector-file] unifiy option between file
source/sink and update document (#5680)
8d87cf8fc4 is described below
commit 8d87cf8fc488b2a5d1861c3e8e75d8581e7b8701
Author: Jarvis <[email protected]>
AuthorDate: Mon Oct 30 23:15:37 2023 +0800
[Improve][connector-file] unifiy option between file source/sink and update
document (#5680)
---
docs/en/connector-v2/source/CosFile.md | 175 +++++++++----------
docs/en/connector-v2/source/FtpFile.md | 181 ++++++++++----------
docs/en/connector-v2/source/HdfsFile.md | 8 +-
docs/en/connector-v2/source/LocalFile.md | 171 ++++++++++---------
docs/en/connector-v2/source/OssFile.md | 185 ++++++++++----------
docs/en/connector-v2/source/OssJindoFile.md | 183 ++++++++++----------
docs/en/connector-v2/source/S3File.md | 12 +-
docs/en/connector-v2/source/SftpFile.md | 187 +++++++++++----------
.../seatunnel/file/config/BaseSourceConfig.java | 5 +-
.../file/source/reader/ExcelReadStrategy.java | 5 +-
.../file/source/reader/TextReadStrategy.java | 11 +-
.../file/cos/source/CosFileSourceFactory.java | 2 +-
.../file/ftp/source/FtpFileSourceFactory.java | 2 +-
.../file/hdfs/source/HdfsFileSourceFactory.java | 2 +-
.../file/oss/source/OssFileSourceFactory.java | 2 +-
.../file/local/source/LocalFileSourceFactory.java | 2 +-
.../file/oss/source/OssFileSourceFactory.java | 2 +-
.../file/s3/source/S3FileSourceFactory.java | 2 +-
.../seatunnel/file/sftp/config/SftpConf.java | 4 +-
.../seatunnel/file/sftp/config/SftpConfig.java | 2 +-
.../seatunnel/file/sftp/sink/SftpFileSink.java | 2 +-
.../file/sftp/sink/SftpFileSinkFactory.java | 2 +-
.../seatunnel/file/sftp/source/SftpFileSource.java | 2 +-
.../file/sftp/source/SftpFileSourceFactory.java | 4 +-
.../test/resources/excel/cos_excel_to_assert.conf | 2 +-
.../excel/ftp_excel_projection_to_assert.conf | 2 +-
.../test/resources/excel/ftp_excel_to_assert.conf | 2 +-
.../excel/ftp_filter_excel_to_assert.conf | 2 +-
.../e2e/connector/file/local/LocalFileIT.java | 6 +
.../excel/local_excel_projection_to_assert.conf | 2 +-
.../resources/excel/local_excel_to_assert.conf | 2 +-
.../excel/local_filter_excel_to_assert.conf | 2 +-
.../src/test/resources/text/e2e_delimiter.txt | 5 +
.../local_file_delimiter_assert.conf} | 34 ++--
.../text/local_file_text_lzo_to_assert.conf | 1 -
.../excel/sftp_excel_projection_to_assert.conf | 2 +-
.../test/resources/excel/sftp_excel_to_assert.conf | 2 +-
.../excel/sftp_filter_excel_to_assert.conf | 2 +-
38 files changed, 632 insertions(+), 587 deletions(-)
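The heart of this patch is unifying the file sources on a `field_delimiter` option in place of `delimiter`. As a sketch (connector and option names are taken from the docs changed below; the path and field names are hypothetical), a text source config migrates like this:

```hocon
source {
  LocalFile {
    # hypothetical path, for illustration only
    path = "/tmp/seatunnel/text"
    file_format_type = "text"

    # before this change:  delimiter = "#"
    # after it (delimiter still works but is deprecated after 2.3.5):
    field_delimiter = "#"

    schema {
      fields {
        name = string
        age = int
      }
    }
  }
}
```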
diff --git a/docs/en/connector-v2/source/CosFile.md
b/docs/en/connector-v2/source/CosFile.md
index 236c4b8ca0..406c86fab5 100644
--- a/docs/en/connector-v2/source/CosFile.md
+++ b/docs/en/connector-v2/source/CosFile.md
@@ -2,19 +2,11 @@
> Cos file source connector
-## Description
-
-Read data from aliyun Cos file system.
-
-:::tip
+## Support Those Engines
-If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when
you download and install SeaTunnel Engine. You can check the jar package under
${SEATUNNEL_HOME}/lib to confirm this.
-
-To use this connector you need put hadoop-cos-{hadoop.version}-{version}.jar
and cos_api-bundle-{version}.jar in ${SEATUNNEL_HOME}/lib dir, download:
[Hadoop-Cos-release](https://github.com/tencentyun/hadoop-cos/releases). It
only supports hadoop version 2.6.5+ and version 8.0.2+.
-
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -35,6 +27,20 @@ Read all the data in a split in a pollNext call. What splits
are read will be sa
- [x] json
- [x] excel
+## Description
+
+Read data from aliyun Cos file system.
+
+:::tip
+
+If you use Spark/Flink, you must ensure that your Spark/Flink cluster has already integrated Hadoop before using this connector. The tested Hadoop version is 2.x.
+
+If you use SeaTunnel Engine, the Hadoop jar is integrated automatically when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
+
+To use this connector you need to put hadoop-cos-{hadoop.version}-{version}.jar and cos_api-bundle-{version}.jar in the ${SEATUNNEL_HOME}/lib directory. Download: [Hadoop-Cos-release](https://github.com/tencentyun/hadoop-cos/releases). Only hadoop version 2.6.5+ and hadoop-cos version 8.0.2+ are supported.
+
+:::
+
## Options
| name | type | required | default value |
@@ -46,76 +52,22 @@ Read all the data in a split in a pollNext call. What
splits are read will be sa
| secret_key | string | yes | - |
| region | string | yes | - |
| read_columns | list | yes | - |
-| delimiter | string | no | \001 |
+| delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| skip_header_row_number | long | no | 0 |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
| time_format | string | no | HH:mm:ss |
| schema | config | no | - |
-| common-options | | no | - |
| sheet_name | string | no | - |
| file_filter_pattern | string | no | - |
| compress_codec | string | no | none |
+| common-options | | no | - |
### path [string]
The source file path.
-### delimiter [string]
-
-Field delimiter, used to tell connector how to slice and dice fields when
reading text files
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path
`cosn://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date,
supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to
datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss`
`yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time,
supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
### file_format_type [string]
File type, supported as the following file types:
@@ -181,13 +133,13 @@ If you do not assign data schema connector will treat the
upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |
-If you assign data schema, you should also assign the option `delimiter` too
except CSV file type
+If you assign a data schema, you should also assign the option `field_delimiter`, except for the CSV file type
you should assign schema and delimiter as the following:
```hocon
-delimiter = "#"
+field_delimiter = "#"
schema {
fields {
name = string
@@ -220,34 +172,81 @@ The secret key of Cos file system.
The region of cos file system.
-### schema [config]
+### read_columns [list]
-#### fields [Config]
+The list of columns to read from the data source; users can use it to implement field projection.
-The schema of upstream data.
+### delimiter/field_delimiter [string]
-### read_columns [list]
+The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
-The read column list of the data source, user can use it to implement field
projection.
+Only needs to be configured when file_format_type is text.
-The file type supported column projection as the following shown:
+Field delimiter, used to tell the connector how to split fields.
-- text
-- json
-- csv
-- orc
-- parquet
-- excel
+Default: `\001`, the same as Hive's default delimiter.
-**Tips: If the user wants to use this feature when reading `text` `json` `csv`
files, the schema option must be configured**
+### parse_partition_from_path [boolean]
-### common options
+Controls whether to parse the partition keys and values from the file path.
-Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
+For example, if you read a file from the path `cosn://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
+
+Every record read from the file will have these two fields added:
+
+| name          | age |
+|---------------|-----|
+| tyrantlucifer | 26  |
+
+Tip: **Do not define partition fields in the schema option**
+
+### skip_header_row_number [long]
+
+Skip the first few lines; this only works for txt and csv files.
+
+For example, set it as follows:
+
+`skip_header_row_number = 2`
+
+Then SeaTunnel will skip the first 2 lines of the source files.
+
+### date_format [string]
+
+Date type format, used to tell the connector how to convert a string to a date. The following formats are supported:
+
+`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
+
+Default: `yyyy-MM-dd`
+
+### datetime_format [string]
+
+Datetime type format, used to tell the connector how to convert a string to a datetime. The following formats are supported:
+
+`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
+
+Default: `yyyy-MM-dd HH:mm:ss`
+
+### time_format [string]
+
+Time type format, used to tell the connector how to convert a string to a time. The following formats are supported:
+
+`HH:mm:ss` `HH:mm:ss.SSS`
+
+Default: `HH:mm:ss`
+
+### schema [config]
+
+Only needs to be configured when file_format_type is text, json, excel or csv (or any other format from which the schema cannot be read from metadata).
+
+#### fields [Config]
+
+The schema of upstream data.
### sheet_name [string]
-Reader the sheet of the workbook,Only used when file_format is excel.
+Only needs to be configured when file_format_type is excel.
+
+Specifies the sheet of the workbook to read.
### file_filter_pattern [string]
@@ -263,6 +262,10 @@ The compress codec of files and the details that supported
as the following show
- orc/parquet:
automatically recognizes the compression type, no additional settings
required.
+### common options
+
+Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
+
## Example
```hocon
diff --git a/docs/en/connector-v2/source/FtpFile.md
b/docs/en/connector-v2/source/FtpFile.md
index b1d3fb45ee..781d7d40bc 100644
--- a/docs/en/connector-v2/source/FtpFile.md
+++ b/docs/en/connector-v2/source/FtpFile.md
@@ -2,17 +2,11 @@
> Ftp file source connector
-## Description
-
-Read data from ftp file server.
-
-:::tip
+## Support Those Engines
-If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when
you download and install SeaTunnel Engine. You can check the jar package under
${SEATUNNEL_HOME}/lib to confirm this.
-
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -28,6 +22,18 @@ If you use SeaTunnel Engine, It automatically integrated the
hadoop jar when you
- [x] json
- [x] excel
+## Description
+
+Read data from ftp file server.
+
+:::tip
+
+If you use Spark/Flink, you must ensure that your Spark/Flink cluster has already integrated Hadoop before using this connector. The tested Hadoop version is 2.x.
+
+If you use SeaTunnel Engine, the Hadoop jar is integrated automatically when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
+
+:::
+
## Options
| name | type | required | default value |
@@ -38,18 +44,18 @@ If you use SeaTunnel Engine, It automatically integrated
the hadoop jar when you
| password | string | yes | - |
| path | string | yes | - |
| file_format_type | string | yes | - |
+| delimiter/field_delimiter | string | no | \001 |
| read_columns | list | no | - |
-| delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
| time_format | string | no | HH:mm:ss |
| skip_header_row_number | long | no | 0 |
| schema | config | no | - |
-| common-options | | no | - |
| sheet_name | string | no | - |
| file_filter_pattern | string | no | - |
| compress_codec | string | no | none |
+| common-options | | no | - |
### host [string]
@@ -71,79 +77,6 @@ The target ftp password is required
The source file path.
-### delimiter [string]
-
-Field delimiter, used to tell connector how to slice and dice fields when
reading text files
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path
`ftp://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date,
supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to
datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss`
`yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time,
supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
-### schema [config]
-
-The schema information of upstream data.
-
-### read_columns [list]
-
-The read column list of the data source, user can use it to implement field
projection.
-
-The file type supported column projection as the following shown:
-
-- text
-- json
-- csv
-- orc
-- parquet
-- excel
-
-**Tips: If the user wants to use this feature when reading `text` `json` `csv`
files, the schema option must be configured**
-
### file_format_type [string]
File type, supported as the following file types:
@@ -198,13 +131,13 @@ If you do not assign data schema connector will treat the
upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |
-If you assign data schema, you should also assign the option `delimiter` too
except CSV file type
+If you assign a data schema, you should also assign the option `field_delimiter`, except for the CSV file type
you should assign schema and delimiter as the following:
```hocon
-delimiter = "#"
+field_delimiter = "#"
schema {
fields {
name = string
@@ -221,9 +154,73 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |
-### common options
+### delimiter/field_delimiter [string]
-Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
+The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
+
+Only needs to be configured when file_format_type is text.
+
+Field delimiter, used to tell the connector how to split fields.
+
+Default: `\001`, the same as Hive's default delimiter.
+
+### parse_partition_from_path [boolean]
+
+Controls whether to parse the partition keys and values from the file path.
+
+For example, if you read a file from the path `ftp://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
+
+Every record read from the file will have these two fields added:
+
+| name          | age |
+|---------------|-----|
+| tyrantlucifer | 26  |
+
+Tip: **Do not define partition fields in the schema option**
+
+### date_format [string]
+
+Date type format, used to tell the connector how to convert a string to a date. The following formats are supported:
+
+`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
+
+Default: `yyyy-MM-dd`
+
+### datetime_format [string]
+
+Datetime type format, used to tell the connector how to convert a string to a datetime. The following formats are supported:
+
+`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
+
+Default: `yyyy-MM-dd HH:mm:ss`
+
+### time_format [string]
+
+Time type format, used to tell the connector how to convert a string to a time. The following formats are supported:
+
+`HH:mm:ss` `HH:mm:ss.SSS`
+
+Default: `HH:mm:ss`
+
+### skip_header_row_number [long]
+
+Skip the first few lines; this only works for txt and csv files.
+
+For example, set it as follows:
+
+`skip_header_row_number = 2`
+
+Then SeaTunnel will skip the first 2 lines of the source files.
+
+### schema [config]
+
+Only needs to be configured when file_format_type is text, json, excel or csv (or any other format from which the schema cannot be read from metadata).
+
+The schema information of upstream data.
+
+### read_columns [list]
+
+The list of columns to read from the data source; users can use it to implement field projection.
### sheet_name [string]
@@ -239,6 +236,10 @@ The compress codec of files and the details that supported
as the following show
- orc/parquet:
automatically recognizes the compression type, no additional settings
required.
+### common options
+
+Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
+
## Example
```hocon
@@ -254,7 +255,7 @@ The compress codec of files and the details that supported
as the following show
name = string
age = int
}
- delimiter = "#"
+ field_delimiter = "#"
}
```
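The FtpFile options documented above (`skip_header_row_number`, `schema`, `read_columns`) can be combined in a single source block. A minimal sketch; the host, credentials, and path are placeholders, not values from this patch:

```hocon
source {
  FtpFile {
    host = "ftp.example.com"        # placeholder
    port = 21
    user = "seatunnel"              # placeholder
    password = "pass"               # placeholder
    path = "/data/users.csv"        # placeholder
    file_format_type = "csv"
    skip_header_row_number = 1      # skip the CSV header line
    schema {
      fields {
        name = string
        age = int
        gender = string
      }
    }
    read_columns = ["name", "age"]  # field projection
  }
}
```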
diff --git a/docs/en/connector-v2/source/HdfsFile.md
b/docs/en/connector-v2/source/HdfsFile.md
index 29092296ea..f1ef0aa877 100644
--- a/docs/en/connector-v2/source/HdfsFile.md
+++ b/docs/en/connector-v2/source/HdfsFile.md
@@ -46,7 +46,7 @@ Read data from hdfs file system.
| fs.defaultFS | string | yes | - | The
hadoop cluster address that start with `hdfs://`, for example:
`hdfs://hadoopcluster`
|
| read_columns | list | yes | - | The
read column list of the data source, user can use it to implement field
projection.The file type supported column projection as the following
shown:[text,json,csv,orc,parquet,excel].Tips: If the user wants to use this
feature when reading `text` `json` `csv` files, the schema option must be
configured. |
| hdfs_site_path | string | no | - | The
path of `hdfs-site.xml`, used to load ha configuration of namenodes
|
-| delimiter | string | no | \001 | Field
delimiter, used to tell connector how to slice and dice fields when reading
text files. default `\001`, the same as hive's default delimiter
|
+| delimiter/field_delimiter | string | no | \001 | Field delimiter, used to tell the connector how to split fields when reading text files. Default `\001`, the same as Hive's default delimiter. |
| parse_partition_from_path | boolean | no | true |
Control whether parse the partition keys and values from file path. For example
if you read a file from path
`hdfs://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`. Every
record data from file will be added these two
fields:[name:tyrantlucifer,age:26].Tips:Do not define partition fields in
schema option. |
| date_format | string | no | yyyy-MM-dd | Date
type format, used to tell connector how to convert string to date, supported as
the following formats:`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd` default
`yyyy-MM-dd`.Date type format, used to tell connector how to convert string to
date, supported as the following formats:`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
default `yyyy-MM-dd` |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
Datetime type format, used to tell connector how to convert string to datetime,
supported as the following formats:`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss`
`yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss` .default `yyyy-MM-dd HH:mm:ss`
|
@@ -55,9 +55,13 @@ Read data from hdfs file system.
| kerberos_keytab_path | string | no | - | The
keytab path of kerberos
|
| skip_header_row_number | long | no | 0 | Skip
the first few lines, but only for the txt and csv.For example, set like
following:`skip_header_row_number = 2`.then Seatunnel will skip the first 2
lines from source files
|
| schema | config | no | - | the
schema fields of upstream data
|
-| common-options | | no | - |
Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
|
| sheet_name | string | no | - |
Reader the sheet of the workbook,Only used when file_format is excel.
|
| compress_codec | string | no | none | The
compress codec of files
|
+| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. |
+
+### delimiter/field_delimiter [string]
+
+The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
### compress_codec [string]
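The `parse_partition_from_path` behavior described in the HdfsFile option table above can be sketched as a minimal source config (the cluster address is a placeholder; the partitioned path follows the example in the table):

```hocon
source {
  HdfsFile {
    fs.defaultFS = "hdfs://hadoopcluster"   # placeholder cluster address
    path = "/tmp/seatunnel/parquet/name=tyrantlucifer/age=26"
    file_format_type = "parquet"
    # each record gains the fields name=tyrantlucifer and age=26;
    # do not declare these partition fields in the schema option
    parse_partition_from_path = true
  }
}
```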
diff --git a/docs/en/connector-v2/source/LocalFile.md
b/docs/en/connector-v2/source/LocalFile.md
index 981efd395b..f562fd30ae 100644
--- a/docs/en/connector-v2/source/LocalFile.md
+++ b/docs/en/connector-v2/source/LocalFile.md
@@ -2,17 +2,11 @@
> Local file source connector
-## Description
-
-Read data from local file system.
-
-:::tip
-
-If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when
you download and install SeaTunnel Engine. You can check the jar package under
${SEATUNNEL_HOME}/lib to confirm this.
+## Support Those Engines
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -33,6 +27,18 @@ Read all the data in a split in a pollNext call. What splits
are read will be sa
- [x] json
- [x] excel
+## Description
+
+Read data from local file system.
+
+:::tip
+
+If you use Spark/Flink, you must ensure that your Spark/Flink cluster has already integrated Hadoop before using this connector. The tested Hadoop version is 2.x.
+
+If you use SeaTunnel Engine, the Hadoop jar is integrated automatically when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
+
+:::
+
## Options
| name | type | required | default value |
@@ -40,76 +46,22 @@ Read all the data in a split in a pollNext call. What
splits are read will be sa
| path | string | yes | - |
| file_format_type | string | yes | - |
| read_columns | list | no | - |
-| delimiter | string | no | \001 |
+| delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
| time_format | string | no | HH:mm:ss |
| skip_header_row_number | long | no | 0 |
| schema | config | no | - |
-| common-options | | no | - |
| sheet_name | string | no | - |
| file_filter_pattern | string | no | - |
| compress_codec | string | no | none |
+| common-options | | no | - |
### path [string]
The source file path.
-### delimiter [string]
-
-Field delimiter, used to tell connector how to slice and dice fields when
reading text files
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path
`file://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date,
supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to
datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss`
`yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time,
supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
### file_format_type [string]
File type, supported as the following file types:
@@ -175,13 +127,13 @@ If you do not assign data schema connector will treat the
upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |
-If you assign data schema, you should also assign the option `delimiter` too
except CSV file type
+If you assign a data schema, you should also assign the option `field_delimiter`, except for the CSV file type
you should assign schema and delimiter as the following:
```hocon
-delimiter = "#"
+field_delimiter = "#"
schema {
fields {
name = string
@@ -198,34 +150,81 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |
-### schema [config]
+### read_columns [list]
-#### fields [Config]
+The list of columns to read from the data source; users can use it to implement field projection.
-The schema information of upstream data.
+### delimiter/field_delimiter [string]
-### read_columns [list]
+The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
-The read column list of the data source, user can use it to implement field
projection.
+Only needs to be configured when file_format_type is text.
-The file type supported column projection as the following shown:
+Field delimiter, used to tell the connector how to split fields.
-- text
-- json
-- csv
-- orc
-- parquet
-- excel
+Default: `\001`, the same as Hive's default delimiter.
-**Tips: If the user wants to use this feature when reading `text` `json` `csv`
files, the schema option must be configured**
+### parse_partition_from_path [boolean]
-### common options
+Controls whether to parse the partition keys and values from the file path.
-Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details
+For example, if you read a file from the path `file://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
+
+Every record read from the file will have these two fields added:
+
+| name          | age |
+|---------------|-----|
+| tyrantlucifer | 26  |
+
+Tip: **Do not define partition fields in the schema option**
+
+### date_format [string]
+
+Date type format, used to tell the connector how to convert a string to a date. The following formats are supported:
+
+`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
+
+Default: `yyyy-MM-dd`
+
+### datetime_format [string]
+
+Datetime type format, used to tell the connector how to convert a string to a datetime. The following formats are supported:
+
+`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
+
+Default: `yyyy-MM-dd HH:mm:ss`
+
+### time_format [string]
+
+Time type format, used to tell the connector how to convert a string to a time. The following formats are supported:
+
+`HH:mm:ss` `HH:mm:ss.SSS`
+
+Default: `HH:mm:ss`
+
+### skip_header_row_number [long]
+
+Skip the first few lines; this only works for txt and csv files.
+
+For example, set it as follows:
+
+`skip_header_row_number = 2`
+
+Then SeaTunnel will skip the first 2 lines of the source files.
+
+### schema [config]
+
+Only needs to be configured when file_format_type is text, json, excel or csv (or any other format from which the schema cannot be read from metadata).
+
+#### fields [Config]
+
+The schema information of upstream data.
### sheet_name [string]
-Reader the sheet of the workbook,Only used when file_format_type is excel.
+Only needs to be configured when file_format_type is excel.
+
+Specifies the sheet of the workbook to read.
### file_filter_pattern [string]
@@ -241,6 +240,10 @@ The compress codec of files and the details that supported
as the following show
- orc/parquet:
automatically recognizes the compression type, no additional settings
required.
+### common options
+
+Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details
+
## Example
```hocon
diff --git a/docs/en/connector-v2/source/OssFile.md
b/docs/en/connector-v2/source/OssFile.md
index c9b44a3e84..2f51024b67 100644
--- a/docs/en/connector-v2/source/OssFile.md
+++ b/docs/en/connector-v2/source/OssFile.md
@@ -2,20 +2,11 @@
> Oss file source connector
-## Description
-
-Read data from aliyun oss file system.
-
-:::tip
-
-If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when
you download and install SeaTunnel Engine. You can check the jar package under
${SEATUNNEL_HOME}/lib to confirm this.
-
-We made some trade-offs in order to support more file types, so we used the
HDFS protocol for internal access to OSS and this connector need some hadoop
dependencies.
-It only supports hadoop version **2.9.X+**.
+## Support Those Engines
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -36,6 +27,21 @@ Read all the data in a split in a pollNext call. What splits
are read will be sa
- [x] json
- [x] excel
+## Description
+
+Read data from aliyun oss file system.
+
+:::tip
+
+If you use Spark/Flink, you must ensure that your Spark/Flink cluster has already integrated Hadoop before using this connector. The tested Hadoop version is 2.x.
+
+If you use SeaTunnel Engine, the Hadoop jar is integrated automatically when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
+
+To support more file types we made some trade-offs, so we use the HDFS protocol for internal access to OSS, and this connector needs some Hadoop dependencies.
+It only supports Hadoop version **2.9.X+**.
+
+:::
+
## Options
| name | type | required | default value |
@@ -47,76 +53,22 @@ Read all the data in a split in a pollNext call. What
splits are read will be sa
| access_secret | string | yes | - |
| endpoint | string | yes | - |
| read_columns | list | yes | - |
-| delimiter | string | no | \001 |
+| delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| skip_header_row_number | long | no | 0 |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
| time_format | string | no | HH:mm:ss |
| schema | config | no | - |
-| common-options | | no | - |
| sheet_name | string | no | - |
| file_filter_pattern | string | no | - |
| compress_codec | string | no | none |
+| common-options | | no | - |
### path [string]
The source file path.
-### delimiter [string]
-
-Field delimiter, used to tell connector how to slice and dice fields when reading text files
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path `oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date, supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time, supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
### file_format_type [string]
File type, supported as the following file types:
@@ -182,13 +134,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |
-If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+If you assign data schema, you should also assign the option `field_delimiter` too, except for the CSV file type
you should assign schema and delimiter as the following:
```hocon
-delimiter = "#"
+field_delimiter = "#"
schema {
fields {
name = string
@@ -221,34 +173,85 @@ The access secret of oss file system.
The endpoint of oss file system.
-### schema [config]
+### read_columns [list]
-#### fields [Config]
-The read column list of the data source, user can use it to implement field projection.
-The schema of upstream data.
+### delimiter/field_delimiter [string]
-### read_columns [list]
+The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
-The read column list of the data source, user can use it to implement field projection.
+Only needs to be configured when file_format_type is text.
-The file type supported column projection as the following shown:
+Field delimiter, used to tell connector how to slice and dice fields.
-- text
-- json
-- csv
-- orc
-- parquet
-- excel
+default `\001`, the same as hive's default delimiter
-**Tips: If the user wants to use this feature when reading `text` `json` `csv` files, the schema option must be configured**
+### parse_partition_from_path [boolean]
-### common options
+Control whether parse the partition keys and values from file path
-Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.
+For example, if you read a file from path `oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
+
+Every record read from the file will have these two fields added:
+
+| name | age |
+|---------------|-----|
+| tyrantlucifer | 26 |
+
+Tips: **Do not define partition fields in schema option**
+
+### skip_header_row_number [long]
+
+Skip the first few lines, but only for txt and csv file types.
+
+For example, set as follows:
+
+`skip_header_row_number = 2`
+
+then SeaTunnel will skip the first 2 lines of the source files.
+
+### date_format [string]
+
+Date type format, used to tell connector how to convert string to date, supported as the following formats:
+
+`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
+
+default `yyyy-MM-dd`
+
+### datetime_format [string]
+
+Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:
+
+`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
+
+default `yyyy-MM-dd HH:mm:ss`
+
+### time_format [string]
+
+Time type format, used to tell connector how to convert string to time, supported as the following formats:
+
+`HH:mm:ss` `HH:mm:ss.SSS`
+
+default `HH:mm:ss`
+
+### schema [config]
+
+Only needs to be configured when file_format_type is text, json, excel or csv (or other formats where the schema cannot be read from the metadata).
+
+#### fields [Config]
+
+The schema of upstream data.
### sheet_name [string]
-Reader the sheet of the workbook,Only used when file_format_type is excel.
+Only needs to be configured when file_format_type is excel.
+
+Read the sheet of the workbook.
+
+### file_filter_pattern [string]
+
+Filter pattern, used for filtering files.
### compress_codec [string]
@@ -260,6 +263,10 @@ The compress codec of files and the details that supported as the following show
- orc/parquet:
automatically recognizes the compression type, no additional settings required.
+### common options
+
+Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.
+
## Example
```hocon
@@ -294,10 +301,6 @@ The compress codec of files and the details that supported as the following show
```
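To make the renamed key concrete, a minimal hedged variant of an OSS source block could look like the following; the bucket, endpoint, path, and credential values are illustrative placeholders, not taken from the commit:

```hocon
# Hypothetical OSS source block; bucket/endpoint/credentials are placeholders.
OssFile {
  path = "/seatunnel/text"
  bucket = "oss://example-bucket"
  endpoint = "oss-cn-beijing.aliyuncs.com"
  access_key = "xxxxxxxx"
  access_secret = "xxxxxxxx"
  file_format_type = "text"
  # New key introduced by this commit; the old `delimiter` key still
  # resolves as a fallback until it is removed.
  field_delimiter = "#"
  schema {
    fields {
      name = string
      age = int
    }
  }
}
```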
-### file_filter_pattern [string]
-
-Filter pattern, which used for filtering files.
-
## Changelog
### 2.2.0-beta 2022-09-26
diff --git a/docs/en/connector-v2/source/OssJindoFile.md b/docs/en/connector-v2/source/OssJindoFile.md
index 3a33088fbe..27b710cfb8 100644
--- a/docs/en/connector-v2/source/OssJindoFile.md
+++ b/docs/en/connector-v2/source/OssJindoFile.md
@@ -2,23 +2,11 @@
> OssJindo file source connector
-## Description
-
-Read data from aliyun oss file system using jindo api.
-
-:::tip
-
-You need to download [jindosdk-4.6.1.tar.gz](https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/4.6.1/jindosdk-4.6.1.tar.gz)
-and then unzip it, copy jindo-sdk-4.6.1.jar and jindo-core-4.6.1.jar from lib to ${SEATUNNEL_HOME}/lib.
-
-If you use spark/flink, In order to use this connector, You must ensure your spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
+## Support Those Engines
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you download and install SeaTunnel Engine. You can check the jar package under ${SEATUNNEL_HOME}/lib to confirm this.
-
-We made some trade-offs in order to support more file types, so we used the HDFS protocol for internal access to OSS and this connector need some hadoop dependencies.
-It only supports hadoop version **2.9.X+**.
-
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -39,6 +27,24 @@ Read all the data in a split in a pollNext call. What splits are read will be sa
- [x] json
- [x] excel
+## Description
+
+Read data from aliyun oss file system using jindo api.
+
+:::tip
+
+You need to download [jindosdk-4.6.1.tar.gz](https://jindodata-binary.oss-cn-shanghai.aliyuncs.com/release/4.6.1/jindosdk-4.6.1.tar.gz)
+and then unzip it; copy jindo-sdk-4.6.1.jar and jindo-core-4.6.1.jar from its lib directory to ${SEATUNNEL_HOME}/lib.
+
+If you use spark/flink, you must ensure your spark/flink cluster has already integrated hadoop before using this connector. The tested hadoop version is 2.x.
+
+If you use SeaTunnel Engine, the hadoop jar is integrated automatically when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
+
+We made some trade-offs in order to support more file types, so we use the HDFS protocol for internal access to OSS, and this connector needs some hadoop dependencies.
+It only supports hadoop version **2.9.X+**.
+
+:::
+
## Options
| name | type | required | default value |
@@ -50,76 +56,22 @@ Read all the data in a split in a pollNext call. What splits are read will be sa
| access_secret | string | yes | - |
| endpoint | string | yes | - |
| read_columns | list | no | - |
-| delimiter | string | no | \001 |
+| delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
| time_format | string | no | HH:mm:ss |
| skip_header_row_number | long | no | 0 |
| schema | config | no | - |
-| common-options | | no | - |
| sheet_name | string | no | - |
| file_filter_pattern | string | no | - |
| compress_codec | string | no | none |
+| common-options | | no | - |
### path [string]
The source file path.
-### delimiter [string]
-
-Field delimiter, used to tell connector how to slice and dice fields when reading text files
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path `oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date, supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time, supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
### file_format_type [string]
File type, supported as the following file types:
@@ -185,13 +137,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |
-If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+If you assign data schema, you should also assign the option `field_delimiter` too, except for the CSV file type
you should assign schema and delimiter as the following:
```hocon
-delimiter = "#"
+field_delimiter = "#"
schema {
fields {
name = string
@@ -224,34 +176,81 @@ The access secret of oss file system.
The endpoint of oss file system.
-### schema [config]
+### read_columns [list]
-#### fields [Config]
-The read column list of the data source, user can use it to implement field projection.
-The schema of upstream data.
+### delimiter/field_delimiter [string]
-### read_columns [list]
+The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
-The read column list of the data source, user can use it to implement field projection.
+Only needs to be configured when file_format_type is text.
-The file type supported column projection as the following shown:
+Field delimiter, used to tell connector how to slice and dice fields.
-- text
-- json
-- csv
-- orc
-- parquet
-- excel
+default `\001`, the same as hive's default delimiter
-**Tips: If the user wants to use this feature when reading `text` `json` `csv` files, the schema option must be configured**
+### parse_partition_from_path [boolean]
-### common options
+Control whether parse the partition keys and values from file path
-Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.
+For example, if you read a file from path `oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
+
+Every record read from the file will have these two fields added:
+
+| name | age |
+|---------------|-----|
+| tyrantlucifer | 26 |
+
+Tips: **Do not define partition fields in schema option**
+
+### date_format [string]
+
+Date type format, used to tell connector how to convert string to date, supported as the following formats:
+
+`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
+
+default `yyyy-MM-dd`
+
+### datetime_format [string]
+
+Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:
+
+`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
+
+default `yyyy-MM-dd HH:mm:ss`
+
+### time_format [string]
+
+Time type format, used to tell connector how to convert string to time, supported as the following formats:
+
+`HH:mm:ss` `HH:mm:ss.SSS`
+
+default `HH:mm:ss`
+
+### skip_header_row_number [long]
+
+Skip the first few lines, but only for txt and csv file types.
+
+For example, set as follows:
+
+`skip_header_row_number = 2`
+
+then SeaTunnel will skip the first 2 lines of the source files.
+
+### schema [config]
+
+Only needs to be configured when file_format_type is text, json, excel or csv (or other formats where the schema cannot be read from the metadata).
+
+#### fields [Config]
+
+The schema of upstream data.
### sheet_name [string]
-Reader the sheet of the workbook,Only used when file_format_type is excel.
+Only needs to be configured when file_format_type is excel.
+
+Read the sheet of the workbook.
### file_filter_pattern [string]
@@ -267,6 +266,10 @@ The compress codec of files and the details that supported as the following show
- orc/parquet:
automatically recognizes the compression type, no additional settings required.
+### common options
+
+Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.
+
## Example
```hocon
diff --git a/docs/en/connector-v2/source/S3File.md b/docs/en/connector-v2/source/S3File.md
index 9237b92cff..78ae7422ed 100644
--- a/docs/en/connector-v2/source/S3File.md
+++ b/docs/en/connector-v2/source/S3File.md
@@ -111,13 +111,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |
-If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+If you assign data schema, you should also assign the option `field_delimiter` too, except for the CSV file type
you should assign schema and delimiter as the following:
```hocon
-delimiter = "#"
+field_delimiter = "#"
schema {
fields {
name = string
@@ -205,16 +205,20 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| access_key | string | no | - | Only used when `fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` [...]
| access_secret | string | no | - | Only used when `fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` [...]
| hadoop_s3_properties | map | no | - | If you need to add other options, you could add them here and refer to this [link](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) [...]
-| delimiter | string | no | \001 | Field delimiter, used to tell connector how to slice and dice fields when reading text files. Default `\001`, the same as hive's default delimiter. [...]
+| delimiter/field_delimiter | string | no | \001 | Field delimiter, used to tell connector how to slice and dice fields when reading text files. Default `\001`, the same as hive's default delimiter. [...]
| parse_partition_from_path | boolean | no | true | Control whether parse the partition keys and values from file path. For example if you read a file from path `s3n://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`. Every record data from file will be added these two fields: name="tyrantlucifer", age=26 [...]
| date_format | string | no | yyyy-MM-dd | Date type format, used to tell connector how to convert string to date, supported as the following formats: `yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`. default `yyyy-MM-dd` [...]
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss | Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats: `yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss` [...]
| time_format | string | no | HH:mm:ss | Time type format, used to tell connector how to convert string to time, supported as the following formats: `HH:mm:ss` `HH:mm:ss.SSS` [...]
| skip_header_row_number | long | no | 0 | Skip the first few lines, but only for the txt and csv. For example, set like following: `skip_header_row_number = 2`. Then SeaTunnel will skip the first 2 lines from source files [...]
| schema | config | no | - | The schema of upstream data. [...]
-| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. [...]
| sheet_name | string | no | - | Read the sheet of the workbook. Only used when file_format_type is excel. [...]
| compress_codec | string | no | none |
+| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details. [...]
+
+### delimiter/field_delimiter [string]
+
+**delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
### compress_codec [string]
diff --git a/docs/en/connector-v2/source/SftpFile.md b/docs/en/connector-v2/source/SftpFile.md
index c35fc98d58..05b3bc4f38 100644
--- a/docs/en/connector-v2/source/SftpFile.md
+++ b/docs/en/connector-v2/source/SftpFile.md
@@ -2,17 +2,11 @@
> Sftp file source connector
-## Description
-
-Read data from sftp file server.
-
-:::tip
+## Support Those Engines
-If you use spark/flink, In order to use this connector, You must ensure your spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you download and install SeaTunnel Engine. You can check the jar package under ${SEATUNNEL_HOME}/lib to confirm this.
-
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -28,6 +22,18 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
- [x] json
- [x] excel
+## Description
+
+Read data from sftp file server.
+
+:::tip
+
+If you use spark/flink, you must ensure your spark/flink cluster has already integrated hadoop before using this connector. The tested hadoop version is 2.x.
+
+If you use SeaTunnel Engine, the hadoop jar is integrated automatically when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
+
+:::
+
## Options
| name | type | required | default value |
@@ -38,17 +44,19 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| password | string | yes | - |
| path | string | yes | - |
| file_format_type | string | yes | - |
-| delimiter | string | no | \001 |
+| delimiter/field_delimiter | string | no | \001 |
| parse_partition_from_path | boolean | no | true |
| date_format | string | no | yyyy-MM-dd |
| skip_header_row_number | long | no | 0 |
| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
| time_format | string | no | HH:mm:ss |
| schema | config | no | - |
-| common-options | | no | - |
| sheet_name | string | no | - |
+| read_columns | list | no | - |
| file_filter_pattern | string | no | - |
| compress_codec | string | no | none |
+| common-options | | no | - |
### host [string]
@@ -58,7 +66,7 @@ The target sftp host is required
The target sftp port is required
-### username [string]
+### user [string]
The target sftp username is required
@@ -70,79 +78,6 @@ The target sftp password is required
The source file path.
-### delimiter [string]
-
-Field delimiter, used to tell connector how to slice and dice fields when reading text files
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path `sftp://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date, supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time, supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
-### schema [config]
-
-The schema information of upstream data.
-
-### read_columns [list]
-
-The read column list of the data source, user can use it to implement field projection.
-
-The file type supported column projection as the following shown:
-
-- text
-- json
-- csv
-- orc
-- parquet
-- excel
-
-**Tips: If the user wants to use this feature when reading `text` `json` `csv` files, the schema option must be configured**
-
### file_format_type [string]
File type, supported as the following file types:
@@ -197,13 +132,13 @@ If you do not assign data schema connector will treat the upstream data as the f
|-----------------------|
| tyrantlucifer#26#male |
-If you assign data schema, you should also assign the option `delimiter` too except CSV file type
+If you assign data schema, you should also assign the option `field_delimiter` too, except for the CSV file type
you should assign schema and delimiter as the following:
```hocon
-delimiter = "#"
+field_delimiter = "#"
schema {
fields {
name = string
@@ -220,13 +155,79 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |
-### common options
+### delimiter/field_delimiter [string]
-Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.
+The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.
+
+Only needs to be configured when file_format_type is text.
+
+Field delimiter, used to tell connector how to slice and dice fields.
+
+default `\001`, the same as hive's default delimiter
+
+### parse_partition_from_path [boolean]
+
+Control whether parse the partition keys and values from file path
+
+For example, if you read a file from path `sftp://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
+
+Every record read from the file will have these two fields added:
+
+| name | age |
+|---------------|-----|
+| tyrantlucifer | 26 |
+
+Tips: **Do not define partition fields in schema option**
+
+### date_format [string]
+
+Date type format, used to tell connector how to convert string to date, supported as the following formats:
+
+`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
+
+default `yyyy-MM-dd`
+
+### datetime_format [string]
+
+Datetime type format, used to tell connector how to convert string to datetime, supported as the following formats:
+
+`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss`
+
+default `yyyy-MM-dd HH:mm:ss`
+
+### time_format [string]
+
+Time type format, used to tell connector how to convert string to time, supported as the following formats:
+
+`HH:mm:ss` `HH:mm:ss.SSS`
+
+default `HH:mm:ss`
+
+### schema [config]
+
+Only needs to be configured when file_format_type is text, json, excel or csv (or other formats where the schema cannot be read from the metadata).
+
+The schema information of upstream data.
### sheet_name [string]
-Reader the sheet of the workbook,Only used when file_format_type is excel.
+Only needs to be configured when file_format_type is excel.
+
+Read the sheet of the workbook.
+
+### skip_header_row_number [long]
+
+Skip the first few lines, but only for txt and csv file types.
+
+For example, set as follows:
+
+`skip_header_row_number = 2`
+
+then SeaTunnel will skip the first 2 lines of the source files.
+
+### read_columns [list]
+
+The read column list of the data source; users can use it to implement field projection.
### file_filter_pattern [string]
@@ -242,6 +243,10 @@ The compress codec of files and the details that supported as the following show
- orc/parquet:
automatically recognizes the compression type, no additional settings required.
+### common options
+
+Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.
+
## Example
```hocon
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfig.java b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfig.java
index c9cf94dcb2..6e2818d459 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfig.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfig.java
@@ -40,10 +40,11 @@ public class BaseSourceConfig {
.noDefaultValue()
.withDescription("The file path of source files");
- public static final Option<String> DELIMITER =
- Options.key("delimiter")
+ public static final Option<String> FIELD_DELIMITER =
+ Options.key("field_delimiter")
.stringType()
.defaultValue(TextFormatConstant.SEPARATOR[0])
+ .withFallbackKeys("delimiter")
.withDescription(
"The separator between columns in a row of data.
Only needed by `text` file format");
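The `withFallbackKeys("delimiter")` call above is what keeps old configs working: when the new key is absent, the option resolver falls back to the legacy key. A minimal standalone sketch of that resolution order using a plain map — the class and method names here are illustrative, not SeaTunnel's API:

```java
import java.util.HashMap;
import java.util.Map;

public class FallbackKeyDemo {
    // Resolve an option value: prefer the primary key, then the fallback key,
    // then the built-in default -- mirroring primary-plus-fallback-key lookup.
    public static String resolve(Map<String, String> conf, String primary, String fallback, String def) {
        if (conf.containsKey(primary)) {
            return conf.get(primary);
        }
        if (conf.containsKey(fallback)) {
            return conf.get(fallback);
        }
        return def;
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put("delimiter", "#"); // old key only: still honored via fallback

        Map<String, String> mixed = new HashMap<>();
        mixed.put("field_delimiter", "|"); // new key wins even when both are set
        mixed.put("delimiter", "#");

        System.out.println(resolve(legacy, "field_delimiter", "delimiter", "\u0001")); // #
        System.out.println(resolve(mixed, "field_delimiter", "delimiter", "\u0001"));  // |
    }
}
```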
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/ExcelReadStrategy.java b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/ExcelReadStrategy.java
index 0b1cfc083b..3371580c17 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/ExcelReadStrategy.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/ExcelReadStrategy.java
@@ -19,6 +19,7 @@ package org.apache.seatunnel.connectors.seatunnel.file.source.reader;
import org.apache.seatunnel.shade.com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.seatunnel.api.configuration.ReadonlyConfig;
import org.apache.seatunnel.api.source.Collector;
import org.apache.seatunnel.api.table.type.SeaTunnelDataType;
import org.apache.seatunnel.api.table.type.SeaTunnelRow;
@@ -233,7 +234,9 @@ public class ExcelReadStrategy extends AbstractReadStrategy {
case BYTES:
return field.toString().getBytes(StandardCharsets.UTF_8);
case ROW:
- String delimiter = pluginConfig.getString(BaseSourceConfig.DELIMITER.key());
+ String delimiter =
+ ReadonlyConfig.fromConfig(pluginConfig)
+ .get(BaseSourceConfig.FIELD_DELIMITER);
String[] context = field.toString().split(delimiter);
SeaTunnelRowType ft = (SeaTunnelRowType) fieldType;
int length = context.length;
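One detail worth noting in the hunk above: `String.split` treats its argument as a regular expression, so a user-supplied delimiter containing regex metacharacters (`|`, `.`, etc.) needs quoting to act as a literal separator; the default `\001` happens to be safe. A small standalone illustration, not SeaTunnel code:

```java
import java.util.regex.Pattern;

public class SplitDemo {
    // Split a row on a literal delimiter, quoting it so regex
    // metacharacters in the delimiter are treated literally.
    public static String[] splitLiteral(String row, String delimiter) {
        return row.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String row = "tyrantlucifer|26|male";
        // Naive split: "|" is regex alternation, which matches the empty
        // string between every character instead of the intended separator.
        System.out.println(row.split("|").length);         // not 3
        System.out.println(splitLiteral(row, "|").length); // 3
    }
}
```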
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java
index 51892cf99f..816e50b57b 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/source/reader/TextReadStrategy.java
@@ -18,6 +18,7 @@
package org.apache.seatunnel.connectors.seatunnel.file.source.reader;
import org.apache.seatunnel.api.common.SeaTunnelAPIErrorCode;
+import org.apache.seatunnel.api.configuration.ReadonlyConfig;
import org.apache.seatunnel.api.serialization.DeserializationSchema;
import org.apache.seatunnel.api.source.Collector;
import org.apache.seatunnel.api.table.catalog.CatalogTableUtil;
@@ -49,11 +50,12 @@ import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
+import java.util.Optional;
@Slf4j
public class TextReadStrategy extends AbstractReadStrategy {
private DeserializationSchema<SeaTunnelRow> deserializationSchema;
- private String fieldDelimiter = BaseSourceConfig.DELIMITER.defaultValue();
+ private String fieldDelimiter = BaseSourceConfig.FIELD_DELIMITER.defaultValue();
private DateUtils.Formatter dateFormat =
BaseSourceConfig.DATE_FORMAT.defaultValue();
private DateTimeUtils.Formatter datetimeFormat =
BaseSourceConfig.DATETIME_FORMAT.defaultValue();
@@ -162,8 +164,11 @@ public class TextReadStrategy extends AbstractReadStrategy {
public void setSeaTunnelRowTypeInfo(SeaTunnelRowType seaTunnelRowType) {
SeaTunnelRowType userDefinedRowTypeWithPartition =
mergePartitionTypes(fileNames.get(0), seaTunnelRowType);
- if (pluginConfig.hasPath(BaseSourceConfig.DELIMITER.key())) {
- fieldDelimiter = pluginConfig.getString(BaseSourceConfig.DELIMITER.key());
+ Optional<String> fieldDelimiterOptional =
+ ReadonlyConfig.fromConfig(pluginConfig)
+ .getOptional(BaseSourceConfig.FIELD_DELIMITER);
+ if (fieldDelimiterOptional.isPresent()) {
+ fieldDelimiter = fieldDelimiterOptional.get();
} else {
FileFormat fileFormat =
FileFormat.valueOf(
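The rewritten branch above follows a common pattern: fetch the option as an `Optional` and keep the preset default only when it is empty. A minimal standalone sketch of the same flow using plain `Optional`, not the SeaTunnel config classes:

```java
import java.util.Optional;

public class OptionalDefaultDemo {
    // Hive-style default field separator, as in the connector docs.
    public static final String DEFAULT_DELIMITER = "\u0001";

    // Keep the default unless the user explicitly configured a delimiter.
    public static String pickDelimiter(Optional<String> configured) {
        return configured.orElse(DEFAULT_DELIMITER);
    }

    public static void main(String[] args) {
        System.out.println(pickDelimiter(Optional.of("#")));  // user override: #
        System.out.println(pickDelimiter(Optional.empty()).length()); // default \u0001, length 1
    }
}
```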
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-cos/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/cos/source/CosFileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-cos/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/cos/source/CosFileSourceFactory.java
index 78f262d8a2..b5964eae00 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-cos/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/cos/source/CosFileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-cos/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/cos/source/CosFileSourceFactory.java
@@ -50,7 +50,7 @@ public class CosFileSourceFactory implements TableSourceFactory {
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-ftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/ftp/source/FtpFileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-ftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/ftp/source/FtpFileSourceFactory.java
index 55d61d7511..ce31ea50b8 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-ftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/ftp/source/FtpFileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-ftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/ftp/source/FtpFileSourceFactory.java
@@ -50,7 +50,7 @@ public class FtpFileSourceFactory implements TableSourceFactory {
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/source/HdfsFileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/source/HdfsFileSourceFactory.java
index 2697e7dc45..3a894d7241 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/source/HdfsFileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-hadoop/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/hdfs/source/HdfsFileSourceFactory.java
@@ -47,7 +47,7 @@ public class HdfsFileSourceFactory implements TableSourceFactory {
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-jindo-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-jindo-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
index 02f2357b66..705f551a1e 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-jindo-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-jindo-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
@@ -50,7 +50,7 @@ public class OssFileSourceFactory implements TableSourceFactory {
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/source/LocalFileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/source/LocalFileSourceFactory.java
index 3c2757eb44..cf34c99a5b 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/source/LocalFileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-local/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/local/source/LocalFileSourceFactory.java
@@ -46,7 +46,7 @@ public class LocalFileSourceFactory implements TableSourceFactory {
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
index c01893c1f8..2660e9c02c 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-oss/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/oss/source/OssFileSourceFactory.java
@@ -50,7 +50,7 @@ public class OssFileSourceFactory implements TableSourceFactory {
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/source/S3FileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/source/S3FileSourceFactory.java
index 87a1ae3293..e4e29c140b 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/source/S3FileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-s3/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/s3/source/S3FileSourceFactory.java
@@ -55,7 +55,7 @@ public class S3FileSourceFactory implements TableSourceFactory {
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConf.java b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConf.java
index 68f954ace0..3d2d9b3eff 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConf.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConf.java
@@ -48,9 +48,9 @@ public class SftpConf extends HadoopConf {
String defaultFS = String.format("sftp://%s:%s", host, port);
HadoopConf hadoopConf = new SftpConf(defaultFS);
HashMap<String, String> sftpOptions = new HashMap<>();
- sftpOptions.put("fs.sftp.user." + host, config.getString(SftpConfig.SFTP_USERNAME.key()));
+ sftpOptions.put("fs.sftp.user." + host, config.getString(SftpConfig.SFTP_USER.key()));
sftpOptions.put(
- "fs.sftp.password." + host + "." + config.getString(SftpConfig.SFTP_USERNAME.key()),
+ "fs.sftp.password." + host + "." + config.getString(SftpConfig.SFTP_USER.key()),
config.getString(SftpConfig.SFTP_PASSWORD.key()));
hadoopConf.setExtraOptions(sftpOptions);
return hadoopConf;
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConfig.java b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConfig.java
index 5788fc7bac..476e11c6f8 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConfig.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/config/SftpConfig.java
@@ -27,7 +27,7 @@ public class SftpConfig extends BaseSourceConfig {
.stringType()
.noDefaultValue()
.withDescription("SFTP server password");
- public static final Option<String> SFTP_USERNAME =
+ public static final Option<String> SFTP_USER =
Options.key("user")
.stringType()
.noDefaultValue()
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSink.java b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSink.java
index e0708f4dff..7242a57595 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSink.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSink.java
@@ -47,7 +47,7 @@ public class SftpFileSink extends BaseFileSink {
pluginConfig,
SftpConfig.SFTP_HOST.key(),
SftpConfig.SFTP_PORT.key(),
- SftpConfig.SFTP_USERNAME.key(),
+ SftpConfig.SFTP_USER.key(),
SftpConfig.SFTP_PASSWORD.key());
if (!result.isSuccess()) {
throw new FileConnectorException(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSinkFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSinkFactory.java
index 84bdd3bcd4..bd5e11ffdc 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSinkFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/sink/SftpFileSinkFactory.java
@@ -40,7 +40,7 @@ public class SftpFileSinkFactory implements TableSinkFactory {
.required(SftpConfig.FILE_PATH)
.required(SftpConfig.SFTP_HOST)
.required(SftpConfig.SFTP_PORT)
- .required(SftpConfig.SFTP_USERNAME)
+ .required(SftpConfig.SFTP_USER)
.required(SftpConfig.SFTP_PASSWORD)
.optional(BaseSinkConfig.FILE_FORMAT_TYPE)
.conditional(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSource.java b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSource.java
index 0d195da073..1223bd5547 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSource.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSource.java
@@ -58,7 +58,7 @@ public class SftpFileSource extends BaseFileSource {
SftpConfig.FILE_FORMAT_TYPE.key(),
SftpConfig.SFTP_HOST.key(),
SftpConfig.SFTP_PORT.key(),
- SftpConfig.SFTP_USERNAME.key(),
+ SftpConfig.SFTP_USER.key(),
SftpConfig.SFTP_PASSWORD.key());
if (!result.isSuccess()) {
throw new FileConnectorException(
diff --git a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSourceFactory.java b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSourceFactory.java
index 6015613222..a07c3f4e83 100644
--- a/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSourceFactory.java
+++ b/seatunnel-connectors-v2/connector-file/connector-file-sftp/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/sftp/source/SftpFileSourceFactory.java
@@ -44,13 +44,13 @@ public class SftpFileSourceFactory implements TableSourceFactory {
.required(SftpConfig.FILE_PATH)
.required(SftpConfig.SFTP_HOST)
.required(SftpConfig.SFTP_PORT)
- .required(SftpConfig.SFTP_USERNAME)
+ .required(SftpConfig.SFTP_USER)
.required(SftpConfig.SFTP_PASSWORD)
.required(BaseSourceConfig.FILE_FORMAT_TYPE)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
FileFormat.TEXT,
- BaseSourceConfig.DELIMITER)
+ BaseSourceConfig.FIELD_DELIMITER)
.conditional(
BaseSourceConfig.FILE_FORMAT_TYPE,
Arrays.asList(
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-cos-e2e/src/test/resources/excel/cos_excel_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-cos-e2e/src/test/resources/excel/cos_excel_to_assert.conf
index b71709318e..733d393340 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-cos-e2e/src/test/resources/excel/cos_excel_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-cos-e2e/src/test/resources/excel/cos_excel_to_assert.conf
@@ -34,7 +34,7 @@ source {
region = "ap-chengdu"
result_table_name = "fake"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
skip_header_row_number = 1
schema = {
fields {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_projection_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_projection_to_assert.conf
index c271a0486a..1bb6823422 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_projection_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_projection_to_assert.conf
@@ -37,7 +37,7 @@ source {
path = "/tmp/seatunnel/read/excel"
result_table_name = "ftp"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
read_columns = [c_string, c_boolean]
skip_header_row_number = 1
schema = {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_to_assert.conf
index b25e8ab1ac..80ebf1577f 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_excel_to_assert.conf
@@ -37,7 +37,7 @@ source {
path = "/tmp/seatunnel/read/excel"
result_table_name = "ftp"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
skip_header_row_number = 1
schema = {
fields {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_filter_excel_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_filter_excel_to_assert.conf
index 6af42f6f3d..e881c39af0 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_filter_excel_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-ftp-e2e/src/test/resources/excel/ftp_filter_excel_to_assert.conf
@@ -37,7 +37,7 @@ source {
path = "/tmp/seatunnel/read/excel_filter"
result_table_name = "ftp"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
skip_header_row_number = 1
file_filter_pattern = "e2e_filter.*"
schema = {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java
index 8132a9c7f5..bb80160f14 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java
@@ -61,6 +61,11 @@ public class LocalFileIT extends TestSuiteBase {
"/seatunnel/read/text/name=tyrantlucifer/hobby=coding/e2e.txt",
container);
+ ContainerUtil.copyFileIntoContainers(
+ "/text/e2e_delimiter.txt",
+ "/seatunnel/read/text_delimiter/e2e.txt",
+ container);
+
Path txtLzo =
convertToLzoFile(ContainerUtil.getResourcesFile("/text/e2e.txt"));
ContainerUtil.copyFileIntoContainers(
txtLzo, "/seatunnel/read/lzo_text/e2e.txt", container);
@@ -97,6 +102,7 @@ public class LocalFileIT extends TestSuiteBase {
// test write local text file
helper.execute("/text/fake_to_local_file_text.conf");
helper.execute("/text/local_file_text_lzo_to_assert.conf");
+ helper.execute("/text/local_file_delimiter_assert.conf");
// test read skip header
helper.execute("/text/local_file_text_skip_headers.conf");
// test read local text file
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_projection_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_projection_to_assert.conf
index df6749f718..65e4424fe0 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_projection_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_projection_to_assert.conf
@@ -30,7 +30,7 @@ source {
path = "/seatunnel/read/excel"
result_table_name = "fake"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
read_columns = [c_string, c_boolean]
skip_header_row_number = 1
schema = {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_to_assert.conf
index 1160ac5f25..87a62367fc 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_to_assert.conf
@@ -30,7 +30,7 @@ source {
path = "/seatunnel/read/excel"
result_table_name = "fake"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
skip_header_row_number = 1
schema = {
fields {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_filter_excel_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_filter_excel_to_assert.conf
index 86039b44db..c47c8c5f0d 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_filter_excel_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_filter_excel_to_assert.conf
@@ -30,7 +30,7 @@ source {
path = "/seatunnel/read/excel_filter"
result_table_name = "fake"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
skip_header_row_number = 1
file_filter_pattern = "e2e_filter.*"
schema = {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/e2e_delimiter.txt b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/e2e_delimiter.txt
new file mode 100644
index 0000000000..b87687448c
--- /dev/null
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/e2e_delimiter.txt
@@ -0,0 +1,5 @@
+qwerqwer|1972607327106509113020400507301104442513849629249|qwer|true|108|22432|11383204|723560014108175360|3.1407707E38|1.262116635132156E308|zlmzw|2023-05-25|97236477433882034782.803540569732795689|2023-03-25 04:30:13|qwerqwer1458583961104266156763552401211382922561937221393qwertrue930925142792030530244095935039344647.838737E373.3238256808030654E307Zicjq2023-10-1918739344608215707574.2737367351403166822023-10-07 08:24:27
+qwerqwer|20734545375230101131603368534223532992574063143|qwer|true|99|21567|768189694|8504422836686883840|1.3761162E38|5.460153079423635E307|dkCwG|2023-05-19|83044404421834652395.960138696348105704|2023-03-24 10:48:12|qwerqwer2774295104069855819185865051778415509162817756qwerfalse1619571127265647324402356645454202881.8446726E381.7000909191489263E308cXxQV2023-07-2713431695514477025331.5815661990272672962023-12-22 12:26:16
+qwerqwer|11147903451235598576860383707165213199232994316|qwer|true|49|21122|1110303282|2083282743100007424|1.9729736E38|1.0399541425415623E308|muvcN|2023-08-13|68941603382218317993.487441177291093700|2023-04-06 02:40:57|qwerqwer69745783829424948385550024313502468211004949206qwertrue117227855844811138143962162044856324.844609E374.992962483991954E307pPYZS2023-05-1751345924758748590630.6631664051742477762023-12-10 19:23:26
+qwerqwer|12600145717385486047323762331460409881387559257|qwer|true|54|30782|475296705|6520650210788816896|3.253564E38|1.181636072812166E308|RxBAU|2023-03-14|94882795877228509625.376060071805770292|2023-02-25 15:29:26|qwerqwer17078206571395918506189177703116985975671620089209qwerfalse11415353139002758476082670167752366081.4806856E385.82327433457546E307ppTVu2023-10-2784302780955330822761.6237458260160280852023-08-23 09:26:16
+qwerqwer|10811140972103212018816962034437650301336224152|qwer|true|82|27637|1110251085|806786601324796928|7.711023E37|4.398648945575819E307|kGVbL|2023-04-26|80164231813502964946.202647535547152674|2023-04-15 05:22:59|qwerqwer800727634149093075168463891515323059061714847070qwertrue351280654957024134756885372412119043.0538885E384.631561190310559E306leTTG2023-11-1490016690865756655359.8578360402194859042023-08-23 10:30:18
\ No newline at end of file
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_projection_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/local_file_delimiter_assert.conf
similarity index 87%
copy from seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_projection_to_assert.conf
copy to seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/local_file_delimiter_assert.conf
index df6749f718..f83ed1e4f6 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/excel/local_excel_projection_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/local_file_delimiter_assert.conf
@@ -27,12 +27,7 @@ env {
source {
LocalFile {
- path = "/seatunnel/read/excel"
- result_table_name = "fake"
- file_format_type = excel
- delimiter = ;
- read_columns = [c_string, c_boolean]
- skip_header_row_number = 1
+ path = "/seatunnel/read/text_delimiter"
schema = {
fields {
c_map = "map<string, string>"
@@ -67,6 +62,10 @@ source {
}
}
}
+ file_format_type = "text"
+ read_columns = [c_string, c_boolean]
+ delimiter = "\\|"
+ result_table_name = "fake"
}
}
@@ -85,20 +84,21 @@ sink {
field_type = string
field_value = [
{
- rule_type = NOT_NULL
+ equals_to = "qwer"
}
]
},
- {
- field_name = c_boolean
- field_type = boolean
- field_value = [
- {
- rule_type = NOT_NULL
- }
- ]
- }
+ {
+ field_name = c_boolean
+ field_type = boolean
+ field_value = [
+ {
+ equals_to = true
+ }
+ ]
+ }
]
}
}
-}
\ No newline at end of file
+}
+
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/local_file_text_lzo_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/local_file_text_lzo_to_assert.conf
index 80613ec0fc..eb92936aad 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/local_file_text_lzo_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/resources/text/local_file_text_lzo_to_assert.conf
@@ -28,7 +28,6 @@ env {
source {
LocalFile {
path = "/seatunnel/read/lzo_text"
- row_delimiter = "\n"
partition_dir_expression = "${k0}=${v0}"
is_partition_field_write_in_file = true
file_name_expression = "${transactionId}_${now}"
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_projection_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_projection_to_assert.conf
index 356c0a8114..bc55ed12c4 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_projection_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_projection_to_assert.conf
@@ -37,7 +37,7 @@ source {
path = "tmp/seatunnel/read/excel"
result_table_name = "sftp"
file_format_type = excel
- delimiter = ;
+ field_delimiter = ;
read_columns = [c_string, c_boolean]
skip_header_row_number = 1
schema = {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_to_assert.conf
index 0031b32098..606f04ecab 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_excel_to_assert.conf
@@ -37,7 +37,7 @@ source {
port = 22
user = seatunnel
password = pass
- delimiter = ";"
+ field_delimiter = ";"
skip_header_row_number = 1
schema = {
fields {
diff --git a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_filter_excel_to_assert.conf b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_filter_excel_to_assert.conf
index b6cd92f712..6125ac9537 100644
--- a/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_filter_excel_to_assert.conf
+++ b/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-sftp-e2e/src/test/resources/excel/sftp_filter_excel_to_assert.conf
@@ -37,7 +37,7 @@ source {
port = 22
user = seatunnel
password = pass
- delimiter = ";"
+ field_delimiter = ";"
file_filter_pattern = "e2e_filter.*"
skip_header_row_number = 1
schema = {
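---

For context, the net effect of the renames in this diff is that text-format file sources take `field_delimiter` (previously `delimiter`) and the SFTP connector's credential option is `SFTP_USER` (config key `user`, previously `SFTP_USERNAME`). A minimal sketch of a source block using the unified keys, in the same conf style as the e2e files above — the host, path, and schema here are illustrative placeholders, not values from this commit:

```hocon
source {
  SftpFile {
    host = "sftp.example.com"   # illustrative host, not from this commit
    port = 22
    user = "seatunnel"          # key stays "user"; the Java option is now SFTP_USER
    password = "pass"
    path = "/tmp/seatunnel/read/text"
    file_format_type = "text"
    field_delimiter = "|"       # renamed from delimiter for TEXT sources
    schema = {
      fields {
        c_string = string
        c_boolean = boolean
      }
    }
    result_table_name = "sftp"
  }
}
```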