TaoZex commented on code in PR #5101: URL: https://github.com/apache/seatunnel/pull/5101#discussion_r1267630568
##########
docs/en/connector-v2/sink/S3File.md:
##########
@@ -30,60 +23,106 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] json
- [x] excel
-## Options
-
-| name | type | required | default value | remarks |
-|----------------------------------|---------|----------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
-| path | string | yes | - | |
-| bucket | string | yes | - | |
-| fs.s3a.endpoint | string | yes | - | |
-| fs.s3a.aws.credentials.provider | string | yes | com.amazonaws.auth.InstanceProfileCredentialsProvider | |
-| access_key | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
-| access_secret | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
-| custom_filename | boolean | no | false | Whether you need custom the filename |
-| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
-| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format_type | string | no | "csv" | |
-| field_delimiter | string | no | '\001' | Only used when file_format is text |
-| row_delimiter | string | no | "\n" | Only used when file_format is text |
-| have_partition | boolean | no | false | Whether you need processing partitions. |
-| partition_by | array | no | - | Only used then have_partition is true |
-| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
-| is_partition_field_write_in_file | boolean | no | false | Only used then have_partition is true |
-| sink_columns | array | no | | When this parameter is empty, all fields are sink columns |
-| is_enable_transaction | boolean | no | true | |
-| batch_size | int | no | 1000000 | |
-| compress_codec | string | no | none | |
-| common-options | object | no | - | |
-| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
-| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
-
-### path [string]
-
-The target dir path is required.
-
-### bucket [string]
-
-The bucket address of s3 file system, for example: `s3n://seatunnel-test`, if you use `s3a` protocol, this parameter should be `s3a://seatunnel-test`.
-
-### fs.s3a.endpoint [string]
-
-fs s3a endpoint
+## Description
-### fs.s3a.aws.credentials.provider [string]
+Output data to aws s3 file system.
-The way to authenticate s3a. We only support `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` and `com.amazonaws.auth.InstanceProfileCredentialsProvider` now.
+## Supported DataSource Info
-More information about the credential provider you can see [Hadoop AWS Document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Simple_name.2Fsecret_credentials_with_SimpleAWSCredentialsProvider.2A)
+| Datasource | Supported Versions |
+|------------|--------------------|
+| S3         | current            |
-### access_key [string]
+## Database Dependency
-The access key of s3 file system. If this parameter is not set, please confirm that the credential provider chain can be authenticated correctly, you could check this [hadoop-aws](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+> If you use spark/flink, In order to use this connector, You must ensure your spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-### access_secret [string]
+> If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you download and install SeaTunnel Engine. You can check the jar package under ${SEATUNNEL_HOME}/lib to confirm this.
+To use this connector you need put hadoop-aws-3.1.4.jar and aws-java-sdk-bundle-1.11.271.jar in ${SEATUNNEL_HOME}/lib dir.
-The access secret of s3 file system. If this parameter is not set, please confirm that the credential provider chain can be authenticated correctly, you could check this [hadoop-aws](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+## Data Type Mapping
+
+If write to `csv`, `text` file type, All column will be string.
+
+### Orc File Type
+
+
+| SeaTunnel Data type | Orc Data type |
+|-----------------------|------------------------|
+| STRING | STRING |
+| BOOLEAN | BOOLEAN |
+| TINYINT | BYTE |
+| SMALLINT | SHORT |
+| INT | INT |
+| BIGINT | LONG |
+| FLOAT | FLOAT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE |
+| DECIMAL | DECIMAL |
+| BYTES | BINARY |
+| DATE | DATE |
+| TIME <br/> TIMESTAMP | TIMESTAMP |
+| ROW | STRUCT |
+| NULL | UNSUPPORTED DATA TYPE |
+| ARRAY | LIST |
+| Map | Map |
+
+
+### Parquet File Type
+
+
+| SeaTunnel Data type | Parquet Data type |
+|-----------------------|-----------------------|
+| STRING | STRING |
+| BOOLEAN | BOOLEAN |
+| TINYINT | INT_8 |
+| SMALLINT | INT_16 |
+| INT | INT32 |
+| BIGINT | INT64 |
+| FLOAT | FLOAT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE |
+| DECIMAL | DECIMAL |
+| BYTES | BINARY |
+| DATE | DATE |
+| TIME <br/> TIMESTAMP | TIMESTAMP_MILLIS |
+| ROW | GroupType |
+| NULL | UNSUPPORTED DATA TYPE |
+| ARRAY | LIST |
+| Map | Map |
+
+## Sink Options
+
+
+| name | type | required | default value | Description |
+|----------------------------------|---------|----------|-------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| path | string | yes | - | |
+| bucket | string | yes | - | |
+| fs.s3a.endpoint | string | yes | - | |
+| fs.s3a.aws.credentials.provider | string | yes | com.amazonaws.auth.InstanceProfileCredentialsProvider | The way to authenticate s3a. We only support `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` and `com.amazonaws.auth.InstanceProfileCredentialsProvider` now. |
+| access_key | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
+| access_secret | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
+| custom_filename | boolean | no | false | Whether you need custom the filename |
+| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
+| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
+| file_format_type | string | no | "csv" | |
+| field_delimiter | string | no | '\001' | Only used when file_format is text |
+| row_delimiter | string | no | "\n" | Only used when file_format is text |
+| have_partition | boolean | no | false | Whether you need processing partitions. |
+| partition_by | array | no | - | Only used then have_partition is true |
+| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
+| is_partition_field_write_in_file | boolean | no | false | Only used then have_partition is true |
Review Comment:
```suggestion
| is_partition_field_write_in_file | boolean | no | false | Only used when have_partition is true |
```
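For orientation, the sink options being documented in this diff typically appear together in a SeaTunnel job config. A minimal sketch, assuming the standard S3File sink block shape; every value below is an illustrative placeholder, not taken from this PR:

```hocon
sink {
  S3File {
    # Bucket address; use the s3a:// form when the s3a protocol is configured.
    bucket = "s3a://seatunnel-test"
    fs.s3a.endpoint = "s3.cn-north-1.amazonaws.com.cn"
    # SimpleAWSCredentialsProvider requires access_key/access_secret below.
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    access_key = "<your-access-key>"
    access_secret = "<your-access-secret>"
    path = "/seatunnel/text"
    file_format_type = "text"
    # field_delimiter/row_delimiter only take effect for the text format.
    field_delimiter = "\t"
    row_delimiter = "\n"
  }
}
```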
##########
docs/en/connector-v2/sink/S3File.md:
##########
@@ -30,60 +23,106 @@ By default, we use 2PC commit to ensure `exactly-once`
+| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
Review Comment:
```suggestion
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used when have_partition is true |
```
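The `have_partition` family of options in this table work together: `partition_by` names the fields, and `partition_dir_expression` maps each field/value pair (`${k0}`/`${v0}`, `${k1}`/`${v1}`, ...) to a Hive-style directory segment. A hedged sketch (field names and values are hypothetical):

```hocon
have_partition = true
partition_by = ["age", "name"]
partition_dir_expression = "${k0}=${v0}/${k1}=${v1}/"
# For a row with age = 20 and name = "alice", files would land under a
# directory like age=20/name=alice/ beneath the configured sink path.
```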
##########
docs/en/connector-v2/sink/S3File.md:
##########
@@ -30,60 +23,106 @@ By default, we use 2PC commit to ensure `exactly-once`
+| partition_by | array | no | - | Only used then have_partition is true |
Review Comment:
```suggestion
| partition_by | array | no | - | Only used when have_partition is true |
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
