TaoZex commented on code in PR #5101: URL: https://github.com/apache/seatunnel/pull/5101#discussion_r1267630568
##########
docs/en/connector-v2/sink/S3File.md:
##########
@@ -30,60 +23,106 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] json
- [x] excel
-## Options
-
-| name | type | required | default value | remarks |
-|----------------------------------|---------|----------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
-| path | string | yes | - | |
-| bucket | string | yes | - | |
-| fs.s3a.endpoint | string | yes | - | |
-| fs.s3a.aws.credentials.provider | string | yes | com.amazonaws.auth.InstanceProfileCredentialsProvider | |
-| access_key | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
-| access_secret | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
-| custom_filename | boolean | no | false | Whether you need custom the filename |
-| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
-| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format_type | string | no | "csv" | |
-| field_delimiter | string | no | '\001' | Only used when file_format is text |
-| row_delimiter | string | no | "\n" | Only used when file_format is text |
-| have_partition | boolean | no | false | Whether you need processing partitions. |
-| partition_by | array | no | - | Only used then have_partition is true |
-| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
-| is_partition_field_write_in_file | boolean | no | false | Only used then have_partition is true |
-| sink_columns | array | no | | When this parameter is empty, all fields are sink columns |
-| is_enable_transaction | boolean | no | true | |
-| batch_size | int | no | 1000000 | |
-| compress_codec | string | no | none | |
-| common-options | object | no | - | |
-| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
-| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
-
-### path [string]
-
-The target dir path is required.
-
-### bucket [string]
-
-The bucket address of s3 file system, for example: `s3n://seatunnel-test`, if you use `s3a` protocol, this parameter should be `s3a://seatunnel-test`.
-
-### fs.s3a.endpoint [string]
-
-fs s3a endpoint
+## Description
-### fs.s3a.aws.credentials.provider [string]
+Output data to aws s3 file system.
-The way to authenticate s3a. We only support `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` and `com.amazonaws.auth.InstanceProfileCredentialsProvider` now.
+## Supported DataSource Info
-More information about the credential provider you can see [Hadoop AWS Document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Simple_name.2Fsecret_credentials_with_SimpleAWSCredentialsProvider.2A)
+| Datasource | Supported Versions |
+|------------|--------------------|
+| S3         | current            |
-### access_key [string]
+## Database Dependency
-The access key of s3 file system. If this parameter is not set, please confirm that the credential provider chain can be authenticated correctly, you could check this [hadoop-aws](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+> If you use spark/flink, In order to use this connector, You must ensure your spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-### access_secret [string]
+> If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you download and install SeaTunnel Engine. You can check the jar package under ${SEATUNNEL_HOME}/lib to confirm this.
+To use this connector you need put hadoop-aws-3.1.4.jar and aws-java-sdk-bundle-1.11.271.jar in ${SEATUNNEL_HOME}/lib dir.
-The access secret of s3 file system. If this parameter is not set, please confirm that the credential provider chain can be authenticated correctly, you could check this [hadoop-aws](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+## Data Type Mapping
+
+If write to `csv`, `text` file type, All column will be string.
+
+### Orc File Type
+
+
+| SeaTunnel Data type | Orc Data type |
+|-----------------------|------------------------|
+| STRING | STRING |
+| BOOLEAN | BOOLEAN |
+| TINYINT | BYTE |
+| SMALLINT | SHORT |
+| INT | INT |
+| BIGINT | LONG |
+| FLOAT | FLOAT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE |
+| DECIMAL | DECIMAL |
+| BYTES | BINARY |
+| DATE | DATE |
+| TIME <br/> TIMESTAMP | TIMESTAMP |
+| ROW | STRUCT |
+| NULL | UNSUPPORTED DATA TYPE |
+| ARRAY | LIST |
+| Map | Map |
+
+
+### Parquet File Type
+
+
+| SeaTunnel Data type | Parquet Data type |
+|-----------------------|-----------------------|
+| STRING | STRING |
+| BOOLEAN | BOOLEAN |
+| TINYINT | INT_8 |
+| SMALLINT | INT_16 |
+| INT | INT32 |
+| BIGINT | INT64 |
+| FLOAT | FLOAT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE |
+| DECIMAL | DECIMAL |
+| BYTES | BINARY |
+| DATE | DATE |
+| TIME <br/> TIMESTAMP | TIMESTAMP_MILLIS |
+| ROW | GroupType |
+| NULL | UNSUPPORTED DATA TYPE |
+| ARRAY | LIST |
+| Map | Map |
+
+## Sink Options
+
+
+| name | type | required | default value | Description |
+|----------------------------------|---------|----------|-------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| path | string | yes | - | |
+| bucket | string | yes | - | |
+| fs.s3a.endpoint | string | yes | - | |
+| fs.s3a.aws.credentials.provider | string | yes | com.amazonaws.auth.InstanceProfileCredentialsProvider | The way to authenticate s3a. We only support `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` and `com.amazonaws.auth.InstanceProfileCredentialsProvider` now. |
+| access_key | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
+| access_secret | string | no | - | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider |
+| custom_filename | boolean | no | false | Whether you need custom the filename |
+| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
+| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
+| file_format_type | string | no | "csv" | |
+| field_delimiter | string | no | '\001' | Only used when file_format is text |
+| row_delimiter | string | no | "\n" | Only used when file_format is text |
+| have_partition | boolean | no | false | Whether you need processing partitions. |
+| partition_by | array | no | - | Only used then have_partition is true |
+| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
+| is_partition_field_write_in_file | boolean | no | false | Only used then have_partition is true |
Review Comment:
```suggestion
| is_partition_field_write_in_file | boolean | no | false | Only used when have_partition is true |
```
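For orientation, the sink options being documented in this diff typically appear together in a SeaTunnel job config. A minimal sketch, assuming the standard S3File sink block shape; every value below is an illustrative placeholder, not taken from this PR:

```hocon
sink {
  S3File {
    # Bucket address; use the s3a:// form when the s3a protocol is configured.
    bucket = "s3a://seatunnel-test"
    fs.s3a.endpoint = "s3.cn-north-1.amazonaws.com.cn"
    # SimpleAWSCredentialsProvider requires access_key/access_secret below.
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    access_key = "<your-access-key>"
    access_secret = "<your-access-secret>"
    path = "/seatunnel/text"
    file_format_type = "text"
    # field_delimiter/row_delimiter only take effect for the text format.
    field_delimiter = "\t"
    row_delimiter = "\n"
  }
}
```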
##########
docs/en/connector-v2/sink/S3File.md:
##########
@@ -30,60 +23,106 @@ By default, we use 2PC commit to ensure `exactly-once`
+| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
Review Comment:
```suggestion
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used when have_partition is true |
```
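The `have_partition` family of options in this table work together: `partition_by` names the fields, and `partition_dir_expression` maps each field/value pair (`${k0}`/`${v0}`, `${k1}`/`${v1}`, ...) to a Hive-style directory segment. A hedged sketch (field names and values are hypothetical):

```hocon
have_partition = true
partition_by = ["age", "name"]
partition_dir_expression = "${k0}=${v0}/${k1}=${v1}/"
# For a row with age = 20 and name = "alice", files would land under a
# directory like age=20/name=alice/ beneath the configured sink path.
```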
##########
docs/en/connector-v2/sink/S3File.md:
##########
@@ -30,60 +23,106 @@ By default, we use 2PC commit to ensure `exactly-once`
+| partition_by | array | no | - | Only used then have_partition is true |
Review Comment:
```suggestion
| partition_by | array | no | - | Only used when have_partition is true |
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
