TaoZex commented on code in PR #5151:
URL: https://github.com/apache/seatunnel/pull/5151#discussion_r1275654166


##########
docs/en/connector-v2/sink/S3File.md:
##########
@@ -30,89 +22,66 @@ By default, we use 2PC commit to ensure `exactly-once`
   - [x] json
   - [x] excel
 
-## Options
-
-|               name               |  type   | required |                     default value                     |                                                 remarks                                                 |
-|----------------------------------|---------|----------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
-| path                             | string  | yes      | -                                                     |                                                                                                         |
-| bucket                           | string  | yes      | -                                                     |                                                                                                         |
-| fs.s3a.endpoint                  | string  | yes      | -                                                     |                                                                                                         |
-| fs.s3a.aws.credentials.provider  | string  | yes      | com.amazonaws.auth.InstanceProfileCredentialsProvider |                                                                                                         |
-| access_key                       | string  | no       | -                                                     | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider  |
-| access_secret                    | string  | no       | -                                                     | Only used when fs.s3a.aws.credentials.provider = org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider  |
-| custom_filename                  | boolean | no       | false                                                 | Whether you need custom the filename                                                                    |
-| file_name_expression             | string  | no       | "${transactionId}"                                    | Only used when custom_filename is true                                                                  |
-| filename_time_format             | string  | no       | "yyyy.MM.dd"                                          | Only used when custom_filename is true                                                                  |
-| file_format_type                 | string  | no       | "csv"                                                 |                                                                                                         |
-| field_delimiter                  | string  | no       | '\001'                                                | Only used when file_format is text                                                                      |
-| row_delimiter                    | string  | no       | "\n"                                                  | Only used when file_format is text                                                                      |
-| have_partition                   | boolean | no       | false                                                 | Whether you need processing partitions.                                                                 |
-| partition_by                     | array   | no       | -                                                     | Only used then have_partition is true                                                                   |
-| partition_dir_expression         | string  | no       | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/"            | Only used then have_partition is true                                                                   |
-| is_partition_field_write_in_file | boolean | no       | false                                                 | Only used then have_partition is true                                                                   |
-| sink_columns                     | array   | no       |                                                       | When this parameter is empty, all fields are sink columns                                               |
-| is_enable_transaction            | boolean | no       | true                                                  |                                                                                                         |
-| batch_size                       | int     | no       | 1000000                                               |                                                                                                         |
-| compress_codec                   | string  | no       | none                                                  |                                                                                                         |
-| common-options                   | object  | no       | -                                                     |                                                                                                         |
-| max_rows_in_memory               | int     | no       | -                                                     | Only used when file_format is excel.                                                                    |
-| sheet_name                       | string  | no       | Sheet${Random number}                                 | Only used when file_format is excel.                                                                    |
-
-### path [string]
-
-The target dir path is required.
-
-### bucket [string]
-
-The bucket address of s3 file system, for example: `s3n://seatunnel-test`, if you use `s3a` protocol, this parameter should be `s3a://seatunnel-test`.
-
-### fs.s3a.endpoint [string]
-
-fs s3a endpoint
-
-### fs.s3a.aws.credentials.provider [string]
-
-The way to authenticate s3a. We only support `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` and `com.amazonaws.auth.InstanceProfileCredentialsProvider` now.
-
-More information about the credential provider you can see [Hadoop AWS Document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Simple_name.2Fsecret_credentials_with_SimpleAWSCredentialsProvider.2A)
+## Description
 
-### access_key [string]
+Output data to the AWS S3 file system.
 
-The access key of s3 file system. If this parameter is not set, please confirm that the credential provider chain can be authenticated correctly, you could check this [hadoop-aws](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+## Supported DataSource Info
 
-### access_secret [string]
+In order to use the S3File connector, the following dependencies are required.
+They can be downloaded via install-plugin.sh or from the Maven central repository.
 
-The access secret of s3 file system. If this parameter is not set, please confirm that the credential provider chain can be authenticated correctly, you could check this [hadoop-aws](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+| Datasource | Supported Versions |                                            Dependency                                             |
+|------------|--------------------|---------------------------------------------------------------------------------------------------|
+| S3File     | universal          | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/connector-file-s3)              |
 
-### hadoop_s3_properties [map]
+:::tip
 
-If you need to add a other option, you could add it here and refer to this [link](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+If you use Spark/Flink, you must ensure your Spark/Flink cluster has already integrated Hadoop before using this connector. The tested Hadoop version is 2.x.
 
-```
-hadoop_s3_properties {
-      "fs.s3a.buffer.dir" = "/data/st_test/s3a"
-      "fs.s3a.fast.upload.buffer" = "disk"
-   }
-```
+If you use SeaTunnel Engine, the Hadoop jar is integrated automatically when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
 
-### custom_filename [boolean]
+To use this connector you need to put hadoop-aws-3.1.4.jar and aws-java-sdk-bundle-1.11.271.jar in the ${SEATUNNEL_HOME}/lib directory.
 
-Whether custom the filename
+:::
 
-### file_name_expression [string]
+## Data Type Mapping
 
-Only used when `custom_filename` is `true`
+SeaTunnel will write the data into the file in String format according to the SeaTunnel data type and file_format_type,

Review Comment:
   ```suggestion
   SeaTunnel will write the data into the file in String format according to the SeaTunnel data type and file_format_type.
   ```
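   For context, the options documented in this file combine into a sink block like the following minimal sketch (bucket, path, endpoint, and credential values are illustrative placeholders, not tested values):

   ```hocon
   sink {
     S3File {
       # Bucket address; use the s3a:// form when fs.s3a is the underlying protocol
       bucket = "s3a://seatunnel-test"
       # Target directory path (required)
       path = "/tmp/seatunnel/output"
       fs.s3a.endpoint = "s3.example-region.amazonaws.com"
       # One of the two supported providers; SimpleAWSCredentialsProvider needs access_key/access_secret
       fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
       access_key = "<your-access-key>"
       access_secret = "<your-access-secret>"
       file_format_type = "text"
     }
   }
   ```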



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
