This is an automated email from the ASF dual-hosted git repository.
wanghailin pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new f4ea676a20 [Docs][Connector-V2][Oss]Reconstruct the OssFile connector
document (#5233)
f4ea676a20 is described below
commit f4ea676a20c6e20d5bc3cbd9e0335e4d9779d8fc
Author: Jia Fan <[email protected]>
AuthorDate: Tue Dec 5 11:14:25 2023 +0800
[Docs][Connector-V2][Oss]Reconstruct the OssFile connector document (#5233)
---
docs/en/connector-v2/sink/OssFile.md | 285 +++++++++++++--------------------
docs/en/connector-v2/source/OssFile.md | 225 ++++++++++----------------
2 files changed, 200 insertions(+), 310 deletions(-)
diff --git a/docs/en/connector-v2/sink/OssFile.md
b/docs/en/connector-v2/sink/OssFile.md
index c723d4a836..3604748477 100644
--- a/docs/en/connector-v2/sink/OssFile.md
+++ b/docs/en/connector-v2/sink/OssFile.md
@@ -2,20 +2,11 @@
> Oss file sink connector
-## Description
-
-Output data to oss file system.
-
-:::tip
-
-If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
-
-If you use SeaTunnel Engine, It automatically integrated the hadoop jar when
you download and install SeaTunnel Engine. You can check the jar package under
${SEATUNNEL_HOME}/lib to confirm this.
-
-We made some trade-offs in order to support more file types, so we used the
HDFS protocol for internal access to OSS and this connector need some hadoop
dependencies.
-It only supports hadoop version **2.9.X+**.
+## Support These Engines
-:::
+> Spark<br/>
+> Flink<br/>
+> SeaTunnel Zeta<br/>
## Key features
@@ -31,72 +22,67 @@ By default, we use 2PC commit to ensure `exactly-once`
- [x] json
- [x] excel
-## Options
-
-| name | type | required |
default value |
remarks |
-|----------------------------------|---------|----------|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| path | string | yes | -
|
|
-| tmp_path | string | no | /tmp/seatunnel
| The result file will write to a tmp path first and then
use `mv` to submit tmp dir to target dir. Need a OSS dir. |
-| bucket | string | yes | -
|
|
-| access_key | string | yes | -
|
|
-| access_secret | string | yes | -
|
|
-| endpoint | string | yes | -
|
|
-| custom_filename | boolean | no | false
| Whether you need custom the filename
|
-| file_name_expression | string | no | "${transactionId}"
| Only used when custom_filename is true
|
-| filename_time_format | string | no | "yyyy.MM.dd"
| Only used when custom_filename is true
|
-| file_format_type | string | no | "csv"
|
|
-| field_delimiter | string | no | '\001'
| Only used when file_format_type is text
|
-| row_delimiter | string | no | "\n"
| Only used when file_format_type is text
|
-| have_partition | boolean | no | false
| Whether you need processing partitions.
|
-| partition_by | array | no | -
| Only used then have_partition is true
|
-| partition_dir_expression | string | no |
"${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is
true
|
-| is_partition_field_write_in_file | boolean | no | false
| Only used then have_partition is true
|
-| sink_columns | array | no |
| When this parameter is empty, all fields are sink
columns |
-| is_enable_transaction | boolean | no | true
|
|
-| batch_size | int | no | 1000000
|
|
-| compress_codec | string | no | none
|
|
-| common-options | object | no | -
|
|
-| max_rows_in_memory | int | no | -
| Only used when file_format_type is excel.
|
-| sheet_name | string | no | Sheet${Random
number} | Only used when file_format_type is excel.
|
-
-### path [string]
-
-The target dir path is required.
-
-### bucket [string]
-
-The bucket address of oss file system, for example:
`oss://tyrantlucifer-image-bed`
+## Description
-### access_key [string]
+Output data to oss file system.
-The access key of oss file system.
+## Supported DataSource Info
-### access_secret [string]
+In order to use the OssFile connector, the following dependencies are required.
+They can be downloaded via install-plugin.sh or from the Maven central
repository.
-The access secret of oss file system.
+| Datasource | Supported Versions |
Dependency |
+|------------|--------------------|----------------------------------------------------------------------------------------|
+| OssFile | universal |
[Download](https://mvnrepository.com/artifact/org.apache.seatunnel/connector-file-oss)
|
-### endpoint [string]
+:::tip
-The endpoint of oss file system.
+If you use Spark/Flink, you must ensure your Spark/Flink cluster has already integrated Hadoop before using this connector. The tested Hadoop version is 2.x.
-### custom_filename [boolean]
+If you use SeaTunnel Engine, it automatically integrates the Hadoop jar when you download and install SeaTunnel Engine. You can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.
-Whether custom the filename
+We made some trade-offs in order to support more file types, so we use the HDFS protocol for internal access to OSS, and this connector needs some Hadoop dependencies.
+It only supports Hadoop version **2.9.X+**.
-### file_name_expression [string]
+:::
-Only used when `custom_filename` is `true`
+## Data Type Mapping
-`file_name_expression` describes the file expression which will be created
into the `path`. We can add the variable `${now}` or `${uuid}` in the
`file_name_expression`, like `test_${uuid}_${now}`,
-`${now}` represents the current time, and its format can be defined by
specifying the option `filename_time_format`.
+SeaTunnel will write the data into the file in String format according to the
SeaTunnel data type and file_format_type.
-Please note that, If `is_enable_transaction` is `true`, we will auto add
`${transactionId}_` in the head of the file.
+## Options
-### filename_time_format [string]
+| Name | Type | Required |
Default value |
Description
[...]
+|----------------------------------|---------|----------|--------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| path                             | String  | Yes      | -                                          | The target dir path.
[...]
+| tmp_path                         | String  | No       | /tmp/seatunnel                             | The result file will be written to a tmp path first and then moved with `mv` to the target dir. Needs an OSS dir.
[...]
+| bucket | String | Yes | -
| The bucket address of oss file system, for example:
`oss://tyrantlucifer-image-bed`
[...]
+| access_key | String | No | -
| The access key of oss file system.
[...]
+| access_secret | String | No | -
| The access secret of oss file system.
[...]
+| endpoint | String | Yes | -
| The endpoint of oss file system.
[...]
+| custom_filename                  | Boolean | No       | false                                      | Whether you need to customize the filename
[...]
+| file_name_expression             | String  | No       | "${transactionId}"                         | Only used when `custom_filename` is `true`. <br/> `file_name_expression` describes the file expression which will be created into the `path`. We can add the variable `${now}` or `${uuid}` in the `file_name_expression`, like `test_${uuid}_${now}`. `${now}` represents the current time, and its format can be defined by specifying the option `filename_time_format`. <br/> Please note that, if [...]
+| filename_time_format | String | No | "yyyy.MM.dd"
| Please check #filename_time_format below
[...]
+| file_format_type                 | String  | No       | "csv"                                      | We support the following file types: <br/> `text` `json` `csv` `orc` `parquet` `excel` <br/> Please note that the final file name will end with the file_format_type's suffix; the suffix of the text file is `txt`.
[...]
+| field_delimiter | String | No | '\001'
| The separator between columns in a row of data. Only
needed by `text` file format.
[...]
+| row_delimiter | String | No | "\n"
| The separator between rows in a file. Only needed by
`text` file format.
[...]
+| have_partition | Boolean | No | false
| Whether you need processing partitions.
[...]
+| partition_by | Array | No | -
| Only used when `have_partition` is `true`. <br/>
Partition data based on selected fields.
[...]
+| partition_dir_expression | String | No |
"${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used when `have_partition` is
`true`. <br/> If the `partition_by` is specified, we will generate the
corresponding partition directory based on the partition information, and the
final file will be placed in the partition directory. <br/> Default
`partition_dir_expression` is `${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`. `k0`
is the first partition field and `v0` is the value of the [...]
+| is_partition_field_write_in_file | Boolean | No       | false                                      | Only used when `have_partition` is `true`. <br/> If `is_partition_field_write_in_file` is `true`, the partition field and its value will be written into the data file. <br/> For example, if you want to write a Hive Data File, its value should be `false`.
[...]
+| sink_columns                     | Array   | No       |                                            | Which columns need to be written to the file; the default value is all the columns obtained from `Transform` or `Source`. <br/> The order of the fields determines the order in which the file is actually written.
[...]
+| is_enable_transaction            | Boolean | No       | true                                       | If `is_enable_transaction` is true, we will ensure that data will not be lost or duplicated when it is written to the target directory. <br/> Please note that, if `is_enable_transaction` is `true`, we will auto add `${transactionId}_` in the head of the file. <br/> Only support `true` now.
[...]
+| batch_size                       | Int     | No       | 1000000                                    | The maximum number of rows in a file. For SeaTunnel Engine, the number of lines in the file is jointly decided by `batch_size` and `checkpoint.interval`. If the value of `checkpoint.interval` is large enough, the sink writer will keep writing rows into a file until the rows in the file exceed `batch_size`. If `checkpoint.interval` is small, the sink writer will create a new file when [...]
+| compress_codec                   | String  | No       | none                                       | The compress codec of files, supported as the following shown: <br/> - txt: `lzo` `none` <br/> - json: `lzo` `none` <br/> - csv: `lzo` `none` <br/> - orc: `lzo` `snappy` `lz4` `zlib` `none` <br/> - parquet: `lzo` `snappy` `lz4` `gzip` `brotli` `zstd` `none` <br/> Tips: excel type does not support any compression format
[...]
+| max_rows_in_memory               | Int     | No       | -                                          | Only used when file_format_type is excel. The maximum number of data items that can be cached in the memory.
[...]
+| sheet_name                       | String  | No       | Sheet${Random number}                      | Write the sheet of the workbook
[...]
+| common-options | Config | No | -
| Sink plugin common parameters, please refer to [Sink
Common Options](common-options.md) for details.
[...]
+
+### filename_time_format [String]
Only used when `custom_filename` is `true`
-When the format in the `file_name_expression` parameter is `xxxx-${now}` ,
`filename_time_format` can specify the time format of the path, and the default
value is `yyyy.MM.dd` . The commonly used time formats are listed as follows:
+When the format in the `file_name_expression` parameter is `xxxx-${now}`, `filename_time_format` can specify the time format of the path, and the default value is `yyyy.MM.dd`. The commonly used time formats are listed as follows:
| Symbol | Description |
|--------|--------------------|
@@ -107,95 +93,33 @@ When the format in the `file_name_expression` parameter is
`xxxx-${now}` , `file
| m | Minute in hour |
| s | Second in minute |
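The symbols above combine into a pattern string. As an illustrative sketch (the option values here are hypothetical, not taken from this document), a sink using a custom filename could set:

```hocon
OssFile {
    # Hypothetical values, for illustration only
    custom_filename = true
    file_name_expression = "test_${uuid}_${now}"
    # ${now} is rendered with this pattern, e.g. 2023.12.05
    filename_time_format = "yyyy.MM.dd"
}
```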
-### file_format_type [string]
-
-We supported as the following file types:
-
-`text` `json` `csv` `orc` `parquet` `excel`
-
-Please note that, The final file name will end with the file_format_type's
suffix, the suffix of the text file is `txt`.
-
-### field_delimiter [string]
-
-The separator between columns in a row of data. Only needed by `text` file
format.
-
-### row_delimiter [string]
-
-The separator between rows in a file. Only needed by `text` file format.
-
-### have_partition [boolean]
-
-Whether you need processing partitions.
-
-### partition_by [array]
-
-Only used when `have_partition` is `true`.
+## How to Create an Oss Data Synchronization Job
-Partition data based on selected fields.
-
-### partition_dir_expression [string]
-
-Only used when `have_partition` is `true`.
-
-If the `partition_by` is specified, we will generate the corresponding
partition directory based on the partition information, and the final file will
be placed in the partition directory.
-
-Default `partition_dir_expression` is
`${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/`. `k0` is the first partition field
and `v0` is the value of the first partition field.
-
-### is_partition_field_write_in_file [boolean]
-
-Only used when `have_partition` is `true`.
-
-If `is_partition_field_write_in_file` is `true`, the partition field and the
value of it will be write into data file.
-
-For example, if you want to write a Hive Data File, Its value should be
`false`.
-
-### sink_columns [array]
-
-Which columns need be written to file, default value is all the columns get
from `Transform` or `Source`.
-The order of the fields determines the order in which the file is actually
written.
-
-### is_enable_transaction [boolean]
-
-If `is_enable_transaction` is true, we will ensure that data will not be lost
or duplicated when it is written to the target directory.
-
-Please note that, If `is_enable_transaction` is `true`, we will auto add
`${transactionId}_` in the head of the file.
-
-Only support `true` now.
-
-### batch_size [int]
-
-The maximum number of rows in a file. For SeaTunnel Engine, the number of
lines in the file is determined by `batch_size` and `checkpoint.interval`
jointly decide. If the value of `checkpoint.interval` is large enough, sink
writer will write rows in a file until the rows in the file larger than
`batch_size`. If `checkpoint.interval` is small, the sink writer will create a
new file when a new checkpoint trigger.
-
-### compress_codec [string]
-
-The compress codec of files and the details that supported as the following
shown:
-
-- txt: `lzo` `none`
-- json: `lzo` `none`
-- csv: `lzo` `none`
-- orc: `lzo` `snappy` `lz4` `zlib` `none`
-- parquet: `lzo` `snappy` `lz4` `gzip` `brotli` `zstd` `none`
-
-Tips: excel type does not support any compression format
-
-### common options
-
-Sink plugin common parameters, please refer to [Sink Common
Options](common-options.md) for details.
-
-### max_rows_in_memory [int]
-
-When File Format is Excel,The maximum number of data items that can be cached
in the memory.
-
-### sheet_name [string]
-
-Writer the sheet of the workbook
-
-## Example
+The following example demonstrates how to create a data synchronization job that reads data from FakeSource and writes it to Oss:
For text file format with `have_partition` and `custom_filename` and
`sink_columns`
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+
+# Create a source to produce data
+source {
+ FakeSource {
+ schema = {
+ fields {
+ name = string
+ age = int
+ }
+ }
+ }
+}
+# Write data to Oss
+sink {
OssFile {
path="/seatunnel/sink"
bucket = "oss://tyrantlucifer-image-bed"
@@ -215,13 +139,32 @@ For text file format with `have_partition` and
`custom_filename` and `sink_colum
sink_columns = ["name","age"]
is_enable_transaction = true
}
-
+}
```
For parquet file format with `have_partition` and `sink_columns`
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+
+# Create a source to produce data
+source {
+ FakeSource {
+ schema = {
+ fields {
+ name = string
+ age = int
+ }
+ }
+ }
+}
+# Write data to Oss
+sink {
OssFile {
path = "/seatunnel/sink"
bucket = "oss://tyrantlucifer-image-bed"
@@ -235,13 +178,32 @@ For parquet file format with `have_partition` and
`sink_columns`
file_format_type = "parquet"
sink_columns = ["name","age"]
}
-
+}
```
For orc file format simple config
```bash
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+
+# Create a source to produce data
+source {
+ FakeSource {
+ schema = {
+ fields {
+ name = string
+ age = int
+ }
+ }
+ }
+}
+# Write data to Oss
+sink {
OssFile {
path="/seatunnel/sink"
bucket = "oss://tyrantlucifer-image-bed"
@@ -250,27 +212,10 @@ For orc file format simple config
endpoint = "oss-cn-beijing.aliyuncs.com"
file_format_type = "orc"
}
-
+}
```
-## Changelog
-
-### 2.2.0-beta 2022-09-26
-
-- Add OSS Sink Connector
-
-### 2.3.0-beta 2022-10-20
-
-- [BugFix] Fix the bug of incorrect path in windows environment
([2980](https://github.com/apache/seatunnel/pull/2980))
-- [BugFix] Fix filesystem get error
([3117](https://github.com/apache/seatunnel/pull/3117))
-- [BugFix] Solved the bug of can not parse '\t' as delimiter from config file
([3083](https://github.com/apache/seatunnel/pull/3083))
-
-### Next version
+### Tips
-- [BugFix] Fixed the following bugs that failed to write data to files
([3258](https://github.com/apache/seatunnel/pull/3258))
- - When field from upstream is null it will throw NullPointerException
- - Sink columns mapping failed
- - When restore writer from states getting transaction directly failed
-- [Improve] Support setting batch size for every file
([3625](https://github.com/apache/seatunnel/pull/3625))
-- [Improve] Support file compress
([3899](https://github.com/apache/seatunnel/pull/3899))
+> 1. [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md).
diff --git a/docs/en/connector-v2/source/OssFile.md
b/docs/en/connector-v2/source/OssFile.md
index 2f51024b67..87e7e0180f 100644
--- a/docs/en/connector-v2/source/OssFile.md
+++ b/docs/en/connector-v2/source/OssFile.md
@@ -31,6 +31,15 @@ Read all the data in a split in a pollNext call. What splits
are read will be sa
Read data from aliyun oss file system.
+## Supported DataSource Info
+
+In order to use the OssFile connector, the following dependencies are required.
+They can be downloaded via install-plugin.sh or from the Maven central
repository.
+
+| Datasource | Supported Versions |
Dependency |
+|------------|--------------------|----------------------------------------------------------------------------------------|
+| OssFile | universal |
[Download](https://mvnrepository.com/artifact/org.apache.seatunnel/connector-file-oss)
|
+
:::tip
If you use spark/flink, In order to use this connector, You must ensure your
spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x.
@@ -42,32 +51,50 @@ It only supports hadoop version **2.9.X+**.
:::
-## Options
-
-| name | type | required | default value |
-|---------------------------|---------|----------|---------------------|
-| path | string | yes | - |
-| file_format_type | string | yes | - |
-| bucket | string | yes | - |
-| access_key | string | yes | - |
-| access_secret | string | yes | - |
-| endpoint | string | yes | - |
-| read_columns | list | yes | - |
-| delimiter/field_delimiter | string | no | \001 |
-| parse_partition_from_path | boolean | no | true |
-| skip_header_row_number | long | no | 0 |
-| date_format | string | no | yyyy-MM-dd |
-| datetime_format | string | no | yyyy-MM-dd HH:mm:ss |
-| time_format | string | no | HH:mm:ss |
-| schema | config | no | - |
-| sheet_name | string | no | - |
-| file_filter_pattern | string | no | - |
-| compress_codec | string | no | none |
-| common-options | | no | - |
-
-### path [string]
-
-The source file path.
+## Data Type Mapping
+
+The file formats do not have a specific type list; you can indicate which SeaTunnel data type the corresponding data needs to be converted to by specifying the schema in the config.
+
+| SeaTunnel Data type |
+|---------------------|
+| STRING |
+| SHORT |
+| INT |
+| BIGINT |
+| BOOLEAN |
+| DOUBLE |
+| DECIMAL |
+| FLOAT |
+| DATE |
+| TIME |
+| TIMESTAMP |
+| BYTES |
+| ARRAY |
+| MAP |
+
+## Source Options
+
+| Name | Type | Required | default value |
Description
|
+|---------------------------|---------|----------|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| path | String | Yes | - | The
source file path.
|
+| file_format_type | String | Yes | - |
Please check #file_format_type below
|
+| bucket | String | Yes | - | The
bucket address of oss file system, for example: `oss://tyrantlucifer-image-bed`
|
+| endpoint | String | Yes | - | The
endpoint of oss file system.
|
+| read_columns              | List    | No       | -                   | The read column list of the data source; users can use it to implement field projection. <br/> The file types that support column projection are as follows: <br/> - text <br/> - json <br/> - csv <br/> - orc <br/> - parquet <br/> - excel <br/> **Tips: If you want to use this feature when reading `text` `json` `csv` files, the schema option must be configured** |
+| access_key | String | No | - | The
access key of oss file system.
|
+| access_secret | String | No | - | The
access secret of oss file system.
|
+| file_filter_pattern       | String  | No       | -                   | Filter pattern, used for filtering files. |
+| delimiter/field_delimiter | String  | No       | \001                | The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead. <br/> Field delimiter, used to tell the connector how to slice and dice fields when reading text files. <br/> Default `\001`, the same as Hive's default delimiter |
+| parse_partition_from_path | Boolean | No       | true                | Control whether to parse the partition keys and values from the file path. <br/> For example, if you read a file from path `oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26` <br/> every record read from the file will have these two fields added: <br/> name      age <br/> tyrantlucifer      26 <br/> Tips: **Do not define partition fields in schema option** |
+| date_format | String | No | yyyy-MM-dd | Date
type format, used to tell connector how to convert string to date, supported as
the following formats: <br/> `yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd` <br/>
default `yyyy-MM-dd`
|
+| datetime_format | String | No | yyyy-MM-dd HH:mm:ss |
Datetime type format, used to tell connector how to convert string to datetime,
supported as the following formats: <br/> `yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd
HH:mm:ss` `yyyy/MM/dd HH:mm:ss` `yyyyMMddHHmmss` <br/> default `yyyy-MM-dd
HH:mm:ss`
|
+| time_format | String | No | HH:mm:ss | Time
type format, used to tell connector how to convert string to time, supported as
the following formats: <br/> `HH:mm:ss` `HH:mm:ss.SSS` <br/> default `HH:mm:ss`
|
+| skip_header_row_number | Long | No | 0 | Skip
the first few lines, but only for the txt and csv. <br/> For example, set like
following: <br/> `skip_header_row_number = 2` <br/> then SeaTunnel will skip
the first 2 lines from source files
|
+| sheet_name                | String  | No       | -                   | Read the sheet of the workbook. Only used when file_format is excel. |
+| schema | Config | No | - |
Please check #schema below
|
+| compress_codec            | String  | No       | none                | The compress codec of files, supported as the following shown: <br/> - txt: `lzo` `none` <br/> - json: `lzo` `none` <br/> - csv: `lzo` `none` <br/> - orc/parquet: automatically recognizes the compression type, no additional settings required. |
+| common-options | | No | - |
Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
|
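To tie the options above together, here is a hedged sketch of a text-format source using field projection (the bucket, endpoint, credentials, and field names are illustrative, not from this document):

```hocon
source {
  OssFile {
    path = "/seatunnel/text"
    bucket = "oss://example-bucket"            # illustrative bucket
    endpoint = "oss-cn-beijing.aliyuncs.com"
    access_key = "xxxxxxxxxxxxxxxxx"
    access_secret = "xxxxxxxxxxxxxxxxxxxxxx"
    file_format_type = "text"
    field_delimiter = "\t"
    # The schema option is required for column projection on text/json/csv files
    schema = {
      fields {
        name = string
        age = int
      }
    }
    read_columns = ["name"]
  }
}
```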
### file_format_type [string]
@@ -157,84 +184,6 @@ connector will generate data as the following:
|---------------|-----|--------|
| tyrantlucifer | 26 | male |
-### bucket [string]
-
-The bucket address of oss file system, for example:
`oss://tyrantlucifer-image-bed`
-
-### access_key [string]
-
-The access key of oss file system.
-
-### access_secret [string]
-
-The access secret of oss file system.
-
-### endpoint [string]
-
-The endpoint of oss file system.
-
-### read_columns [list]
-
-The read column list of the data source, user can use it to implement field
projection.
-
-### delimiter/field_delimiter [string]
-
-**delimiter** parameter will deprecate after version 2.3.5, please use
**field_delimiter** instead.
-
-Only need to be configured when file_format is text.
-
-Field delimiter, used to tell connector how to slice and dice fields.
-
-default `\001`, the same as hive's default delimiter
-
-### parse_partition_from_path [boolean]
-
-Control whether parse the partition keys and values from file path
-
-For example if you read a file from path
`oss://hadoop-cluster/tmp/seatunnel/parquet/name=tyrantlucifer/age=26`
-
-Every record data from file will be added these two fields:
-
-| name | age |
-|---------------|-----|
-| tyrantlucifer | 26 |
-
-Tips: **Do not define partition fields in schema option**
-
-### skip_header_row_number [long]
-
-Skip the first few lines, but only for the txt and csv.
-
-For example, set like following:
-
-`skip_header_row_number = 2`
-
-then SeaTunnel will skip the first 2 lines from source files
-
-### date_format [string]
-
-Date type format, used to tell connector how to convert string to date,
supported as the following formats:
-
-`yyyy-MM-dd` `yyyy.MM.dd` `yyyy/MM/dd`
-
-default `yyyy-MM-dd`
-
-### datetime_format [string]
-
-Datetime type format, used to tell connector how to convert string to
datetime, supported as the following formats:
-
-`yyyy-MM-dd HH:mm:ss` `yyyy.MM.dd HH:mm:ss` `yyyy/MM/dd HH:mm:ss`
`yyyyMMddHHmmss`
-
-default `yyyy-MM-dd HH:mm:ss`
-
-### time_format [string]
-
-Time type format, used to tell connector how to convert string to time,
supported as the following formats:
-
-`HH:mm:ss` `HH:mm:ss.SSS`
-
-default `HH:mm:ss`
-
### schema [config]
Only needs to be configured when the file_format_type is text, json, excel or csv (or other formats where we cannot read the schema from metadata).
@@ -243,34 +192,19 @@ Only need to be configured when the file_format_type are
text, json, excel or cs
The schema of upstream data.
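As a minimal sketch (the field names here are hypothetical), a schema declaration looks like:

```hocon
schema = {
  fields {
    name = string
    age = int
    score = double
  }
}
```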
-### sheet_name [string]
-
-Only need to be configured when file_format is excel.
-
-Reader the sheet of the workbook.
-
-### file_filter_pattern [string]
-
-Filter pattern, which used for filtering files.
-
-### compress_codec [string]
-
-The compress codec of files and the details that supported as the following
shown:
-
-- txt: `lzo` `none`
-- json: `lzo` `none`
-- csv: `lzo` `none`
-- orc/parquet:
- automatically recognizes the compression type, no additional settings
required.
+## How to Create an Oss Data Synchronization Job
-### common options
+The following example demonstrates how to create a data synchronization job
that reads data from Oss and prints it on the local client:
-Source plugin common parameters, please refer to [Source Common
Options](common-options.md) for details.
-
-## Example
-
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+# Create a source to connect to Oss
+source {
OssFile {
path = "/seatunnel/orc"
bucket = "oss://tyrantlucifer-image-bed"
@@ -279,11 +213,24 @@ Source plugin common parameters, please refer to [Source
Common Options](common-
endpoint = "oss-cn-beijing.aliyuncs.com"
file_format_type = "orc"
}
+}
+# Console printing of the read Oss data
+sink {
+ Console {
+ }
+}
```
-```hocon
+```hocon
+# Set the basic configuration of the task to be performed
+env {
+ execution.parallelism = 1
+ job.mode = "BATCH"
+}
+# Create a source to connect to Oss
+source {
OssFile {
path = "/seatunnel/json"
bucket = "oss://tyrantlucifer-image-bed"
@@ -298,18 +245,16 @@ Source plugin common parameters, please refer to [Source
Common Options](common-
}
}
}
+}
+# Console printing of the read Oss data
+sink {
+ Console {
+ }
+}
```
-## Changelog
-
-### 2.2.0-beta 2022-09-26
-
-- Add OSS File Source Connector
-
-### 2.3.0-beta 2022-10-20
+### Tips
-- [BugFix] Fix the bug of incorrect path in windows environment
([2980](https://github.com/apache/seatunnel/pull/2980))
-- [Improve] Support extract partition from SeaTunnelRow fields
([3085](https://github.com/apache/seatunnel/pull/3085))
-- [Improve] Support parse field from file path
([2985](https://github.com/apache/seatunnel/pull/2985))
+> 1. [SeaTunnel Deployment Document](../../start-v2/locally/deployment.md).