This is an automated email from the ASF dual-hosted git repository.
wanghailin pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new a2590e8ee4 [Improve][Doc] Add `file_filter_pattern` example to doc
(#7922)
a2590e8ee4 is described below
commit a2590e8ee4855cda351d06d2d69bd534d565a57f
Author: YOMO LEE <[email protected]>
AuthorDate: Tue Oct 29 20:24:04 2024 +0800
[Improve][Doc] Add `file_filter_pattern` example to doc (#7922)
---
docs/en/connector-v2/source/CosFile.md | 80 ++++++++++++++++++++++++++-
docs/en/connector-v2/source/FtpFile.md | 80 +++++++++++++++++++++++++++
docs/en/connector-v2/source/HdfsFile.md | 79 ++++++++++++++++++++++++++-
docs/en/connector-v2/source/LocalFile.md | 77 +++++++++++++++++++++++++-
docs/en/connector-v2/source/OssFile.md | 80 ++++++++++++++++++++++++++-
docs/en/connector-v2/source/OssJindoFile.md | 80 ++++++++++++++++++++++++++-
docs/en/connector-v2/source/S3File.md | 83 ++++++++++++++++++++++++++++-
docs/en/connector-v2/source/SftpFile.md | 82 +++++++++++++++++++++++++++-
docs/zh/connector-v2/source/HdfsFile.md | 79 ++++++++++++++++++++++++++-
9 files changed, 708 insertions(+), 12 deletions(-)
diff --git a/docs/en/connector-v2/source/CosFile.md
b/docs/en/connector-v2/source/CosFile.md
index 702439c306..15b6de0c6f 100644
--- a/docs/en/connector-v2/source/CosFile.md
+++ b/docs/en/connector-v2/source/CosFile.md
@@ -45,7 +45,7 @@ To use this connector you need put
hadoop-cos-{hadoop.version}-{version}.jar and
## Options
-| name | type | required | default value |
+| name | type | required | default value |
|---------------------------|---------|----------|---------------------|
| path | string | yes | - |
| file_format_type | string | yes | - |
@@ -64,7 +64,7 @@ To use this connector you need put
hadoop-cos-{hadoop.version}-{version}.jar and
| sheet_name | string | no | - |
| xml_row_tag | string | no | - |
| xml_use_attr_format | boolean | no | - |
-| file_filter_pattern | string | no | - |
+| file_filter_pattern | string | no | |
| compress_codec | string | no | none |
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
@@ -275,6 +275,55 @@ Specifies Whether to process data using the tag attribute
format.
Filter pattern, which is used to filter files.
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
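The rules above can be spot-checked outside SeaTunnel. Below is a minimal Python sketch (illustrative only; SeaTunnel itself evaluates the pattern in Java on the server side) applying the Example 4 expression to the sample file list:

```python
import re

# Sample file list from the "File Structure Example" above.
files = [
    "/data/seatunnel/20241001/report.txt",
    "/data/seatunnel/20241007/abch202410.csv",
    "/data/seatunnel/20241002/abcg202410.csv",
    "/data/seatunnel/20241005/old_data.csv",
    "/data/seatunnel/20241012/logo.png",
]

# Example 4: third-level folder starts with 202410, file ends with .csv.
pattern = re.compile(r"/data/seatunnel/202410\d*/.*\.csv")

# Keep only the paths that the pattern matches in full.
matched = [f for f in files if pattern.fullmatch(f)]
print(matched)
```

This prints the three `.csv` paths listed as the result of Example 4; `report.txt` and `logo.png` are filtered out.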
### compress_codec [string]
The compress codec of files and the details that supported as the following
shown:
@@ -372,6 +421,33 @@ sink {
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ CosFile {
+ bucket = "cosn://seatunnel-test-1259587829"
+ secret_id = "xxxxxxxxxxxxxxxxxxx"
+ secret_key = "xxxxxxxxxxxxxxxxxxx"
+ region = "ap-chengdu"
+ path = "/seatunnel/read/binary/"
+ file_format_type = "binary"
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
+
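The config comment above cites the sample file name `abcD2024.csv`. A quick Python sketch (illustrative only; SeaTunnel applies the pattern in Java) shows what `abc[DX]*.*` actually accepts: because `*` permits zero occurrences of `[DX]`, the pattern behaves like `abc.*`; a pattern such as `abc[DX].*` would be needed to require the fourth character to be D or X.

```python
import re

# Pattern from the Filter File example above.
pattern = re.compile(r"abc[DX]*.*")

# The sample file from the config comment matches.
assert pattern.fullmatch("abcD2024.csv")

# [DX]* also matches zero characters, so names without D/X after "abc" match too.
assert pattern.fullmatch("abc2024.csv")
```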
## Changelog
### next version
diff --git a/docs/en/connector-v2/source/FtpFile.md
b/docs/en/connector-v2/source/FtpFile.md
index ec02f77f9f..6d11481376 100644
--- a/docs/en/connector-v2/source/FtpFile.md
+++ b/docs/en/connector-v2/source/FtpFile.md
@@ -84,6 +84,59 @@ The target ftp password is required
The source file path.
+### file_filter_pattern [string]
+
+Filter pattern, which is used to filter files.
+
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### file_format_type [string]
File type, supported as the following file types:
@@ -400,6 +453,33 @@ sink {
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ FtpFile {
+ host = "192.168.31.48"
+ port = 21
+ user = tyrantlucifer
+ password = tianchao
+ path = "/seatunnel/read/binary/"
+ file_format_type = "binary"
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
+
## Changelog
### 2.2.0-beta 2022-09-26
diff --git a/docs/en/connector-v2/source/HdfsFile.md
b/docs/en/connector-v2/source/HdfsFile.md
index 7413c0428b..405dfff820 100644
--- a/docs/en/connector-v2/source/HdfsFile.md
+++ b/docs/en/connector-v2/source/HdfsFile.md
@@ -41,7 +41,7 @@ Read data from hdfs file system.
## Source Options
-| Name | Type | Required | Default |
Description
|
+| Name | Type | Required | Default |
Description
|
|---------------------------|---------|----------|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| path | string | yes | - | The
source file path.
|
| file_format_type | string | yes | - | We
supported as the following file types:`text` `csv` `parquet` `orc` `json`
`excel` `xml` `binary`.Please note that, The final file name will end with the
file_format's suffix, the suffix of the text file is `txt`.
|
@@ -62,6 +62,7 @@ Read data from hdfs file system.
| sheet_name | string | no | - |
Reader the sheet of the workbook,Only used when file_format is excel.
|
| xml_row_tag | string | no | - |
Specifies the tag name of the data rows within the XML file, only used when
file_format is xml.
|
| xml_use_attr_format | boolean | no | - |
Specifies whether to process data using the tag attribute format, only used
when file_format is xml.
|
+| file_filter_pattern | string | no | |
Filter pattern, which is used to filter files.
|
| compress_codec | string | no | none | The
compress codec of files
|
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
|
@@ -71,6 +72,59 @@ Read data from hdfs file system.
**delimiter** parameter will deprecate after version 2.3.5, please use
**field_delimiter** instead.
+### file_filter_pattern [string]
+
+Filter pattern, which is used to filter files.
+
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### compress_codec [string]
The compress codec of files and the details that supported as the following
shown:
@@ -146,3 +200,26 @@ sink {
}
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ HdfsFile {
+ path = "/apps/hive/demo/student"
+ file_format_type = "json"
+ fs.defaultFS = "hdfs://namenode001"
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
diff --git a/docs/en/connector-v2/source/LocalFile.md
b/docs/en/connector-v2/source/LocalFile.md
index 6d11b992e3..65f287f057 100644
--- a/docs/en/connector-v2/source/LocalFile.md
+++ b/docs/en/connector-v2/source/LocalFile.md
@@ -43,7 +43,7 @@ If you use SeaTunnel Engine, It automatically integrated the
hadoop jar when you
## Options
-| name | type | required | default value
|
+| name | type | required | default value
|
|---------------------------|---------|----------|--------------------------------------|
| path | string | yes | -
|
| file_format_type | string | yes | -
|
@@ -58,7 +58,7 @@ If you use SeaTunnel Engine, It automatically integrated the
hadoop jar when you
| sheet_name | string | no | -
|
| xml_row_tag | string | no | -
|
| xml_use_attr_format | boolean | no | -
|
-| file_filter_pattern | string | no | -
|
+| file_filter_pattern | string | no |
|
| compress_codec | string | no | none
|
| archive_compress_codec | string | no | none
|
| encoding | string | no | UTF-8
|
@@ -254,6 +254,55 @@ Specifies Whether to process data using the tag attribute
format.
Filter pattern, which is used to filter files.
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### compress_codec [string]
The compress codec of files and the details that supported as the following
shown:
@@ -406,6 +455,30 @@ sink {
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ LocalFile {
+ path = "/data/seatunnel/"
+ file_format_type = "csv"
+ skip_header_row_number = 1
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
+
## Changelog
### 2.2.0-beta 2022-09-26
diff --git a/docs/en/connector-v2/source/OssFile.md
b/docs/en/connector-v2/source/OssFile.md
index d5326cb86a..36d998f054 100644
--- a/docs/en/connector-v2/source/OssFile.md
+++ b/docs/en/connector-v2/source/OssFile.md
@@ -190,7 +190,7 @@ If you assign file type to `parquet` `orc`, schema option
not required, connecto
## Options
-| name | type | required | default value |
Description
|
+| name | type | required | default value |
Description
|
|---------------------------|---------|----------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| path | string | yes | - | The
Oss path that needs to be read can have sub paths, but the sub paths need to
meet certain format requirements. Specific requirements can be referred to
"parse_partition_from_path" option
|
| file_format_type | string | yes | - | File
type, supported as the following file types: `text` `csv` `parquet` `orc`
`json` `excel` `xml` `binary`
|
@@ -211,7 +211,7 @@ If you assign file type to `parquet` `orc`, schema option
not required, connecto
| xml_use_attr_format | boolean | no | - |
Specifies whether to process data using the tag attribute format, only used
when file_format is xml.
|
| compress_codec | string | no | none | Which
compress codec the files used.
|
| encoding | string | no | UTF-8 |
-| file_filter_pattern | string | no | |
`*.txt` means you only need read the files end with `.txt`
|
+| file_filter_pattern | string | no | |
Filter pattern, which is used to filter files.
|
| common-options | config | no | - |
Source plugin common parameters, please refer to [Source Common
Options](../source-common-options.md) for details.
|
### compress_codec [string]
@@ -233,6 +233,55 @@ The encoding of the file to read. This param will be
parsed by `Charset.forName(
Filter pattern, which is used to filter files.
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### schema [config]
Only need to be configured when the file_format_type are text, json, excel,
xml or csv ( Or other format we can't read the schema from metadata).
@@ -474,6 +523,33 @@ sink {
}
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ OssFile {
+ path = "/seatunnel/orc"
+ bucket = "oss://tyrantlucifer-image-bed"
+ access_key = "xxxxxxxxxxxxxxxxx"
+ access_secret = "xxxxxxxxxxxxxxxxxxxxxx"
+ endpoint = "oss-cn-beijing.aliyuncs.com"
+ file_format_type = "orc"
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
+
## Changelog
### 2.2.0-beta 2022-09-26
diff --git a/docs/en/connector-v2/source/OssJindoFile.md
b/docs/en/connector-v2/source/OssJindoFile.md
index d5bd6d14fa..933439edc9 100644
--- a/docs/en/connector-v2/source/OssJindoFile.md
+++ b/docs/en/connector-v2/source/OssJindoFile.md
@@ -49,7 +49,7 @@ It only supports hadoop version **2.9.X+**.
## Options
-| name | type | required | default value |
+| name | type | required | default value |
|---------------------------|---------|----------|---------------------|
| path | string | yes | - |
| file_format_type | string | yes | - |
@@ -68,7 +68,7 @@ It only supports hadoop version **2.9.X+**.
| sheet_name | string | no | - |
| xml_row_tag | string | no | - |
| xml_use_attr_format | boolean | no | - |
-| file_filter_pattern | string | no | - |
+| file_filter_pattern | string | no | |
| compress_codec | string | no | none |
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
@@ -267,6 +267,55 @@ Reader the sheet of the workbook.
Filter pattern, which is used to filter files.
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### compress_codec [string]
The compress codec of files and the details that supported as the following
shown:
@@ -364,6 +413,33 @@ sink {
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ OssJindoFile {
+ bucket = "oss://tyrantlucifer-image-bed"
+ access_key = "xxxxxxxxxxxxxxxxx"
+ access_secret = "xxxxxxxxxxxxxxxxxxxxxx"
+ endpoint = "oss-cn-beijing.aliyuncs.com"
+ path = "/seatunnel/read/binary/"
+ file_format_type = "binary"
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
+
## Changelog
### next version
diff --git a/docs/en/connector-v2/source/S3File.md
b/docs/en/connector-v2/source/S3File.md
index d280d6dc7f..4834b025bc 100644
--- a/docs/en/connector-v2/source/S3File.md
+++ b/docs/en/connector-v2/source/S3File.md
@@ -196,7 +196,7 @@ If you assign file type to `parquet` `orc`, schema option
not required, connecto
## Options
-| name | type | required |
default value | Description
[...]
+| name | type | required | default value
| Description
[...]
|---------------------------------|---------|----------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
| path | string | yes | -
| The s3 path that needs to be read can have
sub paths, but the sub paths need to meet certain format requirements. Specific
requirements can be referred to "parse_partition_from_path" option
[...]
| file_format_type | string | yes | -
| File type, supported as the following file
types: `text` `csv` `parquet` `orc` `json` `excel` `xml` `binary`
[...]
@@ -220,12 +220,66 @@ If you assign file type to `parquet` `orc`, schema option
not required, connecto
| compress_codec | string | no | none
|
[...]
| archive_compress_codec | string | no | none
|
[...]
| encoding | string | no | UTF-8
|
[...]
+| file_filter_pattern | string | no |
| Filter pattern, which is used to filter files.
[...]
| common-options | | no | -
| Source plugin common parameters, please refer
to [Source Common Options](../source-common-options.md) for details.
[...]
### delimiter/field_delimiter [string]
**delimiter** parameter will deprecate after version 2.3.5, please use
**field_delimiter** instead.
+### file_filter_pattern [string]
+
+Filter pattern, which is used to filter files.
+
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### compress_codec [string]
The compress codec of files and the details that supported as the following
shown:
@@ -349,6 +403,33 @@ sink {
}
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ S3File {
+ path = "/seatunnel/json"
+ bucket = "s3a://seatunnel-test"
+ fs.s3a.endpoint="s3.cn-north-1.amazonaws.com.cn"
+    fs.s3a.aws.credentials.provider = "com.amazonaws.auth.InstanceProfileCredentialsProvider"
+ file_format_type = "json"
+ read_columns = ["id", "name"]
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
+
## Changelog
### 2.3.0-beta 2022-10-20
diff --git a/docs/en/connector-v2/source/SftpFile.md
b/docs/en/connector-v2/source/SftpFile.md
index 3eadcd3a69..95c710110a 100644
--- a/docs/en/connector-v2/source/SftpFile.md
+++ b/docs/en/connector-v2/source/SftpFile.md
@@ -71,7 +71,7 @@ The File does not have a specific type list, and we can
indicate which SeaTunnel
## Source Options
-| Name | Type | Required | default value |
Description
|
+| Name | Type | Required | default value |
Description
|
|---------------------------|---------|----------|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| host | String | Yes | - | The
target sftp host is required
|
| port | Int | Yes | - | The
target sftp port is required
|
@@ -96,6 +96,59 @@ The File does not have a specific type list, and we can
indicate which SeaTunnel
| encoding | string | no | UTF-8 |
| common-options | | No | - |
Source plugin common parameters, please refer to [Source Common
Options](../source-common-options.md) for details.
|
+### file_filter_pattern [string]
+
+Filter pattern, which is used to filter files.
+
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+Here are some examples.
+
+File Structure Example:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching Rules Examples:
+
+**Example 1**: *Match all .txt files*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The result of this example is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files starting with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files starting with abc whose fourth character is either h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The result of this example is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### file_format_type [string]
File type, supported as the following file types:
@@ -305,3 +358,30 @@ SftpFile {
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ SftpFile {
+ host = "sftp"
+ port = 22
+ user = seatunnel
+ password = pass
+ path = "tmp/seatunnel/read/json"
+ file_format_type = "json"
+ result_table_name = "sftp"
+ // file example abcD2024.csv
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
\ No newline at end of file
diff --git a/docs/zh/connector-v2/source/HdfsFile.md
b/docs/zh/connector-v2/source/HdfsFile.md
index 0f983a80bc..9cd254ef80 100644
--- a/docs/zh/connector-v2/source/HdfsFile.md
+++ b/docs/zh/connector-v2/source/HdfsFile.md
@@ -39,7 +39,7 @@
## 源选项
-| 名称 | 类型 | 是否必须 | 默认值 |
描述
|
+| 名称 | 类型 | 是否必须 | 默认值 | 描述
|
|---------------------------|---------|------|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| path | string | 是 | - | 源文件路径。
|
| file_format_type | string | 是 | - |
我们支持以下文件类型:`text` `json` `csv` `orc` `parquet`
`excel`。请注意,最终文件名将以文件格式的后缀结束,文本文件的后缀是 `txt`。
|
@@ -55,6 +55,7 @@
| kerberos_principal | string | 否 | - | kerberos 的
principal。
|
| kerberos_keytab_path | string | 否 | - | kerberos 的
keytab 路径。
|
| skip_header_row_number | long | 否 | 0 | 跳过前几行,但仅适用于
txt 和 csv。例如,设置如下:`skip_header_row_number = 2`。然后 Seatunnel 将跳过源文件中的前两行。
|
+| file_filter_pattern       | string  | 否    | -              | Filter pattern, which is used to filter files.                                                                                                                                                                                              |
| schema | config | 否 | - | 上游数据的模式字段。
|
| sheet_name | string | 否 | - |
读取工作簿的表格,仅在文件格式为 excel 时使用。
|
| compress_codec | string | 否 | none | 文件的压缩编解码器。
|
@@ -64,6 +65,60 @@
**delimiter** 参数在版本 2.3.5 后将被弃用,请改用 **field_delimiter**。
+### file_filter_pattern [string]
+
+Filter pattern, which is used to filter files.
+
+The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.
+
+Here are some examples.
+
+File list:
+```
+/data/seatunnel/20241001/report.txt
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+/data/seatunnel/20241012/logo.png
+```
+Matching rules:
+
+**Example 1**: *Match all files with the .txt suffix*, Regular Expression:
+```
+/data/seatunnel/20241001/.*\.txt
+```
+The matching result is:
+```
+/data/seatunnel/20241001/report.txt
+```
+**Example 2**: *Match all files whose names start with abc*, Regular Expression:
+```
+/data/seatunnel/\d*/abc.*
+```
+The matching result is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+```
+**Example 3**: *Match all files whose names start with abc and whose fourth character is h or g*, Regular Expression:
+```
+/data/seatunnel/20241007/abc[hg].*
+```
+The matching result is:
+```
+/data/seatunnel/20241007/abch202410.csv
+```
+**Example 4**: *Match files whose third-level folder starts with 202410 and whose suffix is .csv*, Regular Expression:
+```
+/data/seatunnel/202410\d*/.*\.csv
+```
+The matching result is:
+```
+/data/seatunnel/20241007/abch202410.csv
+/data/seatunnel/20241002/abcg202410.csv
+/data/seatunnel/20241005/old_data.csv
+```
+
### compress_codec [string]
文件的压缩编解码器及支持的详细信息如下所示:
@@ -125,3 +180,25 @@ sink {
}
```
+### Filter File
+
+```hocon
+env {
+ parallelism = 1
+ job.mode = "BATCH"
+}
+
+source {
+ HdfsFile {
+ path = "/apps/hive/demo/student"
+ file_format_type = "json"
+ fs.defaultFS = "hdfs://namenode001"
+ file_filter_pattern = "abc[DX]*.*"
+ }
+}
+
+sink {
+ Console {
+ }
+}
+```
\ No newline at end of file