yzeng1618 opened a new pull request, #9213:
URL: https://github.com/apache/seatunnel/pull/9213
…ssions in PostgreSQL/mysql/oracle/sqlserver, and fix the null pointer issue
in the regular expression
https://github.com/apache/seatunnel/issues/9209
### Purpose of this pull request
Adds documentation on multi-table reads and regular expressions for the
PostgreSQL/MySQL/Oracle/SQL Server connectors, and fixes the null pointer
issue in regular expression handling.
### Does this PR introduce _any_ user-facing change?
Yes.
This PR enhances the table_list parameter to directly support regex patterns
for table filtering, while maintaining backward compatibility.
Here's the detailed breakdown:
- New Feature: Direct Use of Regular Expressions in table_path
Purpose: Allow users to write regular expressions directly in the table_path
field within table_list to filter tables.
Example Configuration:
"table_list"=[
  {
    # Matches all tables whose names start with "TEST_DB_"
    "table_path"="TEST.TEST_DB_.*"
  }
]
This configuration matches all tables in the TEST schema whose names start
with TEST_DB_ (e.g., TEST_DB_2023, TEST_DB_2024).
- Improvement and Fix:
Enhanced the robustness of the approximateRowCntStatement method in
OracleDialect.
Fixed a null pointer error in Oracle when executing queries with empty or
invalid parameters.
Key Notes:
The table_path now supports regex syntax (e.g., TEST.TEST_DB_.* → matches all
tables under the TEST schema with names starting with TEST_DB_).
The Oracle fix ensures stable execution of row count estimation logic,
avoiding crashes due to unhandled edge cases.
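The difference between a glob-style `*` and a regex `.*` can be sketched as follows. This is a minimal Python illustration, not SeaTunnel's implementation: `filter_tables` is a hypothetical helper, and it assumes the pattern must match the entire `schema.table` path (full-match semantics), which this description does not confirm.

```python
import re


def filter_tables(pattern: str, table_paths: list[str]) -> list[str]:
    """Return the table paths that the regex matches in full (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [path for path in table_paths if compiled.fullmatch(path)]


tables = ["TEST.TEST_DB_2023", "TEST.TEST_DB_2024", "TEST.OTHER_TABLE"]

# ".*" after the prefix accepts any suffix, so both TEST_DB_ tables are kept.
prefix_matches = filter_tables(r"TEST.TEST_DB_.*", tables)

# "_*" means "zero or more underscores", so under full-match semantics
# a name such as TEST_DB_2023 is not accepted.
glob_style_matches = filter_tables(r"TEST.TEST_DB_*", tables)

print(prefix_matches)
print(glob_style_matches)
```

Note that an unescaped `.` in the pattern matches any character, not only the literal separator between schema and table; escape it as `\.` if a strict separator match is needed.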
### How was this patch tested?
- Testing Environment
OS: Linux (Ubuntu 20.04)
SeaTunnel Version: 2.3.9
Execution Mode: Flink on YARN (yarn-application)
Databases:
PostgreSQL 16.0 (Source)
Iceberg (Sink)
- Test Configuration:
Created a configuration file pg2iceberg.conf to read tables matching the
regex postgres.public.test_db_2.* from PostgreSQL:
{
  env {
    execution.parallelism = 1
    job.mode = "BATCH"
    job.name = "seatunnel_batch_job"
  }
  # source configuration
  source {
    JDBC {
      url = "jdbc:postgresql://xxxxxxx:xxxxx/xxxxx"
      driver = "org.postgresql.Driver"
      user = "xxxxxxxx"
      password = "xxxxxxx"
      "table_list" = [
        {
          "table_path" = "postgres.public.test_db_2.*"
        }
      ]
      split.size = 5000
      fetch_size = 2000
    }
  }
  # sink configuration
  sink {
    Iceberg {
      ........
    }
  }
}
- Execution Command:
./bin/start-seatunnel-flink-15-connector-v2.sh --config pg2iceberg.conf \
  --deploy-mode run-application --target yarn-application \
  --name multitable_pg2Iceberg
- Check log:
Filtering tables with regex pattern: postgres.public.test_db_2.*
Found regex match table: postgres.public.test_db_20
Found regex match table: postgres.public.test_db_21
Found regex match table: postgres.public.test_db_22
Found regex match table: postgres.public.test_db_20250324
Found regex match table: postgres.public.test_db_202502
Found regex match table: postgres.public.test_db_202501
Total tables matched after filtering: 6
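A reviewer who wants to double-check a log like the one above could use a small script along these lines. It is a sketch only: `summarize_filter_log` is a hypothetical helper, the log line formats are taken from the excerpt above, and full-match regex semantics are assumed.

```python
import re

LOG = """\
Filtering tables with regex pattern: postgres.public.test_db_2.*
Found regex match table: postgres.public.test_db_20
Found regex match table: postgres.public.test_db_21
Found regex match table: postgres.public.test_db_22
Found regex match table: postgres.public.test_db_20250324
Found regex match table: postgres.public.test_db_202502
Found regex match table: postgres.public.test_db_202501
Total tables matched after filtering: 6
"""


def summarize_filter_log(log_text: str) -> dict:
    """Extract the pattern, matched tables, and reported total from the log."""
    pattern_line = re.search(
        r"^Filtering tables with regex pattern: (.+)$", log_text, re.MULTILINE
    )
    total_line = re.search(
        r"^Total tables matched after filtering: (\d+)$", log_text, re.MULTILINE
    )
    if pattern_line is None or total_line is None:
        raise ValueError("log is missing the pattern line or the total line")

    pattern = pattern_line.group(1)
    tables = re.findall(r"^Found regex match table: (\S+)$", log_text, re.MULTILINE)
    compiled = re.compile(pattern)
    return {
        "pattern": pattern,
        "tables": tables,
        # Every logged table should satisfy the pattern it was filtered by.
        "all_match": all(compiled.fullmatch(t) for t in tables),
        # The reported total should equal the number of "Found" lines.
        "count_consistent": len(tables) == int(total_line.group(1)),
    }


summary = summarize_filter_log(LOG)
print(summary["all_match"], summary["count_consistent"])
```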
### Check list
* [x] If any new Jar binary package adding in your PR, please add License
Notice according
[New License
Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
* [x] If necessary, please update the documentation to describe the new
feature. https://github.com/apache/seatunnel/tree/dev/docs
* [x] If you are contributing the connector code, please check that the
following files are updated:
1. Update
[plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties)
and add new connector information in it
2. Update the pom file of
[seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml)
3. Add ci label in
[label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml)
4. Add e2e testcase in
[seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/)
5. Update connector
[plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config)
* [x] Update the
[`release-note`](https://github.com/apache/seatunnel/blob/dev/release-note.md).