Ruiii-w opened a new pull request, #10467:
URL: https://github.com/apache/seatunnel/pull/10467
### Purpose of this pull request
This pull request introduces support for the PostgreSQL `COPY` protocol in
the JDBC Source connector.
For bulk data retrieval, `COPY` is typically much faster than fetching rows
through a standard `SELECT` query. This feature lets users enable `COPY` mode
on PostgreSQL sources to improve read performance.
**Note:** This feature was developed against SeaTunnel 2.3.8. The changes
have since been merged with the `dev` branch, where the feature still works
as expected.
Key features added:
- Support for `COPY (SELECT ...) TO STDOUT` statement generation.
- Support for partitioned reads (sharding) within `COPY` statements.
- Support for both CSV and Binary formats in `COPY`.
- New configuration options: `use_copy_statement`, `binary`, and
`pg_copy_buffer_size`.
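
The statement generation described in the list above can be illustrated with a minimal sketch. It is not the code from this PR: the class `CopyStatementSketch`, its method names, and the `st_split` subquery alias are hypothetical, and the partition predicate assumes a numeric split column with a trusted, unquoted identifier. It only shows how a user query might be wrapped in `COPY (SELECT ...) TO STDOUT` for the CSV and binary formats, with an optional range filter for one split.

```java
// Hypothetical illustration only; names are not taken from the SeaTunnel code base.
final class CopyStatementSketch {

    /** Wraps a SELECT in COPY ... TO STDOUT, choosing the csv or binary format. */
    static String buildCopySql(String query, boolean binary) {
        String format = binary ? "binary" : "csv";
        return "COPY (" + stripTrailingSemicolon(query) + ") TO STDOUT WITH (FORMAT " + format + ")";
    }

    /**
     * Restricts the query to one split [lowerInclusive, upperExclusive) on a
     * numeric column, then wraps the result in a COPY statement.
     * Assumption: the column name is trusted and needs no quoting.
     */
    static String buildPartitionedCopySql(
            String query, String column, long lowerInclusive, long upperExclusive, boolean binary) {
        String split = "SELECT * FROM (" + stripTrailingSemicolon(query) + ") AS st_split"
                + " WHERE " + column + " >= " + lowerInclusive
                + " AND " + column + " < " + upperExclusive;
        return buildCopySql(split, binary);
    }

    /** COPY does not accept a semicolon inside the parenthesised query. */
    private static String stripTrailingSemicolon(String query) {
        String q = query.trim();
        return q.endsWith(";") ? q.substring(0, q.length() - 1).trim() : q;
    }

    public static void main(String[] args) {
        System.out.println(buildCopySql("select * from my_table;", false));
        System.out.println(buildPartitionedCopySql("select * from my_table", "id", 0, 1000, true));
    }
}
```

Wrapping the user query in a subquery keeps the original `query` option untouched, so the same text can serve both the `COPY` path and the regular `SELECT` path.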
### Does this PR introduce _any_ user-facing change?
Yes, this PR adds new configuration options for JDBC Source (specifically
for PostgreSQL).
**New Options:**
| Name                  | Type    | Required | Default Value | Description |
| --------------------- | ------- | -------- | ------------- | ----------- |
| `use_copy_statement`  | Boolean | No       | `false`       | Whether to use the `COPY` method for reading. |
| `binary`              | Boolean | No       | `false`       | Whether to use binary format for `COPY` reading. Only takes effect when `use_copy_statement=true`. |
| `pg_copy_buffer_size` | Int     | No       | `1048576`     | Buffer size for `COPY` reading (bytes). Only takes effect when `use_copy_statement=true`. |
**Documentation Details (per new specifications):**
1. **Data Source Title**: JDBC PostgreSQL Source (Copy Mode)
2. **Connector Support Version**: SeaTunnel 2.3.8+
3. **Data Source Description**: Supports reading data from PostgreSQL using
the `COPY` protocol for high-performance bulk data extraction.
4. **Supported Engines**: Spark, Flink, SeaTunnel Zeta
5. **Supported Data Source List**: PostgreSQL
6. **Dependencies**: `org.postgresql:postgresql` (Standard JDBC Driver)
7. **Data Type Mapping**: Fully compatible with existing JDBC PostgreSQL
type mappings.
8. **Options**: See the "New Options" table above.
**Example Configuration:**
```hocon
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/test"
    driver = "org.postgresql.Driver"
    user = "postgres"
    password = "password"
    query = "select * from my_table"
    # Enable COPY mode
    use_copy_statement = true
    pg_copy_buffer_size = 1048576
  }
}
```
### How was this patch tested?
- Verified locally using
`org.apache.seatunnel.example.engine.SeaTunnelEngineLocalExample`.
- Configured the local example to use the PostgreSQL JDBC source with
`use_copy_statement = true` and verified that data was correctly extracted and
processed.
- Validated that the `COPY` SQL was generated correctly and executed against
a local PostgreSQL instance.
### Check list
* [x] If any new Jar binary package is added in your PR, please add a License
Notice according to
`https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md`
* [x] If necessary, please update the documentation to describe the new
feature. https://github.com/apache/seatunnel/tree/dev/docs
* [ ] If necessary, please update `incompatible-changes.md` to describe the
incompatibility caused by this PR.
* [x] If you are contributing the connector code, please check that the
following files are updated:
  1. Update `https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties` and add new connector information in it
  2. Update the pom file of `https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml`
  3. Add ci label in `https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml`
  4. Add e2e testcase in `https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/`
  5. Update connector `https://github.com/apache/seatunnel/blob/dev/config/plugin_config`