e-mhui opened a new pull request, #7416:
URL: https://github.com/apache/inlong/pull/7416
Support open incremental snapshot in oracle cdc connector
- Fixes #7410
### Motivation
Support enabling incremental snapshot in the Oracle CDC connector and improve
the efficiency of reading records.
### Modifications
Add the following options:
```java
enableParallelRead
splitSize
splitMetaGroupSize
fetchSize
connectTimeout
connectMaxRetries
connectionPoolSize
distributionFactorUpper
distributionFactorLower
chunkKeyColumn
```
### Verifying this change
run `OracleExtractSqlParseTest.java`
### Documentation
- Does this pull request introduce a new feature? (yes)
New option descriptions:

| Option | Required | Default | Type | Description |
| -- | -- | -- | -- | -- |
| scan.incremental.snapshot.enabled | optional | true | Boolean | Incremental snapshot is a new mechanism for reading a table snapshot. Compared to the old snapshot mechanism, it has several advantages: (1) the source can read in parallel during snapshot reading, (2) the source can perform checkpoints at chunk granularity during snapshot reading, (3) the source doesn't need to acquire a ROW SHARE MODE lock before snapshot reading. |
| scan.incremental.snapshot.chunk.size | optional | 8096 | Integer | The chunk size (number of rows) of the table snapshot. Captured tables are split into multiple chunks when reading the table snapshot. |
| scan.snapshot.fetch.size | optional | 1024 | Integer | The maximum fetch size per poll when reading the table snapshot. |
| connect.max-retries | optional | 3 | Integer | The maximum number of times the connector should retry to build a connection to the Oracle database server. |
| chunk-meta.group.size | optional | 1000 | Integer | The group size of chunk meta. If the meta size exceeds the group size, the meta is divided into multiple groups. |
| connect.timeout | optional | 30s | Duration | The maximum time that the connector should wait after trying to connect to the Oracle database server before timing out. |
| chunk-key.even-distribution.factor.lower-bound | optional | 0.05d | Double | The lower bound of the chunk key distribution factor. The distribution factor is used to determine whether the table is evenly distributed or not. Table chunks are computed with the even-distribution optimization when the data is evenly distributed, and a splitting query is issued when it is not. The distribution factor is calculated as (MAX(id) - MIN(id) + 1) / rowCount. |
| chunk-key.even-distribution.factor.upper-bound | optional | 1000.0d | Double | The upper bound of the chunk key distribution factor. The distribution factor is used to determine whether the table is evenly distributed or not. Table chunks are computed with the even-distribution optimization when the data is evenly distributed, and a splitting query is issued when it is not. The distribution factor is calculated as (MAX(id) - MIN(id) + 1) / rowCount. |
| connection.pool.size | optional | 20 | Integer | The connection pool size. |
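
To illustrate how the distribution factor and its bounds might interact with the chunk size, here is a minimal sketch. It is not the connector's implementation: the class `ChunkSplitSketch` and its method names are hypothetical, the bounds are assumed to be inclusive, and scaling the chunk step by the factor is an assumption about how an even split could be computed from the formula above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and splitting strategy are assumptions, not the connector's code. */
public class ChunkSplitSketch {

    /** Distribution factor from the option description: (max - min + 1) / rowCount. */
    static double distributionFactor(long min, long max, long rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        return (double) (max - min + 1) / rowCount;
    }

    /** Assumes both bounds are inclusive. */
    static boolean isEvenlyDistributed(double factor, double lowerBound, double upperBound) {
        return factor >= lowerBound && factor <= upperBound;
    }

    /**
     * Splits the key range [min, max] into inclusive [start, end] ranges by pure
     * arithmetic. The step is chunkSize scaled by the factor (at least 1), so that
     * each range is expected to hold roughly chunkSize rows.
     * Overflow near Long.MAX_VALUE is not handled in this sketch.
     */
    static List<long[]> evenChunks(long min, long max, int chunkSize, double factor) {
        long step = Math.max(1L, (long) (factor * chunkSize));
        List<long[]> chunks = new ArrayList<>();
        for (long start = min; start <= max; start += step) {
            long end = Math.min(start + step - 1, max);
            chunks.add(new long[] {start, end});
        }
        return chunks;
    }

    public static void main(String[] args) {
        long min = 1, max = 100, rowCount = 100;
        double factor = distributionFactor(min, max, rowCount);
        // Defaults taken from the table above: 0.05 and 1000.0.
        if (isEvenlyDistributed(factor, 0.05d, 1000.0d)) {
            for (long[] c : evenChunks(min, max, 10, factor)) {
                System.out.println("[" + c[0] + ", " + c[1] + "]");
            }
        } else {
            System.out.println("Uneven distribution: a splitting query would be needed.");
        }
    }
}
```

With the defaults, a table whose keys are dense (factor close to 1) falls inside the bounds and can be split without querying the database, while a very sparse key range (factor above 1000.0) would fall back to query-based splitting.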