[GitHub] [inlong] e-mhui opened a new pull request, #7416: [INLONG-7410][Sort] Support open incremental snapshot in oracle cdc connector

via GitHub Wed, 22 Feb 2023 23:42:16 -0800


e-mhui opened a new pull request, #7416:
URL: https://github.com/apache/inlong/pull/7416


   Support open incremental snapshot in oracle cdc connector
   
   - Fixes #7410
   
   ### Motivation
   
   Support enabling incremental snapshot in the Oracle CDC connector and improve the efficiency of reading records.
   
   ### Modifications
   
   Add the following options:
   
   ```java
   enableParallelRead
   splitSize
   splitMetaGroupSize
   fetchSize
   connectTimeout
   connectMaxRetries
   connectionPoolSize
   distributionFactorUpper
   distributionFactorLower
   chunkKeyColumn
   ```
   
   ### Verifying this change
   
   Run `OracleExtractSqlParseTest.java`.
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes)
   
   New options description:
   
   | Option | Required | Default | Type | Description |
   | -- | -- | -- | -- | -- |
   | scan.incremental.snapshot.enabled | optional | true | Boolean | Incremental snapshot is a new mechanism for reading the snapshot of a table. Compared to the old snapshot mechanism, it has several advantages: (1) the source can read in parallel during snapshot reading, (2) the source can perform checkpoints at chunk granularity during snapshot reading, (3) the source doesn't need to acquire a ROW SHARE MODE lock before snapshot reading. |
   | scan.incremental.snapshot.chunk.size | optional | 8096 | Integer | The chunk size (number of rows) of the table snapshot. Captured tables are split into multiple chunks when reading the table snapshot. |
   | scan.snapshot.fetch.size | optional | 1024 | Integer | The maximum fetch size per poll when reading the table snapshot. |
   | connect.max-retries | optional | 3 | Integer | The maximum number of times the connector should retry establishing a connection to the Oracle database server. |
   | chunk-meta.group.size | optional | 1000 | Integer | The group size of chunk meta. If the meta size exceeds the group size, the meta is divided into multiple groups. |
   | connect.timeout | optional | 30s | Duration | The maximum time the connector should wait after trying to connect to the Oracle database server before timing out. |
   | chunk-key.even-distribution.factor.lower-bound | optional | 0.05d | Double | The lower bound of the chunk key distribution factor. The distribution factor is used to determine whether the table is evenly distributed. Table chunks are computed with the even-distribution optimization when the data distribution is even, and a splitting query is issued when it is uneven. The distribution factor can be calculated as (MAX(id) - MIN(id) + 1) / rowCount. |
   | chunk-key.even-distribution.factor.upper-bound | optional | 1000.0d | Double | The upper bound of the chunk key distribution factor. The distribution factor is used to determine whether the table is evenly distributed. Table chunks are computed with the even-distribution optimization when the data distribution is even, and a splitting query is issued when it is uneven. The distribution factor can be calculated as (MAX(id) - MIN(id) + 1) / rowCount. |
   | connection.pool.size | optional | 20 | Integer | The connection pool size. |
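
As a rough illustration of how the two distribution-factor bounds interact, the sketch below computes (MAX(id) - MIN(id) + 1) / rowCount and checks it against the default bounds from the table. The class and method names are hypothetical and are not taken from the connector code. Treating the bounds as inclusive and returning `Double.MAX_VALUE` for an empty table are assumptions made for this example only.

```java
// Illustrative sketch only: the class and method names are hypothetical,
// not the connector's actual implementation.
public class ChunkDistributionSketch {

    // Defaults taken from the options table.
    public static final double LOWER_BOUND = 0.05d;
    public static final double UPPER_BOUND = 1000.0d;

    /**
     * Distribution factor = (MAX(id) - MIN(id) + 1) / rowCount.
     * Assumption: an empty table is treated as "not evenly distributed"
     * by returning Double.MAX_VALUE. Overflow of (maxId - minId + 1) for
     * extreme key ranges is not handled in this sketch.
     */
    public static double distributionFactor(long minId, long maxId, long rowCount) {
        if (rowCount <= 0) {
            return Double.MAX_VALUE;
        }
        return (double) (maxId - minId + 1) / rowCount;
    }

    /** Assumption: both bounds are inclusive. */
    public static boolean isEvenlyDistributed(double factor) {
        return factor >= LOWER_BOUND && factor <= UPPER_BOUND;
    }

    public static void main(String[] args) {
        // Dense key range: 1000 rows with ids 1..1000.
        double dense = distributionFactor(1L, 1000L, 1000L);
        System.out.println("dense factor=" + dense + " even=" + isEvenlyDistributed(dense));

        // Sparse key range: 100 rows spread over ids 1..10,000,000.
        double sparse = distributionFactor(1L, 10_000_000L, 100L);
        System.out.println("sparse factor=" + sparse + " even=" + isEvenlyDistributed(sparse));
    }
}
```

In the dense case the factor falls inside the bounds, so chunk boundaries could be computed arithmetically from the key range. In the sparse case the factor exceeds the upper bound, which per the option description is when a splitting query would be used instead.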
   
   
   
   
   
   


