SML0127 commented on code in PR #3924:
URL: https://github.com/apache/flink-cdc/pull/3924#discussion_r2018080972
##########
docs/content/docs/connectors/flink-sources/mysql-cdc.md:
##########
@@ -254,6 +254,17 @@ Connector Options
<td>Integer</td>
<td>The maximum fetch size per poll when reading the table snapshot.</td>
</tr>
+ <tr>
+ <td>scan.incremental.snapshot.chunk.key-column</td>
+ <td>optional</td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>The chunk key of the table snapshot. Captured tables are split into multiple chunks by the chunk key when reading the table snapshot.
+ By default, the chunk key is the first column of the primary key. A column that is not part of the primary key can be used as the chunk key, but this may lead to slower query performance.
+ <br>
+ <b>Warning:</b> Using a non-primary-key column as the chunk key may lead to data inconsistencies. Please see <a href="#warning">Warning</a> for details.
+ </td>
+ </tr>
Review Comment:
I added the `scan.incremental.snapshot.chunk.key-column` option, with a link to
the `Warning` section, because it was missing from the Connector Options table
in the MySQL CDC docs.
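
For readers unfamiliar with the option, the idea of splitting a table by a chunk key can be sketched roughly as follows. This is only an illustration in Python, not the connector's actual splitting code: `split_into_chunks`, `chunk_index`, and `chunk_size` are hypothetical names, and the `lo < key <= hi` ranges simply mirror the notation used in the Warning section.

```python
from typing import List, Optional, Tuple

# A chunk covers keys with lo < key <= hi; None means unbounded on that side.
Chunk = Tuple[Optional[int], Optional[int]]


def split_into_chunks(keys: List[int], chunk_size: int) -> List[Chunk]:
    """Split the distinct, sorted chunk-key values into ranges of about chunk_size keys."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    distinct = sorted(set(keys))
    # Every chunk_size-th key becomes an upper bound, except the last key,
    # because the final chunk is left unbounded above.
    bounds = [distinct[i] for i in range(chunk_size - 1, len(distinct) - 1, chunk_size)]
    chunks: List[Chunk] = []
    lo: Optional[int] = None
    for hi in bounds:
        chunks.append((lo, hi))
        lo = hi
    chunks.append((lo, None))
    return chunks


def chunk_index(chunks: List[Chunk], key: int) -> int:
    """Return the index of the chunk whose range contains key."""
    for i, (lo, hi) in enumerate(chunks):
        if (lo is None or key > lo) and (hi is None or key <= hi):
            return i
    raise ValueError("key not covered by any chunk")
```

In this sketch a row belongs to exactly one chunk only as long as its chunk-key value does not change, which is why a key that can be updated (unlike a typical primary key) is riskier.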
##########
docs/content.zh/docs/connectors/flink-sources/mysql-cdc.md:
##########
@@ -787,6 +798,31 @@ $ ./bin/flink run \
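* If the specified column is never updated, exactly-once semantics can be guaranteed.
* If the specified column is updated, only at-least-once semantics can be guaranteed. Data correctness can still be ensured by specifying a primary key downstream and relying on idempotent operations.
+#### Warning
+
+Using a **non-primary-key column** as the `scan.incremental.snapshot.chunk.key-column` for a MySQL table may lead to **data inconsistencies**. The following describes a problem that can occur and how to mitigate it.
+
+#### Problem Scenario
+
+- **Table structure:**
+ - **Primary key:** `id`
+ - **Chunk key column:** `pid` (not part of the primary key)
+
+- **Snapshot chunks:**
+ - **Chunk 0:** `1 < pid <= 3`
+ - **Chunk 1:** `3 < pid <= 5`
+
+- **Operation:**
+ - Two subtasks read **chunk 0** and **chunk 1** in parallel.
+ - During the read, an **update** changes the `pid` of `id=0` from `2` to `4`. At that point the **low and high watermarks** of both chunks include `pid=3`, so the update spans both chunks.
+
+- **Result:**
+ - **Chunk 0:** records `[id=0, pid=2]`
+ - **Chunk 1:** records `[id=0, pid=4]`
+
+Because the **processing order** cannot be guaranteed, the final `pid` for `id=0` may be either `2` or `4`, resulting in data inconsistency.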
+
+
Review Comment:
Please feel free to refine any awkward expressions or unnatural wording in
this content.
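
To make the problem scenario in the Warning section concrete, here is a minimal Python sketch. It is a simplified model under stated assumptions, not connector code: it only represents the one record each subtask emits for `id=0` and a sink that upserts by primary key. The helper names `in_chunk` and `apply_upserts` are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (id, pid)


def in_chunk(pid: int, lo: int, hi: int) -> bool:
    """A chunk covers chunk-key values with lo < pid <= hi."""
    return lo < pid <= hi


def apply_upserts(records: List[Record]) -> Dict[int, int]:
    """Apply records to a sink keyed by the primary key id; the last write wins."""
    sink: Dict[int, int] = {}
    for row_id, pid in records:
        sink[row_id] = pid
    return sink


# What each subtask emits in the scenario (assumed, taken from the "Result" list).
chunk0_output: Record = (0, 2)  # chunk 0 (1 < pid <= 3) saw the row before the update
chunk1_output: Record = (0, 4)  # chunk 1 (3 < pid <= 5) saw the row after the update

assert in_chunk(2, 1, 3) and in_chunk(4, 3, 5)

# The two subtasks run in parallel, so either arrival order is possible.
final_values = {
    apply_upserts(list(order))[0]
    for order in permutations([chunk0_output, chunk1_output])
}
print(sorted(final_values))  # [2, 4]
```

Both orders are valid executions, so the sink can end with either value. With a primary-key chunk key the row for `id=0` would fall into a single chunk and this ambiguity would not arise in the model.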