ruanhang1993 commented on PR #3928:
URL: https://github.com/apache/flink-cdc/pull/3928#issuecomment-2756842730
> @ruanhang1993 Thanks for your kind explanation. I understand the
> situation. As you suggested, I will add warnings and some examples to the docs.
>
> I have a minor question: if we set
> `scan.incremental.snapshot.backfill.skip` to `true` and use the primary key as
> the chunk key in the incremental phase, will there be no issue? (I mean: first
> we use a column that is not in the primary key as the chunk key in the snapshot
> phase, and then use the primary key as the chunk key in the incremental phase.)
@SML0127 I am not sure whether I understand your question correctly.
If we set `scan.incremental.snapshot.backfill.skip` to `true`, the source
will have at-least-once semantics. The update (id = 0, pid from 2 to 4)
will be sent in the incremental phase.
- If the sink can deduplicate based on the primary key, the final result
will be correct (for example, StarRocks).
- If the sink cannot deduplicate based on the primary key, the final
result could be wrong.
  - For example, Kafka may contain the change history as
+I(id=0,pid=4), +I(id=0,pid=2), -U(id=0,pid=2), +U(id=0,pid=4). I don't think we
should treat that as a correct result.
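To make the difference concrete, here is a minimal sketch (plain Python, not Flink CDC code) that replays the duplicated change stream from the example above into two simulated sinks: one that upserts/retracts by primary key, and one that only appends. The event tuples and the upsert logic are illustrative assumptions, not the behavior of any specific connector.

```python
# Changelog events as (op, primary_key, value). The duplicate
# +I(id=0, pid=4) models the extra record produced when the backfill
# is skipped (at-least-once delivery).
events = [
    ("+I", 0, 4),
    ("+I", 0, 2),
    ("-U", 0, 2),
    ("+U", 0, 4),
]

# Sink that deduplicates on the primary key (StarRocks-like upsert model):
# inserts/updates overwrite by key, retractions remove the key.
upsert_sink = {}
for op, pk, pid in events:
    if op in ("+I", "+U"):
        upsert_sink[pk] = pid
    elif op in ("-U", "-D"):
        upsert_sink.pop(pk, None)

print(upsert_sink)  # -> {0: 4}, the duplicate is absorbed

# Append-only sink (Kafka-like topic without compaction by key):
# every event is kept, so the out-of-order history is visible downstream.
append_sink = list(events)
print(append_sink)
```

With the keyed sink the state converges to the correct row (pid = 4), while the append-only sink exposes the full at-least-once history, which consumers would have to deduplicate themselves.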
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]