ruanhang1993 commented on PR #3928:
URL: https://github.com/apache/flink-cdc/pull/3928#issuecomment-2756842730

   > @ruanhang1993 Thanks for your kind explanation. I understand the situation. As you suggested, I will add warnings and some examples to the docs.
   > 
   > I have a minor question: if we set `scan.incremental.snapshot.backfill.skip` to `true` and use the primary key as the chunk key in the incremental phase, will there be no issue? (I mean: first we use a column that is not in the primary key as the chunk key in the snapshot phase, and then use the primary key as the chunk key in the incremental phase.)
   
   @SML0127 I am not sure whether I understand your question correctly.
   
   If we set `scan.incremental.snapshot.backfill.skip` to `true`, the source will only provide at-least-once semantics. The update (id = 0, pid from 2 to 4) will be sent again in the incremental phase.
   - If the sink can deduplicate based on the primary key, the final result will be right (for example, StarRocks).
   - If the sink cannot deduplicate based on the primary key, the final result could be wrong.
     - For example, Kafka may contain the change history +I(id=0,pid=4), +I(id=0,pid=2), -U(id=0,pid=2), +U(id=0,pid=4). I don't think we should treat that as a right result.
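   To make the two sink behaviors above concrete, here is a minimal sketch (plain Python, not Flink CDC code; the event list and sink models are illustrative assumptions) that replays exactly the change history from the Kafka example. An upsert sink that keys on the primary key converges to the correct row, while an append-only sink keeps the out-of-order history verbatim:
   
   ```python
   # Hypothetical event stream under at-least-once delivery when the
   # backfill is skipped: the snapshot read already sees pid=4, and the
   # incremental phase then replays the earlier insert and the update.
   events = [
       ("+I", {"id": 0, "pid": 4}),  # snapshot read (post-update image)
       ("+I", {"id": 0, "pid": 2}),  # replayed insert from the log
       ("-U", {"id": 0, "pid": 2}),  # replayed update-before
       ("+U", {"id": 0, "pid": 4}),  # replayed update-after
   ]
   
   # Upsert-style sink: deduplicates by primary key, keeping the latest image.
   upsert_state = {}
   for op, row in events:
       if op in ("+I", "+U"):
           upsert_state[row["id"]] = row
       elif op == "-D":
           upsert_state.pop(row["id"], None)
       # "-U" retractions need no action here: the paired "+U" that
       # follows overwrites the same key anyway.
   
   # Append-only sink (e.g. Kafka): every event is kept as-is.
   append_log = [(op, dict(row)) for op, row in events]
   
   print(upsert_state)    # {0: {'id': 0, 'pid': 4}} -- converges correctly
   print(len(append_log)) # 4 -- the misleading history is preserved
   ```
   
   The upsert sink ends with the single correct row, which is why duplicates are tolerable there; the append-only log still shows +I(id=0,pid=4) before +I(id=0,pid=2), which a downstream consumer could misread.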
   

