Hi, Dawid.

Thanks for your response. I believe I've identified a key point, but I'm a bit unclear about the following statement of yours. Could you please provide an example for clarification?

```
The only missing information is if the external sink can consume deletes by key and if a source produces full deletes or deletes by key.
```

From my understanding, for a sink, if its schema includes a primary key, we can assume it is able to process delete messages ('-D') and perform deletions by key (PK). If it does not include a PK, we would implicitly treat it as a log-structured table that supports full row deletions.

Given that you mentioned `PARTIAL_DELETE`, should I interpret this as referring to a scenario similar to wide tables, where, if the sink has a PK, some columns are deleted (set to null or through other operations) while others remain unchanged?

Looking forward to your reply.

--
Best!
Xuyang

At 2025-02-28 19:16:12, "Dawid Wysakowicz" <wysakowicz.da...@gmail.com> wrote:
>Hey Xuyang,
>Ad. 1
>Yes, you're right, but we already do that for determining if we need
>UPDATE_BEFORE or not. FlinkChangelogModeInferenceProgram already deals with
>that.
>Ad. 2
>Unfortunately it is. This is also the only reason I need a FLIP. We can
>determine internally for every internal operator if we can work with
>partial deletes or if we need full deletes. The only missing information is
>if the external sink can consume deletes by key and if a source produces
>full deletes or deletes by key. Unfortunately this is information that
>comes from a connector implementation and thus needs to be provided via a
>public API.
>Ad. 3
>With ChangelogMode#kinds -> to some degree yes. We theoretically could
>split RowKind#DELETE to RowKind#DELETE_BY_KEY and RowKind#FULL_DELETE.
>However, that change would 1) be much more involved 2) we would need to
>encode that information in every single message, which I think is not
>necessary. I don't think it has much to do with PK.
>Ad. 4
>I don't think so. PK information is part of Schema not about the kind of
>messages.
>We don't have PK information for UPDATE_BEFORE/UPDATE_AFTER and
>they also apply per key. If the name containing `DELETE_BY_KEY` is
>confusing I am happy to rename it to e.g. PARTIAL_DELETE, therefore I'd add
>`supportsPartialDeletes`
>
>Best,
>Dawid
>
>On Fri, 28 Feb 2025 at 04:43, Xuyang <xyzhong...@163.com> wrote:
>
>> Hi Dawid.
>>
>> Big +1 for this FLIP. After reading through it, I have a few questions and
>> would appreciate your responses:
>>
>> 1. IIUC, we only need to provide additional information in the
>> `FlinkChangelogModeInferenceProgram` to enable the inference program to
>> determine whether it is safe to remove `ChangelogNormalize`. My first
>> instinct is that we need to know if all subsequent output-side nodes
>> consuming Upsert Keys include the Upsert Keys provided by the input-side
>> operator (source). If this condition is met, we can safely eliminate
>> `ChangelogNormalize`. Perhaps I have missed some important points, so
>> please feel free to correct me if necessary.
>>
>> 2. The introduction of `supportsDeleteByKey` in ChangelogMode seems to
>> exist solely as auxiliary information for the
>> `FlinkChangelogModeInferenceProgram`. If that's the case, it doesn't seem
>> necessary to expose it in the public API, does it?
>>
>> 3. If the purpose of introducing `supportsDeleteByKey` in ChangelogMode is
>> to facilitate support for `#fromChangelogStream` and `#toChangelogStream`,
>> it appears that `supportsDeleteByKey` might overlap with
>> ChangelogMode#kinds and Schema#PK to some extent, right?
>>
>> 4. Regarding supportsDeleteByKey, as part of a complete ChangelogMode
>> entity, should we also store the specific key information?
>>
>> --
>>
>> Best!
>> Xuyang
>>
>> At 2025-02-28 04:27:19, "Martijn Visser" <martijnvis...@apache.org> wrote:
>> >Hi Dawid,
>> >
>> >Thanks for the FLIP, looks like a good improvement for me that will bring a
>> >lot of benefits.
>> >+1
>> >
>> >Best regards,
>> >
>> >Martijn
>> >
>> >On Tue, Feb 25, 2025 at 6:51 AM Sergey Nuyanzin <snuyan...@gmail.com> wrote:
>> >
>> >> +1 for such improvement
>> >>
>> >> On Mon, Feb 24, 2025 at 12:01 PM Dawid Wysakowicz
>> >> <wysakowicz.da...@gmail.com> wrote:
>> >> >
>> >> > Hi everyone,
>> >> >
>> >> > I would like to initiate a discussion for the FLIP-510[1] below, which aims
>> >> > on optimising certain use cases in SQL which at the moment add
>> >> > ChangelogNormalize, but don't necessarily need to do it.
>> >> >
>> >> > Looking forward to hearing from you.
>> >> >
>> >> > [1] https://cwiki.apache.org/confluence/x/7o5EF
>> >>
>> >> --
>> >> Best regards,
>> >> Sergey
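
For readers following the thread, the distinction between a full delete and a delete by key (the "partial delete" discussed above) can be illustrated with a small, self-contained sketch. It deliberately does not use any Flink classes: `Row`, `KeyedSink`, and `LogSink` are hypothetical names invented here, and the sketch assumes a "delete by key" message carries only the key column, with the non-key column left as null. It is an illustration under those assumptions, not the FLIP-510 API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DeleteSemanticsSketch {

    /** Hypothetical changelog row: one key column and one non-key column (null if not provided). */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Row
                    && ((Row) o).key == key
                    && Objects.equals(((Row) o).value, value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, value);
        }
    }

    /** Hypothetical sink that locates rows by key: a delete only needs the key column. */
    static final class KeyedSink {
        final Map<Integer, String> table = new HashMap<>();

        void upsert(Row r) {
            table.put(r.key, r.value);
        }

        boolean delete(Row r) {
            if (!table.containsKey(r.key)) {
                return false;
            }
            table.remove(r.key); // non-key columns of the delete message are ignored
            return true;
        }
    }

    /** Hypothetical append-style sink without a key: a delete must match the entire row. */
    static final class LogSink {
        final List<Row> rows = new ArrayList<>();

        void insert(Row r) {
            rows.add(r);
        }

        boolean delete(Row r) {
            return rows.remove(r); // relies on Row.equals over all columns
        }
    }

    public static void main(String[] args) {
        Row stored = new Row(1, "a");
        Row fullDelete = new Row(1, "a");   // every column populated
        Row deleteByKey = new Row(1, null); // only the key is populated

        KeyedSink keyed = new KeyedSink();
        keyed.upsert(stored);
        LogSink log = new LogSink();
        log.insert(stored);

        System.out.println("keyed sink, delete by key: " + keyed.delete(deleteByKey));
        System.out.println("log sink, delete by key:   " + log.delete(deleteByKey));
        System.out.println("log sink, full delete:     " + log.delete(fullDelete));
    }
}
```

In this sketch the keyed sink accepts the key-only delete, while the keyless sink can only remove the row once it receives the full row. Whether a real connector behaves like one or the other is connector-specific, which matches Dawid's point that this information has to come from the connector implementation.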