Hi, Dawid.

Thanks for your response. I believe I've identified a key point, but I’m a bit 
unclear about the 

following you said. Could you please provide an example for clarification?

```

The only missing information is if the external sink can consume deletes by key 
and if a source

produces full deletes or deletes by key.

```

From my understanding, for a sink, if its schema includes a primary key, we can 
assume it has 

the ability to process delete messages (with '-D') and perform deletions by key 
(PK). If it does not 

include a PK, we would implicitly treat it as a log-structured table that 
supports full row deletions. 

Given that you mentioned `PARTIAL_DELETE`, should I interpret this as referring 
to a scenario 

similar to wide tables, where if the sink has a PK, some columns are deleted 
(set to null or through 

other operations) while others remain unchanged?

Looking forward your reply.







--

    Best!
    Xuyang





At 2025-02-28 19:16:12, "Dawid Wysakowicz" <wysakowicz.da...@gmail.com> wrote:
>Hey Xuyang,
>Ad. 1
>Yes, you're right, but we already do that for determining if we need
>UPDATE_BEFORE or not. FlinkChangelogModeInferenceProgram already deals with
>that.
>Ad. 2
>Unfortunately it is. This is also the only reason I need a FLIP. We can
>determine internally for every internal operator if we can work with
>partial deletes or if we need full deletes. The only missing information is
>if the external sink can consume deletes by key and if a source produces
>full deletes or deletes by key. Unfortunately this is information that
>comes from a connector implementation and thus needs to be provided via a
>public API.
>Ad. 3
>With ChangelogMode#kinds -> to some degree yes. We theoretically could
>split RowKind#DELETE to RowKind#DELETE_BY_KEY and RowKind#FULL_DELETE.
>However, that change would 1) be much more involved 2) we would need to
>encode that information in every single message, which I think is not
>necessary. I don't think it has much to do with PK.
>Ad.4
>I don't think so. PK information is part of Schema not about the kind of
>messages. We don't have PK information for UPDATE_BEFORE/UPDATE_AFTER and
>they also apply per key. If the name containing `DELETE_BY_KEY` is
>confusing I am happy to rename it to e.g. PARTIAL_DELETE, therefore I'd add
>`supportsPartialDeletes`
>
>Best,
>Dawid
>
>On Fri, 28 Feb 2025 at 04:43, Xuyang <xyzhong...@163.com> wrote:
>
>> Hi Dawid.
>>
>>
>>
>>
>> Big +1 for this FLIP. After reading through it, I have a few questions and
>> would appreciate your responses:
>>
>> 1. IIUC, we only need to provide additional information in the
>> `FlinkChangelogModeInferenceProgram` to enable the
>>
>> inference program to determine whether it is safe to remove
>> `ChangelogNormalize`. My first instinct is that we need to
>>
>> know if all subsequent output-side nodes consuming Upsert Keys include the
>> Upsert Keys provided by the input-side operator (source).
>>
>> If this condition is met, we can safely eliminate `ChangelogNormalize`.
>> Perhaps, I have missed some important points, so please feel
>>
>> free to correct me if necessary.
>>
>> 2. The introduction of `supportsDeleteByKey` in ChangelogMode seems to
>> exist solely as auxiliary information for the
>>
>> `FlinkChangelogModeInferenceProgram`. If that's the case, it doesn't seem
>> necessary to expose it in the public API, does it?
>>
>> 3. If the purpose of introducing `supportsDeleteByKey` in ChangelogMode is
>> to facilitate support for `#fromChangelogStream`
>>
>> and `#toChangelogStream`, it appears that `supportsDeleteByKey` might
>> overlap with ChangelogMode#kinds and Schema#PK
>>
>> to some extent, right?
>>
>> 4. Regarding supportsDeleteByKey, as part of a complete ChangelogMode
>> entity, should we also store the specific key information?
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>>     Best!
>>     Xuyang
>>
>>
>>
>>
>>
>> 在 2025-02-28 04:27:19,"Martijn Visser" <martijnvis...@apache.org> 写道:
>> >Hi Dawid,
>> >
>> >Thanks for the FLIP, looks like a good improvement for me that will bring
>> a
>> >lot of benefits. +1
>> >
>> >Best regards,
>> >
>> >Martijn
>> >
>> >On Tue, Feb 25, 2025 at 6:51 AM Sergey Nuyanzin <snuyan...@gmail.com>
>> wrote:
>> >
>> >> +1 for such improvement
>> >>
>> >> On Mon, Feb 24, 2025 at 12:01 PM Dawid Wysakowicz
>> >> <wysakowicz.da...@gmail.com> wrote:
>> >> >
>> >> > Hi everyone,
>> >> >
>> >> > I would like to initiate a discussion for the FLIP-510[1] below, which
>> >> aims
>> >> > on optimising certain use cases in SQL which at the moment add
>> >> > ChangelogNormalize, but don't necessarily need to do it.
>> >> >
>> >> > Looking forward to hearing from you.
>> >> >
>> >> > [1] https://cwiki.apache.org/confluence/x/7o5EF
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards,
>> >> Sergey
>> >>
>>

Reply via email to