[ https://issues.apache.org/jira/browse/FLINK-23426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17383706#comment-17383706 ]
Jark Wu commented on FLINK-23426:
---------------------------------
This was planned in FLINK-18825. I prefer to have a materialize operator that
materializes the changelog stream into an insert-only stream, since the source
is bounded.
Regarding "no missing UPDATE_AFTER": I think all the CDC formats and connectors
we support have "complete" CDC logs, e.g. debezium, canal, maxwell, and
mysql-cdc; UPDATE_BEFORE and UPDATE_AFTER are always contained in a single
UPDATE event.
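To make that concrete, a minimal sketch (not the actual planner operator) of such a
materialization: key each change by primary key, upsert on INSERT/UPDATE_AFTER,
retract on UPDATE_BEFORE/DELETE, and emit whatever remains as insert-only rows once
the bounded input ends. The class name and the assumption that field 0 is the primary
key are made up for illustration; only Row and RowKind are existing Flink types.

{code:java}
import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that folds a bounded changelog into an insert-only snapshot. */
public class ChangelogMaterializer {

    // Latest row image per primary key (assumed here to be field 0).
    private final Map<Object, Row> latest = new LinkedHashMap<>();

    /** Applies one change event; relies on "complete" CDC logs (no missing UPDATE_AFTER). */
    public void apply(Row change) {
        Object key = change.getField(0);
        switch (change.getKind()) {
            case INSERT:
            case UPDATE_AFTER:
                latest.put(key, change);   // upsert the new row image
                break;
            case UPDATE_BEFORE:
            case DELETE:
                latest.remove(key);        // retract the old row image
                break;
        }
    }

    /** Called once the bounded input ends: emit the surviving rows as plain inserts. */
    public List<Row> finish() {
        List<Row> result = new ArrayList<>();
        for (Row row : latest.values()) {
            row.setKind(RowKind.INSERT);
            result.add(row);
        }
        return result;
    }
}
{code}

In the planner such an operator would naturally run keyed at the end of the batch
pipeline; the sketch only shows the folding logic.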
> Support changelog processing in batch mode
> ------------------------------------------
>
> Key: FLINK-23426
> URL: https://issues.apache.org/jira/browse/FLINK-23426
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / API
> Reporter: Timo Walther
> Priority: Major
>
> The DataStream API can execute arbitrary DataStream programs when running in
> batch mode. However, this is not the case for the Table API batch mode. E.g.
> a source with non-insert-only changes is not supported, and updates/deletes
> cannot be emitted.
> In theory, we could make this work by running the "stream mode" of the
> planner (CDC transformations) on top of the "batch mode" of the DataStream
> API (specialized state backend, sorted inputs). It is up for discussion
> whether and how we expose such functionality.
> If we don't allow enabling incremental updates, we can also add a special
> batch operator that materializes the incoming changes for a batch pipeline.
> However, it would require "complete" CDC logs (i.e. no missing UPDATE_AFTER).
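For context, a hedged sketch of the kind of Table API program this issue targets:
batch execution mode over a CDC-formatted source. The table name, schema, and
connector options below are invented for illustration; as the description notes, the
batch planner currently rejects such a non-insert-only source instead of
materializing it.

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BatchCdcExample {
    public static void main(String[] args) {
        // Batch execution mode of the Table API.
        TableEnvironment tEnv =
                TableEnvironment.create(
                        EnvironmentSettings.newInstance().inBatchMode().build());

        // Hypothetical bounded CDC source: 'debezium-json' produces updates and
        // deletes, which the batch planner currently rejects (the point of this issue).
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id BIGINT," +
                "  amount DECIMAL(10, 2)," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = '/tmp/orders-changelog'," +
                "  'format' = 'debezium-json'" +
                ")");

        // Desired behavior: materialize the changelog and print only the final rows.
        tEnv.executeSql("SELECT order_id, amount FROM orders").print();
    }
}
{code}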