[ https://issues.apache.org/jira/browse/FLINK-36683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Runkang He updated FLINK-36683:
-------------------------------
Description:
The 'row_kind' metadata column is very useful in real user scenarios. The two
main scenarios are:
1. Save all upstream messages: In this scenario, the downstream saves every
message from upstream, including delete messages. To achieve this, we need to
convert the full changelog into append-only messages and use the 'row_kind'
metadata to record the changelog kind of each message.
2. Ignore upstream delete messages: In this scenario, to save storage space,
the upstream CDC source regularly deletes historical data and retains only the
most recent seven days of data. However, the business requires the downstream
OLAP system to retain the full historical data, so the delete messages from
the source must be ignored.
So I think we should support the 'row_kind' metadata in the Mongo CDC Connector.
was:
The 'row_kind' metadata column is very useful in real user scenarios. The two
main scenarios are:
1. Save all upstream messages: In this scenario, the downstream saves every
message from upstream, including delete messages. To achieve this, we need to
convert the full changelog into append-only messages and use the 'row_kind'
metadata to record the changelog kind of each message.
2. Ignore upstream delete messages: In this scenario, the upstream CDC source
regularly deletes historical data to save storage space and retains only the
most recent seven days of data. However, the business requires the downstream
OLAP system to retain the full historical data, so the delete messages from
the source must be ignored.
So I think we should support the 'row_kind' metadata in the Mongo CDC Connector.
> Support metadata 'row_kind' virtual column for Mongo CDC Connector
> ------------------------------------------------------------------
>
> Key: FLINK-36683
> URL: https://issues.apache.org/jira/browse/FLINK-36683
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.3.0, cdc-3.2.1
> Reporter: Runkang He
> Priority: Major
>
> The 'row_kind' metadata column is very useful in real user scenarios. The two
> main scenarios are:
> 1. Save all upstream messages: In this scenario, the downstream saves every
> message from upstream, including delete messages. To achieve this, we need to
> convert the full changelog into append-only messages and use the 'row_kind'
> metadata to record the changelog kind of each message.
> 2. Ignore upstream delete messages: In this scenario, to save storage space,
> the upstream CDC source regularly deletes historical data and retains only
> the most recent seven days of data. However, the business requires the
> downstream OLAP system to retain the full historical data, so the delete
> messages from the source must be ignored.
> So I think we should support the 'row_kind' metadata in the Mongo CDC Connector.
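
The two scenarios in the description can be illustrated with a small,
connector-independent sketch. This is not the Mongo CDC Connector's API: the
names ChangeEvent, to_append_only and ignore_deletes are hypothetical, and the
sketch assumes the 'row_kind' values follow Flink's RowKind short strings
('+I', '-U', '+U', '-D').

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator

# Assumption: row kinds use Flink's RowKind short strings.
INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE = "+I", "-U", "+U", "-D"


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical changelog record: a row plus its changelog kind."""
    row_kind: str
    row: Dict[str, object]


def to_append_only(events: Iterable[ChangeEvent]) -> Iterator[Dict[str, object]]:
    """Scenario 1: keep every message, including deletes.

    Each change becomes a plain appended row whose 'row_kind' field
    records what kind of change it originally was.
    """
    for event in events:
        yield {**event.row, "row_kind": event.row_kind}


def ignore_deletes(events: Iterable[ChangeEvent]) -> Iterator[ChangeEvent]:
    """Scenario 2: drop upstream deletes so the downstream keeps full history."""
    for event in events:
        if event.row_kind != DELETE:
            yield event


if __name__ == "__main__":
    events = [
        ChangeEvent(INSERT, {"_id": 1, "name": "a"}),
        ChangeEvent(UPDATE_AFTER, {"_id": 1, "name": "b"}),
        ChangeEvent(DELETE, {"_id": 1, "name": "b"}),
    ]
    for appended in to_append_only(events):
        print(appended)
    for kept in ignore_deletes(events):
        print(kept)
```

In a real job, the same effect would presumably come from declaring
'row_kind' as a metadata column on the source table and filtering on it,
which is what this issue asks the connector to support.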