Jaskaran Singh Kukreja created FLINK-39765:
----------------------------------------------
Summary: Add FLIP-314 lineage support (LineageVertexProvider) to
MySQL CDC source connector
Key: FLINK-39765
URL: https://issues.apache.org/jira/browse/FLINK-39765
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Reporter: Jaskaran Singh Kukreja
*Summary*
Implement Flink's {{LineageVertexProvider}} interface (FLIP-314) on
{{MySqlSource}} to enable native lineage reporting.
*Motivation*
Flink 1.20 introduced FLIP-314, which provides a native lineage API via
{{LineageVertexProvider}}. OpenLineage's Flink integration uses this API to
extract source/sink datasets and emit lineage events. Currently, none of the
CDC source connectors implement {{LineageVertexProvider}}, so OpenLineage
events emitted from CDC pipelines contain empty inputs.
*Proposed Changes*
Implement {{LineageVertexProvider}} on {{MySqlSource}} to report captured
tables as input datasets with their schemas. This ticket covers MySQL only,
but shared lineage utilities will be added to {{flink-cdc-common}}
(e.g., {{LineageUtils}}, {{CdcSourceLineageVertex}},
{{CdcLineageDataset}}) so that other source connectors (Postgres, Oracle,
etc.) can adopt lineage support in follow-up PRs.
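To illustrate the shape of the change, here is a minimal, self-contained sketch. It assumes simplified local stand-ins for the lineage interfaces (they are not the real Flink types, whose exact signatures should be checked against the targeted Flink version), and {{SketchMySqlSource}} is a hypothetical class, not the actual {{MySqlSource}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified local stand-ins for illustration only; NOT the real Flink interfaces.
interface LineageDataset {
    String namespace();
    String name();
}

interface LineageVertex {
    List<LineageDataset> datasets();
}

interface LineageVertexProvider {
    LineageVertex getLineageVertex();
}

// Hypothetical source that knows its connection details and captured tables.
final class SketchMySqlSource implements LineageVertexProvider {
    private final String hostname;
    private final int port;
    private final List<String> capturedTables;

    SketchMySqlSource(String hostname, int port, List<String> capturedTables) {
        this.hostname = hostname;
        this.port = port;
        this.capturedTables = new ArrayList<>(capturedTables);
    }

    @Override
    public LineageVertex getLineageVertex() {
        // One dataset per captured table, all sharing the same namespace.
        final String namespace = "mysql://" + hostname + ":" + port;
        final List<LineageDataset> datasets = new ArrayList<>();
        for (final String table : capturedTables) {
            datasets.add(new LineageDataset() {
                @Override public String namespace() { return namespace; }
                @Override public String name() { return table; }
            });
        }
        final List<LineageDataset> result = Collections.unmodifiableList(datasets);
        return () -> result;
    }
}
```

In this sketch the vertex is built on demand from the source's configuration, so a lineage listener would only need the source object itself to discover the input datasets.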
*Result*
Each captured MySQL table is reported as an input dataset through Flink's
native lineage APIs, following the OpenLineage standard:
* *Namespace*: {{mysql://hostname:port}}
* *Name*: resolved table name (e.g., {{mydb.users}})
* *Config facet*: source type ({{mysql-cdc}})
* *Schema facet*: column names and MySQL types
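The dataset fields listed above could be assembled roughly as follows. This is a sketch under stated assumptions: {{TableDatasetSketch}} and the facet keys {{config}}, {{schema}}, and {{sourceType}} are hypothetical names chosen for illustration, and the facets are modeled as plain maps rather than Flink facet types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one captured table's lineage metadata.
final class TableDatasetSketch {
    final String namespace;
    final String name;
    final Map<String, Object> facets = new LinkedHashMap<>();

    TableDatasetSketch(String hostname, int port, String database, String table,
                       Map<String, String> columnTypes) {
        // Namespace identifies the MySQL instance; name identifies the table.
        this.namespace = "mysql://" + hostname + ":" + port;
        this.name = database + "." + table;

        // Config facet: records the source type (key names are assumptions).
        Map<String, String> config = new LinkedHashMap<>();
        config.put("sourceType", "mysql-cdc");
        facets.put("config", config);

        // Schema facet: column name -> MySQL type, kept in declaration order.
        facets.put("schema", new LinkedHashMap<>(columnTypes));
    }
}
```

Keeping the namespace at instance level and the name at table level means that two tables captured from the same server share a namespace and differ only in name.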