Jaskaran Singh Kukreja created FLINK-39765:
----------------------------------------------

             Summary: Add FLIP-314 lineage support (LineageVertexProvider) to 
MySQL CDC source connector
                 Key: FLINK-39765
                 URL: https://issues.apache.org/jira/browse/FLINK-39765
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
            Reporter: Jaskaran Singh Kukreja


*Summary*

Implement Flink's {{LineageVertexProvider}} interface (FLIP-314) on 
{{MySqlSource}} to enable native lineage reporting.



*Motivation*

Flink 1.20 introduced FLIP-314 which provides a native lineage API via 
{{{}LineageVertexProvider{}}}. OpenLineage's Flink integration uses this to 
extract source/sink datasets and emit lineage events. Currently, none of the 
CDC source connectors implement {{{}LineageVertexProvider{}}}, so OpenLineage 
events emitted from CDC pipelines contain empty inputs.



*Proposed Changes*

Implement {{LineageVertexProvider}} on {{MySqlSource}} to report captured 
tables as input datasets with their schemas. The scope of this ticket targets 
MySQL only, but shared lineage utilities will be added to {{flink-cdc-common}} 
(e.g., {{{}LineageUtils{}}}, {{{}CdcSourceLineageVertex{}}}, 
{{{}CdcLineageDataset{}}}) so that other source connectors (Postgres, Oracle, 
etc.) can easily adopt lineage support in follow-up PRs.

*Result*
**

Each captured MySQL table is reported as an input dataset using Flink native 
APIs with openlineage standard:
 * {*}Namespace{*}: {{mysql://hostname:port}}
 * {*}Name{*}: resolved table name (e.g., {{{}mydb.users{}}})
 * {*}Config facet{*}: source type ({{{}mysql-cdc{}}})
 * {*}Schema facet{*}: column names and MySQL types

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to