[ 
https://issues.apache.org/jira/browse/FLINK-39765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaskaran Singh Kukreja updated FLINK-39765:
-------------------------------------------
    Description: 
*Summary*

Implement Flink's {{LineageVertexProvider}} interface (FLIP-314) on 
{{MySqlSource}} to enable native lineage reporting.

*Motivation*

Flink 1.20 introduced FLIP-314 which provides a native lineage API via 
{{{}LineageVertexProvider{}}}. OpenLineage's Flink integration uses this to 
extract source/sink datasets and emit lineage events. Currently, none of the 
CDC source connectors implement {{{}LineageVertexProvider{}}}, so OpenLineage 
events emitted from CDC pipelines contain empty inputs.

*Proposed Changes*

Implement {{LineageVertexProvider}} on {{MySqlSource}} to report captured 
tables as input datasets with their schemas. The scope of this ticket targets 
MySQL only, but shared lineage utilities will be added to {{flink-cdc-common}} 
(e.g., {{{}LineageUtils{}}}, {{{}CdcSourceLineageVertex{}}}, 
{{{}CdcLineageDataset{}}}) so that other source connectors (Postgres, Oracle, 
etc.) can easily adopt lineage support in follow-up PRs.

*Result*

Each captured MySQL table is reported as an input dataset using Flink native 
APIs with openlineage standard:
 * {*}Namespace{*}: {{mysql://hostname:port}}
 * {*}Name{*}: resolved table name (e.g., {{{}mydb.users{}}})
 * {*}Config facet{*}: source type ({{{}mysql-cdc{}}})
 * {*}Schema facet{*}: column names and MySQL types

 

  was:
*Summary*

Implement Flink's {{LineageVertexProvider}} interface (FLIP-314) on 
{{MySqlSource}} to enable native lineage reporting.



*Motivation*

Flink 1.20 introduced FLIP-314 which provides a native lineage API via 
{{{}LineageVertexProvider{}}}. OpenLineage's Flink integration uses this to 
extract source/sink datasets and emit lineage events. Currently, none of the 
CDC source connectors implement {{{}LineageVertexProvider{}}}, so OpenLineage 
events emitted from CDC pipelines contain empty inputs.



*Proposed Changes*

Implement {{LineageVertexProvider}} on {{MySqlSource}} to report captured 
tables as input datasets with their schemas. The scope of this ticket targets 
MySQL only, but shared lineage utilities will be added to {{flink-cdc-common}} 
(e.g., {{{}LineageUtils{}}}, {{{}CdcSourceLineageVertex{}}}, 
{{{}CdcLineageDataset{}}}) so that other source connectors (Postgres, Oracle, 
etc.) can easily adopt lineage support in follow-up PRs.

*Result*
**

Each captured MySQL table is reported as an input dataset using Flink native 
APIs with openlineage standard:
 * {*}Namespace{*}: {{mysql://hostname:port}}
 * {*}Name{*}: resolved table name (e.g., {{{}mydb.users{}}})
 * {*}Config facet{*}: source type ({{{}mysql-cdc{}}})
 * {*}Schema facet{*}: column names and MySQL types

 


> Add FLIP-314 lineage support (LineageVertexProvider) to MySQL CDC source 
> connector
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-39765
>                 URL: https://issues.apache.org/jira/browse/FLINK-39765
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>            Reporter: Jaskaran Singh Kukreja
>            Priority: Minor
>              Labels: Flink-CDC
>
> *Summary*
> Implement Flink's {{LineageVertexProvider}} interface (FLIP-314) on 
> {{MySqlSource}} to enable native lineage reporting.
> *Motivation*
> Flink 1.20 introduced FLIP-314 which provides a native lineage API via 
> {{{}LineageVertexProvider{}}}. OpenLineage's Flink integration uses this to 
> extract source/sink datasets and emit lineage events. Currently, none of the 
> CDC source connectors implement {{{}LineageVertexProvider{}}}, so OpenLineage 
> events emitted from CDC pipelines contain empty inputs.
> *Proposed Changes*
> Implement {{LineageVertexProvider}} on {{MySqlSource}} to report captured 
> tables as input datasets with their schemas. The scope of this ticket targets 
> MySQL only, but shared lineage utilities will be added to 
> {{flink-cdc-common}} (e.g., {{{}LineageUtils{}}}, 
> {{{}CdcSourceLineageVertex{}}}, {{{}CdcLineageDataset{}}}) so that other 
> source connectors (Postgres, Oracle, etc.) can easily adopt lineage support 
> in follow-up PRs.
> *Result*
> Each captured MySQL table is reported as an input dataset using Flink native 
> APIs with openlineage standard:
>  * {*}Namespace{*}: {{mysql://hostname:port}}
>  * {*}Name{*}: resolved table name (e.g., {{{}mydb.users{}}})
>  * {*}Config facet{*}: source type ({{{}mysql-cdc{}}})
>  * {*}Schema facet{*}: column names and MySQL types
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to