[jira] [Commented] (FLINK-31275) Flink supports reporting and storage of source/sink tables relationship

Fang Yong (Jira) Mon, 06 Nov 2023 01:16:04 -0800


    [ https://issues.apache.org/jira/browse/FLINK-31275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783147#comment-17783147 ]


Fang Yong commented on FLINK-31275:
-----------------------------------

Hi [~mobuchowski], thanks for your reply.

I think our ideas are consistent, just at different levels of abstraction.
`LineageVertex` is the top-level interface for connectors in Flink, and we
implement `TableLineageVertex` for tables, because a table has a complete
definition, including its database, schema, etc. We put the options from the
`WITH` clause into a map, which is consistent with how tables are defined and
used in Flink SQL.
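To make the layering concrete, here is a minimal sketch under stated assumptions: the interfaces below are hypothetical, simplified stand-ins for `LineageVertex` and `TableLineageVertex`, and the method names (`catalog()`, `database()`, `table()`, `columns()`, `options()`) and the `SimpleTableLineageVertex` record are illustrative only, not the FLIP-314 or released Flink API. It shows a table vertex carrying its full identity, its schema, and its `WITH` options as a map.

```java
import java.util.List;
import java.util.Map;

public class TableLineageSketch {

    // Hypothetical marker interface standing in for the top-level LineageVertex.
    interface LineageVertex {}

    // Hypothetical table-level vertex: a table is a complete definition, so it
    // can expose its identity, its schema, and the options from its WITH clause.
    interface TableLineageVertex extends LineageVertex {
        String catalog();
        String database();
        String table();
        List<String> columns();
        Map<String, String> options();
    }

    // Illustrative immutable implementation; record accessors satisfy the interface.
    record SimpleTableLineageVertex(
            String catalog,
            String database,
            String table,
            List<String> columns,
            Map<String, String> options) implements TableLineageVertex {

        SimpleTableLineageVertex {
            // Defensive copies so listeners cannot mutate the reported lineage.
            columns = List.copyOf(columns);
            options = Map.copyOf(options);
        }

        String qualifiedName() {
            return catalog + "." + database + "." + table;
        }
    }

    public static void main(String[] args) {
        // Options as they might come from: CREATE TABLE orders (...) WITH ('connector' = 'kafka', ...)
        SimpleTableLineageVertex orders = new SimpleTableLineageVertex(
                "default_catalog", "default_database", "orders",
                List.of("order_id", "amount"),
                Map.of("connector", "kafka", "topic", "orders"));
        System.out.println(orders.qualifiedName());
        System.out.println(orders.options().get("connector"));
    }
}
```

Keeping the options as a plain string map mirrors the `WITH` clause one-to-one, so a listener needs no connector-specific code just to read them.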

For the official Flink connectors, we will implement `LineageVertex` for the
`Source` and `InputFormat` implementations used in `DataStream` jobs, such as
`KafkaSourceLineage`, as we mentioned in the FLIP: `We will implement
LineageVertexProvider for the builtin source and sink such as KafkaSource,
HiveSource, FlinkKafkaProducerBase and etc.`
End users don't need to implement them. To stay consistent with how tables are
used, these implementations will put the corresponding information into a map
that users can read.
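A minimal sketch of that provider pattern, assuming hypothetical simplified signatures: `LineageVertexProvider` is named in the FLIP quote, but its method `getLineageVertex()`, the `config()` map accessor, `DemoKafkaSource`, and `collect()` are illustrative inventions, not the real `KafkaSource` or Flink API. The connector implements the provider, and the vertex exposes its details as a map in the same shape as table options.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProviderSketch {

    // Hypothetical stand-ins; names and signatures are illustrative only.
    interface LineageVertex {
        Map<String, String> config();
    }

    interface LineageVertexProvider {
        LineageVertex getLineageVertex();
    }

    // Hypothetical Kafka vertex that exposes its details as a map,
    // using keys that resemble the Kafka SQL connector options.
    record KafkaSourceLineageVertex(String bootstrapServers, List<String> topics)
            implements LineageVertex {
        @Override
        public Map<String, String> config() {
            Map<String, String> config = new LinkedHashMap<>();
            config.put("connector", "kafka");
            config.put("properties.bootstrap.servers", bootstrapServers);
            config.put("topic", String.join(";", topics));
            return config;
        }
    }

    // Hypothetical built-in source: the connector, not the end user, implements the provider.
    static final class DemoKafkaSource implements LineageVertexProvider {
        private final String bootstrapServers;
        private final List<String> topics;

        DemoKafkaSource(String bootstrapServers, List<String> topics) {
            this.bootstrapServers = bootstrapServers;
            this.topics = List.copyOf(topics);
        }

        @Override
        public LineageVertex getLineageVertex() {
            return new KafkaSourceLineageVertex(bootstrapServers, topics);
        }
    }

    // What the framework might do: ask any provider-capable source for its vertex.
    static LineageVertex collect(Object source) {
        return source instanceof LineageVertexProvider p ? p.getLineageVertex() : null;
    }

    public static void main(String[] args) {
        LineageVertex v = collect(new DemoKafkaSource("broker:9092", List.of("orders", "payments")));
        System.out.println(v.config());
    }
}
```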

So I think the point where we currently diverge is which level of abstraction
users need to be aware of. In the current FLIP, for DataStream jobs, listener
developers need to check whether a `LineageVertex` is a
`KafkaSourceLineageVertex` or a `JdbcLineageVertex`. Do you mean that we should
define another layer, such as a `DataSetConfig` interface, so that listener
developers check whether it is a `KafkaDataSetConfig` or a `JdbcDataSetConfig`
instead?
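The two designs discussed above can be contrasted in a small sketch. All types here are hypothetical, simplified illustrations. The type names come from this discussion, but their fields, `DataSetLineageVertex`, `dataSets()`, and the `describe*` helpers are assumptions. Option A dispatches on the concrete vertex type; option B dispatches on a connector-specific `DataSetConfig` exposed by a generic vertex.

```java
import java.util.List;

public class ListenerSketch {

    // Option A (current FLIP draft): the listener inspects concrete vertex types.
    interface LineageVertex {}
    record KafkaSourceLineageVertex(String topic) implements LineageVertex {}
    record JdbcLineageVertex(String url, String table) implements LineageVertex {}

    static String describeByVertex(LineageVertex vertex) {
        if (vertex instanceof KafkaSourceLineageVertex k) {
            return "kafka:" + k.topic();
        } else if (vertex instanceof JdbcLineageVertex j) {
            return "jdbc:" + j.url() + "/" + j.table();
        }
        return "unknown";
    }

    // Option B (suggested extra layer): a generic vertex exposes DataSetConfig objects.
    interface DataSetConfig {}
    record KafkaDataSetConfig(String topic) implements DataSetConfig {}
    record JdbcDataSetConfig(String url, String table) implements DataSetConfig {}

    interface DataSetLineageVertex {
        List<DataSetConfig> dataSets();
    }

    static String describeByDataSet(DataSetConfig config) {
        if (config instanceof KafkaDataSetConfig k) {
            return "kafka:" + k.topic();
        } else if (config instanceof JdbcDataSetConfig j) {
            return "jdbc:" + j.url() + "/" + j.table();
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(describeByVertex(new JdbcLineageVertex("jdbc:mysql://db:3306/shop", "orders")));
        DataSetLineageVertex vertex = () -> List.of(new KafkaDataSetConfig("orders"));
        vertex.dataSets().forEach(c -> System.out.println(describeByDataSet(c)));
    }
}
```

In both variants the listener still branches on a connector-specific type. The difference is only which layer carries that type, which matches the point that the two proposals sit at different levels of abstraction.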

We currently use `LineageVertex` mainly for flexibility: it makes it easy to add
returned information to the lineage vertex of a `DataStream` job, such as the
vector-type data source information mentioned in the FLIP example. At the same
time, connector maintainers can easily provide lineage vertices for customized
connectors. If a connector is table-based, we prefer that users provide a
`TableLineageVertex` instance directly.
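As an illustration of that flexibility, here is a minimal sketch assuming hypothetical simplified interfaces. The vector-database source, its fields (`endpoint`, `collection`, `dimension`), and the method names are invented for the example and do not reproduce the FLIP's actual vector example. A connector maintainer adds typed fields to a custom vertex while still offering the generic map view.

```java
import java.util.Map;

public class CustomVertexSketch {

    // Hypothetical simplified interfaces; real FLIP-314 signatures may differ.
    interface LineageVertex {
        Map<String, String> config();
    }

    interface LineageVertexProvider {
        LineageVertex getLineageVertex();
    }

    // Hypothetical vertex for a vector-database source: typed fields for specialized
    // listeners, plus the same information as a map for generic listeners.
    record VectorDbLineageVertex(String endpoint, String collection, int dimension)
            implements LineageVertex {
        @Override
        public Map<String, String> config() {
            return Map.of(
                    "endpoint", endpoint,
                    "collection", collection,
                    "dimension", Integer.toString(dimension));
        }
    }

    // Hypothetical customized DataStream source written by a connector maintainer.
    record VectorDbSource(String endpoint, String collection, int dimension)
            implements LineageVertexProvider {
        @Override
        public LineageVertex getLineageVertex() {
            return new VectorDbLineageVertex(endpoint, collection, dimension);
        }
    }

    public static void main(String[] args) {
        LineageVertex v = new VectorDbSource("http://vector-db:19530", "embeddings", 768).getLineageVertex();
        // A specialized listener can use the typed accessors...
        if (v instanceof VectorDbLineageVertex vec) {
            System.out.println(vec.collection() + " dim=" + vec.dimension());
        }
        // ...while a generic listener only reads the map.
        System.out.println(v.config().get("collection"));
    }
}
```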





> Flink supports reporting and storage of source/sink tables relationship
> -----------------------------------------------------------------------
>
>                 Key: FLINK-31275
>                 URL: https://issues.apache.org/jira/browse/FLINK-31275
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner
>    Affects Versions: 1.18.0
>            Reporter: Fang Yong
>            Assignee: Fang Yong
>            Priority: Major
>
> FLIP-314 has been accepted 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
