[
https://issues.apache.org/jira/browse/FLINK-31275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786866#comment-17786866
]
Maciej Obuchowski edited comment on FLINK-31275 at 11/16/23 5:22 PM:
---------------------------------------------------------------------
[~zjureel] I think this is a step in the right direction.
For `StreamingFacet`, if the intention is to attach it to connectors
interfacing with services like Kafka, Kinesis, Pulsar, PubSub then I believe it
needs ability to describe multiple topics - since, at least for Kafka, you can
read from either list of topics or wildcard pattern. The same for `SchemaFacet`
- the topics can have different schema.
The way that OpenLineage deals with this - and Flink could do too - is
associating those with concept of `Dataset`, rather than job itself. So, we'd
have roughly
public interface LineageVertex
public interface LineageVertex {
/* List of input (for source) or output (for sink) datasets interacted with
by the connector */
List<Dataset> datasets; /* Facets for the lineage vertex to describe the
general information of source/sink. */ Map<String, Facet> facets; /*
Config for the lineage vertex contains all the options for the connector. */
Map<String, String> config;
}
{code:java}
public interface Dataset {
/* Name for this particular dataset. */
String name;
/* Unique name for this dataset's datasource. */
String namespace;
/* Facets for the lineage vertex to describe the particular information of
dataset. */
Map<String, Facet> facets;
} {code}
was (Author: mobuchowski):
[~zjureel] I think this is a step in the right direction.
For `StreamingFacet`, if the intention is to attach it to connectors
interfacing with services like Kafka, Kinesis, Pulsar, PubSub then I believe it
needs ability to describe multiple topics - since, at least for Kafka, you can
read from either list of topics or wildcard pattern. The same for `SchemaFacet`
- the topics can have different schema.
The way that OpenLineage deals with this - and Flink could do too - is
associating those with concept of `Dataset`, rather than job itself. So, we'd
have roughly
public interface LineageVertex {
/* List of input (for source) or output (for sink) datasets interacted with
by the connector */
List<Dataset> datasets; /* Facets for the lineage vertex to describe the
general information of source/sink. */ Map<String, Facet> facets; /*
Config for the lineage vertex contains all the options for the connector. */
Map<String, String> config;
}
public interface Dataset {
/* Name for this particular dataset. */
String name;
/* Unique name for this dataset's datasource. */
String namespace;
/* Facets for the lineage vertex to describe the particular information of
dataset. */ Map<String, Facet> facets;
}
> Flink supports reporting and storage of source/sink tables relationship
> -----------------------------------------------------------------------
>
> Key: FLINK-31275
> URL: https://issues.apache.org/jira/browse/FLINK-31275
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Planner
> Affects Versions: 1.18.0
> Reporter: Fang Yong
> Assignee: Fang Yong
> Priority: Major
>
> FLIP-314 has been accepted
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
--
This message was sent by Atlassian Jira
(v8.20.10#820010)