Hi Devs,

I recently worked with the OpenLineage community on this topic. I think the
OpenLineage community's idea of defining an intermediate representation
(Dataset) for the metadata of a source/sink is a very good one. Also, a
LineageVertex can have multiple datasets; for example, a Hybrid Source user
may read from Kafka first and then switch to Iceberg. Given this, I feel the
config should live in the Dataset rather than in the LineageVertex. On the
other hand, we want to make column lineage possible, so having the query in
the Dataset would allow the lineage provider to analyze the column
relationships. The input/output schema could go into a facet, which may be
optional depending on the connector implementation. Thus, I propose
adjusting two of the interfaces defined in FLIP-314 as below:

import java.util.List;
import java.util.Map;

public interface LineageVertex {

    /* List of input (for source) or output (for sink) datasets interacted
       with by the connector. */
    List<Dataset> datasets();
}

public interface Dataset {

    /* Name for this particular dataset. */
    String name();

    /* Unique name for this dataset's datasource. */
    String namespace();

    /* Query used to generate the dataset, if there is one (may be null). */
    String query();

    /* Facets for the lineage vertex that describe particular information
       about the dataset (e.g. the input/output schema). Facet and FacetType
       are assumed to be defined elsewhere in the proposal. */
    Map<FacetType, Facet> facets();
}
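To make the multi-dataset case concrete, here is a minimal sketch, assuming
the two interfaces above, of how a hybrid source that reads Kafka first and
then switches to Iceberg might expose both datasets through a single
LineageVertex. It is self-contained, so it redeclares simplified local copies
of the interfaces. Everything else is hypothetical and only for illustration:
the class name HybridLineageExample, the SimpleDataset record, the describe
helper, the namespaces, topic/table names and the query. The proposed
Map<FacetType, Facet> is swapped for a plain Map<String, String> keyed by
facet name. Records require Java 16+.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class HybridLineageExample {

    // Simplified local copy of the proposed Dataset interface. Facets are a
    // hypothetical String -> String map instead of Map<FacetType, Facet>.
    interface Dataset {
        String name();
        String namespace();
        String query();                 // may be null when there is no query
        Map<String, String> facets();
    }

    // Simplified local copy of the proposed LineageVertex interface.
    interface LineageVertex {
        List<Dataset> datasets();
    }

    // Hypothetical immutable Dataset implementation; the record accessors
    // implement the interface methods.
    record SimpleDataset(String name, String namespace, String query,
                         Map<String, String> facets) implements Dataset {}

    // Hypothetical vertex for a hybrid source: Kafka first, then Iceberg.
    // Each dataset carries its own config (namespace, name, query, facets).
    static LineageVertex hybridSourceVertex() {
        Dataset kafka = new SimpleDataset(
                "orders", "kafka://broker:9092", null, Map.of());
        Dataset iceberg = new SimpleDataset(
                "db.orders", "iceberg://warehouse",
                "SELECT order_id, amount FROM db.orders",
                Map.of("schema", "order_id BIGINT, amount DOUBLE"));
        return () -> List.of(kafka, iceberg);
    }

    // Hypothetical consumer-side helper: the schema facet is optional, so a
    // lineage listener has to tolerate its absence; a non-null query marks
    // datasets that could be analyzed for column lineage.
    static String describe(Dataset d) {
        String schema = Optional.ofNullable(d.facets().get("schema"))
                .orElse("<unknown schema>");
        String query = d.query() == null
                ? "no query"
                : "query available for column lineage";
        return d.namespace() + "/" + d.name() + " [" + schema + "; " + query + "]";
    }

    public static void main(String[] args) {
        // Prints one line per dataset exposed by the vertex.
        for (Dataset d : hybridSourceVertex().datasets()) {
            System.out.println(describe(d));
        }
    }
}
```

Keeping the per-source details on each Dataset, rather than on the vertex,
is what lets one vertex describe both phases of the hybrid source.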

Some of the discussion has been recorded in the JIRA issue
https://issues.apache.org/jira/browse/FLINK-31275. Please share your
thoughts in this thread.


Best Regards

Peter Huang


On Thu, Sep 28, 2023 at 3:56 PM Yuepeng Pan <panyuep...@apache.org> wrote:

> +1(non-binding)
>
> Best,
> Yuepeng Pan
>
> On 2023-09-28 17:44:46, "Rui Fan" <1996fan...@gmail.com> wrote:
> >+1(binding)
> >
> >Best,
> >Rui
> >
> >On Thu, 28 Sep 2023 at 14:41, Chen Zhanghao <zhanghao.c...@outlook.com> wrote:
> >
> >> +1 (non-binding), thanks for driving this.
> >>
> >> Best,
> >> Zhanghao Chen
> >> ________________________________
> >> From: Shammon FY <zjur...@gmail.com>
> >> Sent: September 25, 2023 13:28
> >> To: dev <dev@flink.apache.org>
> >> Subject: [VOTE] FLIP-314: Support Customized Job Lineage Listener
> >>
> >> Hi devs,
> >>
> >> Thanks for all the feedback on FLIP-314: Support Customized Job Lineage
> >> Listener [1] in thread [2].
> >>
> >> I would like to start a vote for it. The vote will be open for at least
> >> 72 hours unless there is an objection or insufficient votes.
> >>
> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
> >> [2] https://lists.apache.org/thread/wopprvp3ww243mtw23nj59p57cghh7mc
> >>
> >> Best,
> >> Shammon FY
> >>
>
