This is an automated email from the ASF dual-hosted git repository.

fcsaky pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/master by this push:
     new dae2cfe4ab7 [FLINK-35745][docs] Add documentation for Flink lineage
dae2cfe4ab7 is described below

commit dae2cfe4ab7d1d222233b3b1c340962cc911af2b
Author: Peter Huang <[email protected]>
AuthorDate: Wed Nov 19 04:20:23 2025 -0800

    [FLINK-35745][docs] Add documentation for Flink lineage
---
 .../deployment/advanced/job_status_listener.md     |  82 +++++++++++++++++++++
 docs/content.zh/docs/internals/data_lineage.md     |  56 ++++++++++++++
 .../deployment/advanced/job_status_listener.md     |   4 +-
 docs/content/docs/internals/data_lineage.md        |  59 +++++++++++++++
 docs/static/fig/lineage_interfaces.png             | Bin 0 -> 129973 bytes
 5 files changed, 200 insertions(+), 1 deletion(-)

diff --git a/docs/content.zh/docs/deployment/advanced/job_status_listener.md b/docs/content.zh/docs/deployment/advanced/job_status_listener.md
new file mode 100644
index 00000000000..5015010bee5
--- /dev/null
+++ b/docs/content.zh/docs/deployment/advanced/job_status_listener.md
@@ -0,0 +1,82 @@
+
+---
+title: "Job Status Changed Listener"
+nav-title: job-status-listener
+nav-parent_id: advanced
+nav-pos: 5
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
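+## Job Status Changed Listener
+Flink provides a pluggable interface that lets users register custom logic for handling job status changes; the events carry lineage information about sources and sinks. This enables users to implement their own Flink lineage reporter that sends lineage information to third-party data lineage systems such as DataHub and OpenLineage.
+
+Job status changed listeners are triggered every time the status of the application changes. The data lineage information is included in the JobCreatedEvent.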
+
+### Implement a plugin for the job status changed listener
+
+To implement a custom JobStatusChangedListener plugin, you need to:
+
+- Add your own JobStatusChangedListener by implementing the {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/execution/JobStatusChangedListener.java" name="JobStatusChangedListener" >}} interface.
+
+- Add your own JobStatusChangedListenerFactory by implementing the {{< gh_link file="/flink-core/src/main/java/org/apache/flink/core/execution/JobStatusChangedListenerFactory.java" name="JobStatusChangedListenerFactory" >}} interface.
+
+- Add a Java service entry. Create a file `META-INF/services/org.apache.flink.core.execution.JobStatusChangedListenerFactory` that contains the class name of your job status changed listener factory class (see the [Java Service Loader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/ServiceLoader.html) docs for more details).
+
+
+Then, create a jar that includes your `JobStatusChangedListener`, `JobStatusChangedListenerFactory`, `META-INF/services/` and all external dependencies.
+Make a directory in `plugins/` of your Flink distribution with an arbitrary name, e.g. "job-status-changed-listener", and put the jar into this directory.
+See [Flink Plugin]({{< ref "docs/deployment/filesystems/plugins" >}}) for more details.
+
+JobStatusChangedListenerFactory example:
+
+``` java
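+package org.apache.flink.test.execution;
+
+import org.apache.flink.core.execution.JobStatusChangedListener;
+import org.apache.flink.core.execution.JobStatusChangedListenerFactory;
+
+// A top-level class cannot be declared static, so the modifier is dropped here.
+public class TestingJobStatusChangedListenerFactory
+        implements JobStatusChangedListenerFactory {
+
+    // Called by Flink to create the listener instance.
+    @Override
+    public JobStatusChangedListener createListener(Context context) {
+        return new TestingJobStatusChangedListener();
+    }
+}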
+```
+
+JobStatusChangedListener example:
+
+``` java
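+package org.apache.flink.test.execution;
+
+import org.apache.flink.core.execution.JobStatusChangedEvent;
+import org.apache.flink.core.execution.JobStatusChangedListener;
+
+import java.util.List;
+import java.util.concurrent.CopyOnWriteArrayList;
+
+public class TestingJobStatusChangedListener implements JobStatusChangedListener {
+
+    // Collects every received event so that it can be inspected later.
+    private final List<JobStatusChangedEvent> statusChangedEvents =
+            new CopyOnWriteArrayList<>();
+
+    @Override
+    public void onEvent(JobStatusChangedEvent event) {
+        statusChangedEvents.add(event);
+    }
+}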
+```
+
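+### Configuration
+
+Flink components load JobStatusChangedListener plugins at startup. To make sure that all implementations of JobStatusChangedListener are loaded, all class names should be defined in [execution.job-status-changed-listeners]({{< ref "docs/deployment/config#execution.job-status-changed-listeners" >}}).
+If this configuration is empty, no listeners are started. For example: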
+```
+    execution.job-status-changed-listeners = org.apache.flink.test.execution.TestingJobStatusChangedListenerFactory
+```
+
+{{< top >}}
diff --git a/docs/content.zh/docs/internals/data_lineage.md b/docs/content.zh/docs/internals/data_lineage.md
new file mode 100644
index 00000000000..8725675f287
--- /dev/null
+++ b/docs/content.zh/docs/internals/data_lineage.md
@@ -0,0 +1,56 @@
+---
+title: Data Lineage
+weight: 12
+type: docs
+aliases:
+  - /zh/internals/data_lineage.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
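+# Native Lineage Support
+Data lineage is becoming increasingly important in data ecosystems. As Apache Flink is widely used for data ingestion and ETL in streaming data lakes, we need an end-to-end lineage solution for scenarios including, but not limited to:
+  - `Data Quality Assurance`: Identifying and rectifying data inconsistencies by tracing data errors back to their origin within the data pipeline.
+  - `Data Governance`: Establishing clear data ownership and accountability by documenting data origins and transformations.
+  - `Regulatory Compliance`: Ensuring adherence to data privacy and compliance regulations by tracking data flow and transformations throughout its lifecycle.
+  - `Data Optimization`: Identifying redundant data processing steps and optimizing data flows to improve efficiency.
+
+To meet this community need, Apache Flink provides native lineage support through an internal lineage data model and a [Job Status Listener]({{< ref "docs/deployment/advanced/job_status_listener" >}}) that developers can use to integrate lineage metadata into external systems such as [OpenLineage](https://openlineage.io).
+When a job is created in the Flink runtime, a JobCreatedEvent containing the lineage graph metadata is sent to the job status listeners.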
+
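+# Lineage Data Model
+Flink's native lineage interfaces are defined in two layers. The first layer is the generic interface for all Flink jobs and connectors, and the second layer defines the extended interfaces for Table and DataStream separately. The interface and class relationships are shown in the diagram below.
+
+{{< img src="/fig/lineage_interfaces.png" alt="Lineage Data Model" width="80%">}}
+
+By default, the Table-related lineage interfaces and classes are mainly used in the Flink Table runtime, so Flink users do not need to touch these interfaces. The Flink community will gradually support all
+common connectors, such as Kafka, JDBC, Cassandra and Hive. If you have defined a custom connector, its source/sink needs to implement the LineageVertexProvider interface.
+A LineageVertex defines a list of lineage datasets as the metadata of a Flink source/sink.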
+
+
+```java
+@PublicEvolving
+public interface LineageVertexProvider {
+  LineageVertex getLineageVertex();
+}
+```
+
+For the interface details, please refer to [FLIP-314](https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener).
+
+{{< top >}}
diff --git a/docs/content/docs/deployment/advanced/job_status_listener.md b/docs/content/docs/deployment/advanced/job_status_listener.md
index 723cc862594..f17e6665b78 100644
--- a/docs/content/docs/deployment/advanced/job_status_listener.md
+++ b/docs/content/docs/deployment/advanced/job_status_listener.md
@@ -28,7 +28,7 @@ This enables users to implement their own flink lineage reporter to send lineage
 
 The job status changed listeners are triggered every time status change happened for the application. The data lineage info is included in the JobCreatedEvent.
 
-### Implement a plugin for your custom enricher
+### Implement a plugin for the job status changed listener
 
 To implement a custom JobStatusChangedListener plugin, you need to:
 
@@ -79,3 +79,5 @@ Flink components loads JobStatusChangedListener plugins at startup. To make sure
 ```
     execution.job-status-changed-listeners = org.apache.flink.test.execution.TestingJobStatusChangedListenerFactory
 ```
+
+{{< top >}}
diff --git a/docs/content/docs/internals/data_lineage.md b/docs/content/docs/internals/data_lineage.md
new file mode 100644
index 00000000000..679d435e836
--- /dev/null
+++ b/docs/content/docs/internals/data_lineage.md
@@ -0,0 +1,59 @@
+---
+title: Data Lineage
+weight: 12
+type: docs
+aliases:
+  - /internals/data_lineage.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Native Lineage Support
+As organisations look to govern their data ecosystems, understanding data lineage (where data comes from and where it goes) becomes critical. As Apache Flink is widely used for data ingestion and ETL in streaming data lakes, we need
+an end-to-end lineage solution for scenarios including, but not limited to:
+  - `Data Quality Assurance`: Identifying and rectifying data inconsistencies by tracing data errors back to their origin within the data pipeline.
+  - `Data Governance`: Establishing clear data ownership and accountability by documenting data origins and transformations.
+  - `Regulatory Compliance`: Ensuring adherence to data privacy and compliance regulations by tracking data flow and transformations throughout its lifecycle.
+  - `Data Optimization`: Identifying redundant data processing steps and optimizing data flows to improve efficiency.
+
+Apache Flink provides native lineage support through an internal lineage data model and a [Job Status Listener]({{< ref "docs/deployment/advanced/job_status_listener" >}}) that
+developers can use to integrate lineage metadata into external lineage systems, for example [OpenLineage](https://openlineage.io). When a job is created in the Flink runtime, a JobCreatedEvent
+containing the lineage graph metadata is sent to the job status listeners.
+
+# Lineage Data Model
+Flink's native lineage interfaces are defined in two layers. The first layer is the generic interface for all Flink jobs and connectors, and the second layer defines
+the extended interfaces for Table and DataStream independently. The interface and class relationships are shown in the diagram below.
+
+{{< img src="/fig/lineage_interfaces.png" alt="Lineage Data Model" width="80%">}}
+
+By default, Table-related lineage interfaces and classes are used in the Flink Table environment, so Flink users do not need to touch these interfaces. The Flink community will gradually support all
+of the common connectors, such as Kafka, JDBC, Cassandra and Hive. If you have defined a custom connector, its source/sink implementations need to implement the LineageVertexProvider interface.
+Within a LineageVertex, a list of lineage datasets is defined as metadata for the Flink source/sink.
+
+
+```java
+@PublicEvolving
+public interface LineageVertexProvider {
+  LineageVertex getLineageVertex();
+}
+```
+
+For the interface details, please refer to [FLIP-314](https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener).
+
+{{< top >}}
diff --git a/docs/static/fig/lineage_interfaces.png b/docs/static/fig/lineage_interfaces.png
new file mode 100644
index 00000000000..40718118d80
Binary files /dev/null and b/docs/static/fig/lineage_interfaces.png differ
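
The data_lineage.md text in this commit says that a custom connector needs its source/sink to implement LineageVertexProvider. The following is a minimal, self-contained sketch of that idea and is not part of the commit. Only LineageVertexProvider and getLineageVertex() appear in the diff; the shapes of LineageVertex and LineageDataset used here are simplified stand-ins assumed for illustration, and ExampleTopicSource is a hypothetical name. In a real connector the interfaces come from Flink itself (see FLIP-314 for their actual definitions).

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-ins so that the sketch compiles without Flink on the
// classpath. Their shapes are assumptions made for illustration only.
interface LineageDataset {
    String name();
    String namespace();
}

interface LineageVertex {
    List<LineageDataset> datasets();
}

// Matches the interface shown in the documentation above.
interface LineageVertexProvider {
    LineageVertex getLineageVertex();
}

// Hypothetical custom source that reports the single topic it reads from.
final class ExampleTopicSource implements LineageVertexProvider {
    private final String namespace;
    private final String topic;

    ExampleTopicSource(String namespace, String topic) {
        this.namespace = namespace;
        this.topic = topic;
    }

    @Override
    public LineageVertex getLineageVertex() {
        // Describe the external dataset this source reads.
        LineageDataset dataset = new LineageDataset() {
            @Override
            public String name() {
                return topic;
            }

            @Override
            public String namespace() {
                return namespace;
            }
        };
        // The stand-in LineageVertex has one abstract method, so a lambda works.
        return () -> Collections.singletonList(dataset);
    }
}
```

A listener that receives the lineage graph could then read the namespace and name of each dataset from the vertex and forward them to an external lineage system.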
