Francisco Morillo created FLINK-39813:
-----------------------------------------
Summary: [Connectors/Kinesis] Add lineage support to Kinesis
Streams connector
Key: FLINK-39813
URL: https://issues.apache.org/jira/browse/FLINK-39813
Project: Flink
Issue Type: Improvement
Components: Connectors / Kinesis
Affects Versions: aws-connector-6.0.0
Reporter: Francisco Morillo
The Kinesis Streams connector (flink-connector-aws-kinesis-streams) does not
implement the LineageVertexProvider
interface introduced in Flink 2.0. This means lineage tools like OpenLineage
cannot automatically extract dataset
information from Flink jobs using KDS sources and sinks.
The Kafka connector already implements this successfully. This change adds
the same capability to the Kinesis Streams
connector.
Implementation:
- KinesisStreamsSource implements LineageVertexProvider, returning
SourceLineageVertex with namespace derived from
stream ARN
- KinesisStreamsSink implements LineageVertexProvider, returning
LineageVertex with the same ARN-based namespace
- New utility package: KinesisLineageUtil, KinesisDatasetFacet,
TypeDatasetFacet
- Unit tests covering source, sink, and utility methods
Namespace format: arn:\{partition}:kinesis:\{region}:\{account}:stream
Dataset name: stream name extracted from the ARN
This covers:
- DataStream API (KinesisStreamsSource/Sink implement the interface directly)
- SQL/Table API (KinesisDynamicSource internally creates
KinesisStreamsSource, planner extracts lineage automatically)
End-to-end tested with OpenLineage flink2 module (v1.47.1) confirming lineage
events are emitted with correct
namespace and stream name when using the DataStream API.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)