Swapna Marru created FLINK-39608:
------------------------------------
Summary: KafkaDatasetFacet should extend DatasetConfigFacet to
avoid classloader issues with OpenLineage
Key: FLINK-39608
URL: https://issues.apache.org/jira/browse/FLINK-39608
Project: Flink
Issue Type: Bug
Components: Connectors / Kafka
Reporter: Swapna Marru
The OpenLineage event emitter relies on the custom KafkaDatasetFacet class
being available on its own classpath
(https://github.com/OpenLineage/OpenLineage/blob/main/integration/flink/flink2/src/main/java/io/openlineage/flink/util/KafkaDatasetFacetUtil.java#L25).
This fails when the Kafka connector is loaded as part of a user jar while the
OpenLineage library is loaded as part of the Flink distribution.
In this scenario:
- OpenLineage emitters are loaded in ApplicationClassLoader
- Kafka connector libs are loaded in FlinkUserClassLoader
Across this classloader boundary, the emitter cannot load KafkaDatasetFacet
and therefore does not emit Kafka dataset information, even though the
Kafka source is present in the lineage dataset information.
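
The visibility problem can be checked with a small probe. This is only a
sketch: the class ClassVisibilitySketch and its isVisible helper are
hypothetical names, and the fully qualified facet class name used as the
default argument is an assumption that may differ between connector versions.
Run from code loaded by the Flink distribution, it would be expected to report
the facet as not visible when the connector ships only in the user jar.

```java
public class ClassVisibilitySketch {

    /** Returns true if the given loader (or one of its parents) can resolve the class. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it links against, is not reachable from this loader.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader appLoader = ClassLoader.getSystemClassLoader();
        // Assumed fully qualified name of the connector facet; pass another name as args[0].
        String facet = args.length > 0
                ? args[0]
                : "org.apache.flink.connector.kafka.lineage.KafkaDatasetFacet";
        System.out.println(facet + " visible to application classloader: "
                + isVisible(facet, appLoader));
    }
}
```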
Proposed Solution:
Make KafkaDatasetFacet extend
org.apache.flink.streaming.api.lineage.DatasetConfigFacet (which is part of
Flink core and available to both classloaders). The OpenLineage integration can
then:
1. Identify the facet by name
2. Cast to DatasetConfigFacet (a Flink core interface) to extract
configuration
This eliminates the dependency on Kafka connector classes being in the same
classloader as the lineage emitter libs.
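
A minimal sketch of the two steps above, under stated assumptions. To keep it
self-contained, the nested interfaces are local stand-ins that mirror the
shape of Flink's LineageDatasetFacet and DatasetConfigFacet; KafkaFacetStandIn,
extractConfig and the facet name "kafka" are hypothetical and only
illustrate the idea. It is not the actual connector or OpenLineage code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FacetByNameSketch {

    // Stand-in for org.apache.flink.streaming.api.lineage.LineageDatasetFacet.
    public interface LineageDatasetFacet {
        String name();
    }

    // Stand-in for org.apache.flink.streaming.api.lineage.DatasetConfigFacet.
    public interface DatasetConfigFacet extends LineageDatasetFacet {
        Map<String, String> config();
    }

    // Hypothetical connector-side facet, living in the user classloader.
    public static final class KafkaFacetStandIn implements DatasetConfigFacet {
        private final Map<String, String> properties;

        public KafkaFacetStandIn(Map<String, String> properties) {
            this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        }

        @Override
        public String name() {
            return "kafka"; // assumed facet name
        }

        @Override
        public Map<String, String> config() {
            return properties;
        }
    }

    /**
     * Emitter-side lookup: find the facet by name, then cast only to the
     * core interface. No connector class is referenced here.
     */
    public static Optional<Map<String, String>> extractConfig(
            Map<String, LineageDatasetFacet> facets, String facetName) {
        LineageDatasetFacet facet = facets.get(facetName);
        if (facet instanceof DatasetConfigFacet) {
            return Optional.of(((DatasetConfigFacet) facet).config());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("bootstrap.servers", "broker:9092");

        Map<String, LineageDatasetFacet> facets = new HashMap<>();
        facets.put("kafka", new KafkaFacetStandIn(props));

        System.out.println(extractConfig(facets, "kafka"));
    }
}
```

Because extractConfig depends only on the two interfaces, it would keep working
when the concrete facet class is defined by a different classloader, provided
both loaders resolve the interfaces from the same parent.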