[DISCUSS][PR] Introduce Metadata SPI for metadata-driven datasource and schema management

xuepeng wang Thu, 26 Mar 2026 18:45:18 -0700

Hi SeaTunnel community,

I would like to request reviews and feedback for the following PR:


https://github.com/apache/seatunnel/pull/10657

This PR introduces an initial version of a Metadata SPI, aiming to provide
a metadata-driven mechanism for datasource configuration and schema
management. The goal is to introduce a reusable abstraction layer for
metadata access, enabling flexible integration with external metadata
systems.

This change focuses on introducing the SPI abstraction and does not modify
existing connector behavior.

The key motivations for this change, and the value it brings, are:

1. Metadata-driven datasource configuration

Instead of hardcoding datasource connection information directly in job
configurations, this PR allows datasource configurations to be dynamically
provided by external metadata systems.

This design provides several practical benefits:

- Protect sensitive connection information (e.g., username/password) by
moving it out of job configs
- Support fully customizable storage backends for datasource configurations
- Allow integration with various systems, such as:
  - Nacos
  - Redis
  - Relational databases (e.g., MySQL, PostgreSQL)
  - Metadata platforms like OpenMetadata or DataHub
  - Any custom configuration service

This makes datasource management more secure, centralized, and flexible.
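
To make the idea concrete, here is a minimal sketch of how a job could
reference a datasource by id and have its connection properties resolved
at runtime. All names below (DatasourceProvider, InMemoryDatasourceProvider,
resolve, the "orders_mysql" id, and the example URL) are hypothetical and
are not taken from the PR; the in-memory map only stands in for an external
store such as Nacos, Redis, or a relational database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: these types are assumptions, not the PR's actual API.
final class DatasourceMetadataSketch {

    /** Hypothetical SPI: maps a datasource id to its connection properties. */
    interface DatasourceProvider {
        Optional<Map<String, String>> lookup(String datasourceId);
    }

    /** In-memory stand-in for an external metadata or configuration store. */
    static final class InMemoryDatasourceProvider implements DatasourceProvider {
        private final Map<String, Map<String, String>> store = new HashMap<>();

        void register(String datasourceId, Map<String, String> properties) {
            // Defensive copy so later changes by the caller do not leak in.
            store.put(datasourceId, Collections.unmodifiableMap(new HashMap<>(properties)));
        }

        @Override
        public Optional<Map<String, String>> lookup(String datasourceId) {
            return Optional.ofNullable(store.get(datasourceId));
        }
    }

    /** Merges job options with resolved properties; resolved values take precedence. */
    static Map<String, String> resolve(
            DatasourceProvider provider, String datasourceId, Map<String, String> jobOptions) {
        Map<String, String> resolved = provider.lookup(datasourceId)
                .orElseThrow(() -> new IllegalArgumentException(
                        "Unknown datasource id: " + datasourceId));
        Map<String, String> merged = new HashMap<>(jobOptions);
        merged.putAll(resolved);
        return merged;
    }

    public static void main(String[] args) {
        InMemoryDatasourceProvider provider = new InMemoryDatasourceProvider();
        Map<String, String> connection = new HashMap<>();
        connection.put("url", "jdbc:mysql://db.example.internal:3306/orders");
        connection.put("username", "etl_user");
        provider.register("orders_mysql", connection);

        // The job config carries only the id and non-sensitive options.
        Map<String, String> jobOptions = new HashMap<>();
        jobOptions.put("query", "SELECT * FROM orders");

        Map<String, String> effective = resolve(provider, "orders_mysql", jobOptions);
        System.out.println(effective.keySet());
    }
}
```

In this sketch the job configuration carries only the datasource id, so
credentials never appear in the job file; which side should win on a key
conflict is a design choice left open here.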

2. Unified schema source for non-relational connectors

For many non-relational source connectors (e.g., message queues, files,
NoSQL systems), users currently need to manually define field mappings or
schemas in job configurations.

With this Metadata SPI:

- Table schemas can be centrally managed and provided by metadata systems
- Connectors can retrieve schema definitions dynamically
- Users no longer need to manually assemble field mappings in every job

This reduces configuration duplication and improves maintainability,
especially in large-scale data integration scenarios.
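
As a rough illustration of centrally managed schemas, the sketch below
shows a connector asking a provider for the columns of a table instead of
reading a hand-written field mapping from the job config. The names
(SchemaProvider, Column, InMemorySchemaProvider, describe, and the
"kafka.orders" table id) are hypothetical and not taken from the PR, and
the type names are plain strings rather than SeaTunnel data types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: these types are assumptions, not the PR's actual API.
final class SchemaMetadataSketch {

    /** A single column: a name plus a type name kept as a plain string. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Hypothetical SPI: returns the ordered columns of a table id. */
    interface SchemaProvider {
        List<Column> columnsOf(String tableId);
    }

    /** In-memory stand-in for a metadata platform that stores table schemas. */
    static final class InMemorySchemaProvider implements SchemaProvider {
        private final Map<String, List<Column>> schemas = new HashMap<>();

        void register(String tableId, List<Column> columns) {
            schemas.put(tableId, Collections.unmodifiableList(new ArrayList<>(columns)));
        }

        @Override
        public List<Column> columnsOf(String tableId) {
            List<Column> columns = schemas.get(tableId);
            if (columns == null) {
                throw new IllegalArgumentException("No schema registered for: " + tableId);
            }
            return columns;
        }
    }

    /** Renders columns as "name:type" pairs, in declaration order. */
    static String describe(List<Column> columns) {
        return columns.stream()
                .map(c -> c.name + ":" + c.type)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        InMemorySchemaProvider provider = new InMemorySchemaProvider();
        List<Column> columns = new ArrayList<>();
        columns.add(new Column("order_id", "bigint"));
        columns.add(new Column("payload", "string"));
        provider.register("kafka.orders", columns);

        // A connector would look the schema up by table id at job start.
        System.out.println(describe(provider.columnsOf("kafka.orders")));
    }
}
```

Every job that reads the same topic or file set could then share one
registered schema instead of repeating the field list.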

3. Foundation for future lineage and governance capabilities

Although this PR only introduces the abstraction layer (Metadata SPI),
that abstraction also makes it possible to support additional capabilities
in the future, such as:

- Data lineage integration
- Metadata synchronization
- Schema evolution management
- Metadata-driven pipeline orchestration

In other words, this change is intended as a foundational step toward
deeper integration between SeaTunnel and metadata / governance ecosystems.

Feedback is highly appreciated, especially on the following aspects:

- API design of the Metadata SPI
- Extensibility and integration patterns
- Potential improvements or concerns
- Alignment with SeaTunnel architecture direction

Thank you very much for your time and review.

Best regards,
chl-wxp
