Hi SeaTunnel community,

I would like to request reviews and feedback for the following PR:
https://github.com/apache/seatunnel/pull/10657

This PR introduces an initial version of a Metadata SPI, aiming to provide a metadata-driven mechanism for datasource configuration and schema management. The goal is to introduce a reusable abstraction layer for metadata access, enabling flexible integration with external metadata systems. This change focuses on introducing the SPI abstraction and does not modify existing connector behavior.

Below are the key motivations and values of this change:

1. Metadata-driven datasource configuration

Instead of hardcoding datasource connection information directly in job configurations, this PR allows datasource configurations to be provided dynamically by external metadata systems. This design provides several practical benefits:

- Protect sensitive connection information (e.g., username/password) by keeping it out of job configs
- Support fully customizable storage backends for datasource configurations
- Allow integration with various systems, such as:
  - Nacos
  - Redis
  - Relational databases (e.g., MySQL, PostgreSQL)
  - Metadata platforms like OpenMetadata or DataHub
  - Any custom configuration service

This makes datasource management more secure, centralized, and flexible.

2. Unified schema source for non-relational connectors

For many non-relational source connectors (e.g., message queues, files, NoSQL systems), users currently need to manually define field mappings or schemas in job configurations. With this Metadata SPI:

- Table schemas can be centrally managed and provided by metadata systems
- Connectors can retrieve schema definitions dynamically
- Users no longer need to manually assemble field mappings in every job

This reduces configuration duplication and improves maintainability, especially in large-scale data integration scenarios.

3. Foundation for future lineage and governance capabilities

This PR focuses on introducing the abstraction layer (Metadata SPI).
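
For readers who have not opened the PR yet, here is a rough sketch of how an abstraction layer of this kind could look. It is only an illustration of the two ideas above (datasource configuration lookup and schema lookup). Every name in it (MetadataProvider, InMemoryMetadataProvider, getDatasourceConfig, getTableSchema, the "orders-db" identifiers) is hypothetical and is not taken from the PR, so please refer to the PR itself for the actual interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical SPI shape; these names are illustrative and NOT taken from the PR.
interface MetadataProvider {
    // Connection options (url, username, ...) for a logical datasource id.
    Optional<Map<String, String>> getDatasourceConfig(String datasourceId);

    // Ordered "column name -> type name" mapping for one table of that datasource.
    Optional<Map<String, String>> getTableSchema(String datasourceId, String tableName);
}

// In-memory stand-in for an external backend such as Nacos, Redis, or a database.
final class InMemoryMetadataProvider implements MetadataProvider {
    private final Map<String, Map<String, String>> configs = new LinkedHashMap<>();
    private final Map<String, Map<String, String>> schemas = new LinkedHashMap<>();

    void registerDatasource(String datasourceId, Map<String, String> config) {
        configs.put(datasourceId, new LinkedHashMap<>(config));
    }

    void registerSchema(String datasourceId, String tableName, Map<String, String> columns) {
        schemas.put(datasourceId + "/" + tableName, new LinkedHashMap<>(columns));
    }

    @Override
    public Optional<Map<String, String>> getDatasourceConfig(String datasourceId) {
        return Optional.ofNullable(configs.get(datasourceId));
    }

    @Override
    public Optional<Map<String, String>> getTableSchema(String datasourceId, String tableName) {
        return Optional.ofNullable(schemas.get(datasourceId + "/" + tableName));
    }
}

final class MetadataSpiSketch {
    public static void main(String[] args) {
        InMemoryMetadataProvider provider = new InMemoryMetadataProvider();

        // The job would reference only "orders-db"; credentials stay in the backend.
        provider.registerDatasource("orders-db",
                Map.of("url", "jdbc:mysql://db.example.com:3306/orders"));

        // Schema for a non-relational source, managed centrally instead of per job.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("order_id", "bigint");
        columns.put("amount", "decimal(10,2)");
        provider.registerSchema("orders-kafka", "orders", columns);

        System.out.println(provider.getDatasourceConfig("orders-db")
                .orElseThrow(() -> new IllegalStateException("unknown datasource")));
        System.out.println(provider.getTableSchema("orders-kafka", "orders")
                .orElseThrow(() -> new IllegalStateException("unknown table")));
    }
}
```

In this sketch a missing entry comes back as an empty Optional, so the caller decides whether to fail or to fall back to options written in the job config. That choice is an assumption of the sketch, not something the PR states.
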
This abstraction also makes it possible to support additional capabilities in the future, such as:

- Data lineage integration
- Metadata synchronization
- Schema evolution management
- Metadata-driven pipeline orchestration

In other words, this change is intended as a foundational step toward deeper integration between SeaTunnel and metadata / governance ecosystems.

Feedback is highly appreciated, especially on the following aspects:

- API design of the Metadata SPI
- Extensibility and integration patterns
- Potential improvements or concerns
- Alignment with SeaTunnel architecture direction

Thank you very much for your time and review.

Best regards,
chl-wxp
