shangxinli opened a new issue, #17513:
URL: https://github.com/apache/hudi/issues/17513

   ### Feature Description
   
   **Summary**
   
   This feature introduces table-level lineage metadata in Apache Hudi. Lineage 
records the direct upstream source tables from which a Hudi table is derived 
and stores this information as versioned table metadata.
   
   Today, table lineage is often tracked externally or inferred heuristically, 
leading to inconsistency and loss of historical context. This proposal adds a 
simple, declarative, and deterministic lineage primitive directly to Hudi.
   
   **What is added**
   
   - A new table metadata property recording upstream source tables
   - Lineage represented as a list of `catalog.database.table` identifiers
   - Lineage is versioned implicitly as the table metadata evolves
   
   **Example:**
   
   ```
   hoodie.table.lineage.sources = [
     "hive.rawdata.kafka_events",
     "hive.rawdata.users"
   ]
   ```
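
As a minimal sketch of what the `catalog.database.table` identifier format implies, the helper below checks that each declared source has exactly three non-empty, dot-separated parts. `LineageIdentifiers`, `parse`, and `validateAll` are hypothetical names used only for illustration; they are not existing Hudi APIs, and the proposal does not say whether or how identifiers would be validated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Hudi. */
final class LineageIdentifiers {
    private LineageIdentifiers() {}

    /** Splits "catalog.database.table" into its three parts, rejecting any other shape. */
    static String[] parse(String identifier) {
        if (identifier == null) {
            throw new IllegalArgumentException("identifier is null");
        }
        // A limit of -1 keeps trailing empty parts, so "a.b." is rejected below.
        String[] parts = identifier.trim().split("\\.", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected catalog.database.table, got: " + identifier);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("empty name part in: " + identifier);
            }
        }
        return parts;
    }

    /** Validates every identifier and returns the trimmed values in the same order. */
    static List<String> validateAll(List<String> identifiers) {
        List<String> normalized = new ArrayList<>();
        for (String identifier : identifiers) {
            normalized.add(String.join(".", parse(identifier)));
        }
        return normalized;
    }
}
```

Rejecting malformed identifiers at declaration time is one possible design choice; it would keep the stored list uniform for any later tooling that consumes it.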
   
   **Key design points**
   
   - Table-level only (no partition or column lineage)
   - Direct upstream tables only (one hop, no transitive ancestors)
   - Declared explicitly by writers
   - No inference or query engine dependency
   
   
   ### User Experience
   
   **How users use this feature**
   
   - Opt-in: existing tables and pipelines are unchanged
   - Writers declare lineage when creating or rebuilding a table, or during initial ingestion
   - Normal incremental writes do not modify lineage
   
   **Usage examples**
   
   Declare lineage when creating or rebuilding a table:
   
   ```java
   // Proposed API: declare the direct upstream tables of this table.
   setLineageSources(Arrays.asList(
     "hive.rawdata.kafka_events",
     "hive.rawdata.users"
   ));
   ```
   ```
   
   
   **Read lineage:**
   
   `metaClient.getTableConfig().getLineageSources();`
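
The declare-and-read round trip can be sketched without any Hudi dependency by using `java.util.Properties` as a stand-in for the table config. Everything here is an assumption made for illustration: the class name `LineageConfigSketch`, the comma-separated serialization, and the choice to return an empty list when the property is absent. Only the property key `hoodie.table.lineage.sources` and the method names `setLineageSources` / `getLineageSources` come from the proposal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

/** Illustrative stand-in for the proposed table-config accessors; not Hudi code. */
final class LineageConfigSketch {
    static final String LINEAGE_SOURCES_KEY = "hoodie.table.lineage.sources";

    private LineageConfigSketch() {}

    /** Stores the sources as one comma-separated value (an assumed encoding). */
    static void setLineageSources(Properties tableProps, List<String> sources) {
        tableProps.setProperty(LINEAGE_SOURCES_KEY, String.join(",", sources));
    }

    /** Returns the declared sources, or an empty list for tables without lineage. */
    static List<String> getLineageSources(Properties tableProps) {
        String raw = tableProps.getProperty(LINEAGE_SOURCES_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(raw.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }
}
```

Treating a missing property as "no lineage declared" matches the opt-in behavior described above: a table that never set the property reads back an empty list rather than failing.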
   
   **What users do NOT need to do**
   
   - No schema changes
   - No SQL or query changes
   - No engine upgrades
   - No new runtime dependencies
   
   ### Hudi RFC Requirements
   
   **Non-Goals**
   
   - Column-level lineage
   - Record-level lineage
   - Automatic inference
   - DAG management
   - Query planner changes
   
   **Backward Compatibility**
   
   - Metadata is additive
   - Existing tables unaffected
   - No commit or file-format changes
   
   **Alternatives Considered**
   
   - Commit-level lineage (rejected)
   - Engine-side inference (rejected)
   - External-only lineage systems (rejected)
   
   **Future Work**
   
   - SQL / metadata table exposure
   - Visualization tooling
   - Integration with governance platforms

