nada-attia opened a new issue, #18008:
URL: https://github.com/apache/hudi/issues/18008

   ### Task Description
   
   **Why this task is needed:**
   Currently, Hudi performs Hive Metastore (HMS) schema sync as a post-commit 
operation. This creates a critical failure scenario: if a writer successfully 
commits data with an evolved schema but the subsequent HMS sync fails, the Hudi 
table schema and the HMS schema diverge. This divergence causes query failures 
for downstream consumers (Spark, Presto) that rely on HMS for schema metadata, 
and it requires manual intervention to reconcile the schemas (i.e., rolling 
back the commits that introduced the schema changes).
   
   **What needs to be done:**
   To prevent this issue, Deltastreamer and Datasource writers should perform 
the HMS schema sync before creating a commit whenever a schema change is 
detected and `hoodie.datasource.hive_sync.enable=true`. If the pre-commit HMS 
sync fails, the write operation should fail without creating a commit, ensuring 
that the Hudi table schema and the HMS schema remain consistent. This approach 
provides fail-fast behavior and eliminates the post-commit schema divergence 
window.
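
The proposed ordering can be sketched as follows. This is a minimal illustration, not Hudi code: `FakeMetastore`, `FakeTable`, `MetastoreSyncError`, and `write` are hypothetical names invented for the example, and the schema is modelled as a plain tuple of column names. It only shows the control flow: sync first when the schema changed, and let a sync failure abort the write before any commit exists.

```python
from dataclasses import dataclass, field


class MetastoreSyncError(Exception):
    """Raised when the (hypothetical) metastore cannot apply a schema."""


@dataclass
class FakeMetastore:
    """Stand-in for HMS; available=False simulates a failing sync."""
    schema: tuple = ()
    available: bool = True

    def sync_schema(self, new_schema):
        if not self.available:
            raise MetastoreSyncError("metastore unreachable")
        self.schema = tuple(new_schema)


@dataclass
class FakeTable:
    """Stand-in for a table: a current schema plus a list of commits."""
    schema: tuple = ()
    commits: list = field(default_factory=list)

    def commit(self, records, schema):
        self.schema = tuple(schema)
        self.commits.append(list(records))


def write(table, metastore, records, schema, hive_sync_enabled=True):
    """Sync the schema to the metastore BEFORE committing, if it changed."""
    schema = tuple(schema)
    schema_changed = schema != table.schema
    if hive_sync_enabled and schema_changed:
        # If this raises, the exception propagates and no commit is created.
        metastore.sync_schema(schema)
    table.commit(records, schema)
```

With a failing metastore, `write` raises `MetastoreSyncError` and `table.commits` stays empty, so both schemas keep their previous value. The sketch does not cover the case where the sync succeeds but the commit itself then fails.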
   
   ### Task Type
   
   Code improvement/refactoring
   
   

