Re: [PR] fix(spark): Ignore duplicate fields when merging schema in IncrementalRelation [hudi]

via GitHub Thu, 05 Feb 2026 11:30:39 -0800


prashantwason commented on PR #17776:
URL: https://github.com/apache/hudi/pull/17776#issuecomment-3855779652


   Confirmed - the test fails without the fix with the following error:
   
   ```
   org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: `_hoodie_partition_path`
   ```
   
   The fix in IncrementalRelationV1.scala and IncrementalRelationV2.scala 
filters out fields from the skeleton schema that already exist in the data 
schema before merging, preventing this duplicate field error.
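   
   For readers following along, the filtering idea can be sketched roughly as 
   below. This is an illustration only, not the code from the PR: the object 
   and method names are hypothetical, the exact-name comparison and the 
   skeleton-first field order are assumptions, and it needs spark-sql on the 
   classpath.
   
   ```scala
   import org.apache.spark.sql.types.{StringType, StructField, StructType}

   object SchemaMergeSketch {
     // Hypothetical helper (not from the PR): drop skeleton fields whose names
     // already appear in the data schema, then concatenate the two schemas.
     // Assumption: names are compared exactly (case-sensitive).
     def mergeSkippingDuplicates(skeleton: StructType, data: StructType): StructType = {
       val dataFieldNames = data.fieldNames.toSet
       val uniqueSkeletonFields =
         skeleton.fields.filterNot(f => dataFieldNames.contains(f.name))
       // Assumption: skeleton fields come first, followed by the data fields.
       StructType(uniqueSkeletonFields ++ data.fields)
     }

     def main(args: Array[String]): Unit = {
       val skeleton = StructType(Seq(
         StructField("_hoodie_commit_time", StringType),
         StructField("_hoodie_partition_path", StringType)))
       val data = StructType(Seq(
         StructField("_hoodie_partition_path", StringType),
         StructField("value", StringType)))

       // `_hoodie_partition_path` is kept once, taken from the data schema.
       println(mergeSkippingDuplicates(skeleton, data).fieldNames.mkString(", "))
     }
   }
   ```
   
   A plain concatenation of the two schemas would carry 
   `_hoodie_partition_path` twice, which is the shape that triggers the 
   `AnalysisException` shown above.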
   
   Also added an upsert operation to the test to ensure log files are created 
in the MOR table before the incremental query.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
