Toroidals commented on issue #13114:
URL: https://github.com/apache/hudi/issues/13114#issuecomment-2791291088

   > you can delete the duplicate files though, I'm wondering what is the root 
cause here. Did you upgrade from a legacy table?
   
   Could you please clarify which files are considered duplicates? Are you 
referring to the one mentioned in the first error?
   Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: 
/apps/hive/warehouse/hudi.db/hudi_qwe_rty_cmf_fin_po_headers_cdc/.hoodie/metadata/.hoodie/timeline/history/20250408130311013_20250408132310766_0.parquet
 for client 10.188.29.178 already exists
   Is this the file in question?
   I tried deleting it, but then the job failed to start with a "file not 
found" error.
   
   Also, this is not an upgrade from a legacy table.
   I'm using Hudi 1.0.0. The table was initialized with a BULK_INSERT job, 
and a separate job then performs UPSERT to sync incremental data.
   
   Please note that I did not set the write.client.id.
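
   Since two jobs (the BULK_INSERT bootstrap and the UPSERT sync) write to 
the same table, a minimal sketch of how a distinct client id could be set per 
Flink writer is below. This is an assumption about the fix, not a confirmed 
root cause; `write.client.id` is a standard Hudi Flink option, but the path 
and table definition here are placeholders:

   ```sql
   -- Hypothetical sketch: each concurrent writer gets its own client id.
   -- Table schema and path are illustrative, not the actual table.
   CREATE TABLE hudi_sink (
     id BIGINT,
     name STRING,
     ts TIMESTAMP(3),
     PRIMARY KEY (id) NOT ENFORCED
   ) WITH (
     'connector' = 'hudi',
     'path' = 'hdfs:///path/to/table',          -- placeholder path
     'write.operation' = 'upsert',
     'write.client.id' = 'upsert-job-1'          -- distinct per writer job
   );
   ```

   Whether leaving `write.client.id` unset causes the timeline-file collision 
above is exactly what I would like the maintainers to confirm.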


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
