Anton created HUDI-1030:
---------------------------

             Summary: Files that Hudi needs to delete during finalize write 
step are not present in S3
                 Key: HUDI-1030
                 URL: https://issues.apache.org/jira/browse/HUDI-1030
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Anton
         Attachments: codeSnppet.txt, stackTrace.txt

Version used: Hudi 0.5.3 + S3 + EMR. When bulk importing a large amount of data
(~400 GB), Hudi fails with the exception:

 _HoodieCommitException: Failed to complete commit 20200619190257 due to 
finalize errors._
_Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check 
failed to ensure all files APPEAR_

The log line right before the exception says: _Removing duplicate data files
created due to spark retries before committing Paths=[list of files]._ When
checking the S3 location I can verify that the files are not there. When checking
the .hoodie/.temp/commitId/partition location, I can verify that files with the
same name but with the ".marker" extension are present. The exception occurs most
of the time we try to import a large amount of data. Attached are the stack trace
of the exception as well as the code snippet that does the bulk import.
[^stackTrace.txt][^codeSnppet.txt]
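Since the stack trace points at the S3 consistency check, one hedged mitigation to try is loosening Hudi's consistency-check settings so the check retries longer before failing. The keys below are Hudi write configs (ConsistencyGuardConfig); the values are illustrative assumptions, not tested recommendations:

```
hoodie.consistency.check.enabled=true
hoodie.consistency.check.initial_interval_ms=2000
hoodie.consistency.check.max_interval_ms=300000
hoodie.consistency.check.max_checks=7
```

This only papers over S3's eventual listing consistency; it does not explain why the data files never appear while their markers do.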
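For context on the symptom above, here is an illustrative sketch (not Hudi's actual code; the function and variable names are my own) of how a marker file under .hoodie/.temp/&lt;instant&gt;/ maps back to the data file that the finalize step expects to find, and in this case delete, under the table base path:

```python
def marker_to_data_path(base_path: str, instant: str, marker_path: str) -> str:
    """Map a marker file under .hoodie/.temp/<instant>/ to the data file
    it stands for under the table base path (illustrative sketch only)."""
    temp_prefix = f"{base_path}/.hoodie/.temp/{instant}/"
    if not marker_path.startswith(temp_prefix):
        raise ValueError(f"marker not under temp dir for instant {instant}")
    # Relative path (partition + file name) is preserved; only the
    # ".marker" suffix and the temp-dir prefix differ.
    rel = marker_path[len(temp_prefix):]
    if not rel.endswith(".marker"):
        raise ValueError(f"not a marker file: {marker_path}")
    return f"{base_path}/{rel[:-len('.marker')]}"

# Example with hypothetical paths matching the report:
print(marker_to_data_path(
    "s3://bucket/table",
    "20200619190257",
    "s3://bucket/table/.hoodie/.temp/20200619190257/2020/06/19/f1_0_20200619190257.parquet.marker",
))
```

In the failure described here, the marker exists but a listing of the derived data path comes back empty, so the consistency check that waits for the file to appear before deletion times out.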



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
