Anton created HUDI-1030:
---------------------------
Summary: Files that Hudi needs to delete during finalize write
step are not present in S3
Key: HUDI-1030
URL: https://issues.apache.org/jira/browse/HUDI-1030
Project: Apache Hudi
Issue Type: Bug
Reporter: Anton
Attachments: codeSnppet.txt, stackTrace.txt
Versions used: Hudi 0.5.3 + S3 + EMR. When bulk importing a large amount of data
(400 GB), Hudi fails with the exception:
_HoodieCommitException: Failed to complete commit 20200619190257 due to
finalize errors._
_Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check
failed to ensure all files APPEAR_
The log line right before the exception says: _Removing duplicate data files
created due to spark retries before committing Paths=[list of files]._ When
checking the S3 location, I can verify that the files are not there. When checking
the .hoodie/.temp/commitId/partition location, I can verify that files with the same
name but with the ".marker" extension are present. The exception occurs most of the
time we try to import a large amount of data. Attached are the stack trace of the
exception as well as the code snippet that does the bulk import.
[^stackTrace.txt][^codeSnppet.txt]
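For context, a minimal sketch (not the attached snippet; the table and field names are hypothetical) of the Hudi write options a bulk import like this typically uses, including the consistency-check setting whose verification step is what fails here:

```python
# Hedged sketch of Hudi write options for a bulk import on S3.
# The attached codeSnppet.txt contains the actual job; this only
# illustrates the relevant settings. Names/values are assumptions.
def hudi_bulk_insert_options(table_name, record_key, partition_field):
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # The consistency check is the step that throws here: before
        # committing, Hudi waits for S3 listings to reflect the files
        # it created (or deleted), which can time out under S3's
        # eventual consistency.
        "hoodie.consistency.check.enabled": "true",
    }

# These options would be passed to a Spark DataFrame writer, e.g.:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
opts = hudi_bulk_insert_options("my_table", "id", "dt")
```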
--
This message was sent by Atlassian Jira
(v8.3.4#803005)