[
https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169380#comment-17169380
]
Vinoth Chandar commented on HUDI-1098:
--------------------------------------
So, I think this is a lot simpler. In this specific scenario, we have a marker
file whose data file we are simply trying to delete. We can just proceed to
delete the data file and ignore the case where fs.delete() returns false
because the file is not actually present.
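A minimal sketch of that idea, using java.nio.file as a local stand-in for the
Hadoop FileSystem call (the class and method names here are hypothetical, not
Hudi code). Files.deleteIfExists() returns false when the file is absent, which
plays the role of fs.delete() returning false:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MarkerCleanup {

    /**
     * Deletes each candidate data file and treats "file not present" as a
     * non-error. Returns the number of files that were actually removed.
     * Hypothetical helper for illustration only.
     */
    public static int deleteDataFilesQuietly(List<Path> dataFiles) throws IOException {
        int deleted = 0;
        for (Path dataFile : dataFiles) {
            // deleteIfExists returns false when the file does not exist;
            // we ignore that result instead of waiting for the file to appear.
            if (Files.deleteIfExists(dataFile)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hudi-marker-sketch");
        // One data file was written; the other only ever had a marker.
        Path written = Files.createFile(dir.resolve("written.parquet"));
        Path neverWritten = dir.resolve("never-written.parquet");

        int deleted = deleteDataFilesQuietly(List.of(written, neverWritten));
        System.out.println("deleted " + deleted + " of 2 candidate files");
        Files.delete(dir);
    }
}
```

The point of the sketch is only that a missing data file does not block or fail
the cleanup; real I/O errors still surface as exceptions.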
I think S3 guarantees that if the data file was indeed created (PUT), then a
subsequent DELETE will succeed. The eventual consistency caveat is around
GET (listing):
[https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html]
??Amazon S3 provides read-after-write consistency for PUTS of new objects in
your S3 bucket in all Regions with one caveat. The caveat is that if you make a
HEAD or GET request to a key name before the object is created, then create the
object shortly after that, a subsequent GET might not return the object due to
eventual consistency.??
> Marker file finalizing may block on a data file that was never written
> ----------------------------------------------------------------------
>
> Key: HUDI-1098
> URL: https://issues.apache.org/jira/browse/HUDI-1098
> Project: Apache Hudi
> Issue Type: Bug
> Components: Writer Core
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Blocker
> Fix For: 0.6.0
>
>
> {code:java}
> // Ensure all files in the delete list are actually present. This is mandatory
> // for an eventually consistent FS. Otherwise, we may miss deleting such files.
> // If files are not found even after retries, fail the commit.
> if (consistencyCheckEnabled) {
>   // This will ensure all files to be deleted are present.
>   waitForAllFiles(jsc, groupByPartition, FileVisibility.APPEAR);
> }
> {code}
> We need to handle the case where the marker file was created, but we crashed
> before the data file was created.