[
https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170445#comment-17170445
]
Udit Mehrotra commented on HUDI-1098:
-------------------------------------
[~vinoth] [~shivnarayan] Actually, it's not that straightforward. To delete a
path, S3 FileSystem implementations like EmrFS and
[S3A|https://github.com/apache/hadoop/blob/82f3ffcd64d25cf3a2f5e280e07140994e0ba8cb/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2467]
first need to check whether the path exists, which is a *GET* operation, and
fetch its FileStatus to determine whether it's a directory or a file. Deletion
is then handled differently for each case. For a directory, the implementation
has to *LIST* all the files under it and send batches of individual file
objects in DeleteObjects calls.
So we do need a short wait before we trigger a *DELETE* operation, to make
sure that the *marker directory* path is visible and that the *LIST*
operation, which obtains the files under the directory to delete, returns
consistent results. In short, we do need *Step 1*, but we do not need
*Step 3* if the delete operation succeeds, i.e. it neither throws an
exception nor returns false.
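To make the proposed flow concrete, here is a minimal sketch in plain Java. It is not Hudi or Hadoop code: the {{SimpleFs}} interface, the {{LaggingFs}} in-memory stand-in, and all method names are hypothetical, and it assumes *Step 1* means the wait for the marker directory to become visible and *Step 3* means a wait after the delete. It polls until the path is visible, then deletes, and reports success so the caller can decide whether any post-delete wait is still needed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of "wait for visibility, then delete"; not Hudi's actual implementation. */
final class MarkerDirCleanup {

    /** Minimal stand-in for the file system calls discussed above (an assumption, not Hadoop's API). */
    interface SimpleFs {
        boolean exists(String path);            // existence check on the path
        boolean deleteRecursively(String path); // returns false if nothing was deleted
    }

    /** Pre-delete wait: poll until the path is visible, giving up after maxChecks attempts. */
    static boolean waitUntilVisible(SimpleFs fs, String path, int maxChecks, long sleepMs) {
        for (int i = 0; i < maxChecks; i++) {
            if (fs.exists(path)) {
                return true;
            }
            if (i + 1 < maxChecks) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }

    /**
     * Deletes the marker directory once it is visible.
     * A true result means the delete succeeded, so a post-delete wait can be skipped.
     */
    static boolean deleteMarkerDir(SimpleFs fs, String markerDir, int maxChecks, long sleepMs) {
        if (!waitUntilVisible(fs, markerDir, maxChecks, sleepMs)) {
            return false; // path never became visible within the retry budget
        }
        return fs.deleteRecursively(markerDir);
    }

    /** In-memory file system whose contents stay invisible for the first few exists() calls. */
    static final class LaggingFs implements SimpleFs {
        private final Set<String> paths = new HashSet<>();
        private int checksUntilVisible;

        LaggingFs(int checksUntilVisible) {
            this.checksUntilVisible = checksUntilVisible;
        }

        void create(String path) {
            paths.add(path);
        }

        @Override
        public boolean exists(String path) {
            if (checksUntilVisible > 0) {
                checksUntilVisible--; // simulate eventual consistency
                return false;
            }
            return paths.contains(path);
        }

        @Override
        public boolean deleteRecursively(String path) {
            if (checksUntilVisible > 0) {
                return false; // path not yet visible, so nothing is deleted
            }
            return paths.remove(path);
        }
    }
}
```

With a real FileSystem the delete could also throw, which the caller would treat the same way as a false result here.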
> Marker file finalizing may block on a data file that was never written
> ----------------------------------------------------------------------
>
> Key: HUDI-1098
> URL: https://issues.apache.org/jira/browse/HUDI-1098
> Project: Apache Hudi
> Issue Type: Bug
> Components: Writer Core
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Blocker
> Fix For: 0.6.0
>
>
> {code:java}
> // Ensure all files in delete list is actually present. This is mandatory for an eventually consistent FS.
> // Otherwise, we may miss deleting such files. If files are not found even after retries, fail the commit
> if (consistencyCheckEnabled) {
> // This will either ensure all files to be deleted are present.
> waitForAllFiles(jsc, groupByPartition, FileVisibility.APPEAR);
> }
> {code}
> We need to handle the case where the marker file was created, but we crashed
> before the data file was created.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)