[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170445#comment-17170445 ]

Udit Mehrotra commented on HUDI-1098:
-------------------------------------

[~vinoth] [~shivnarayan] Actually, it's not that straightforward. To delete a 
path, S3 FileSystem implementations like EmrFS and 
[S3A|https://github.com/apache/hadoop/blob/82f3ffcd64d25cf3a2f5e280e07140994e0ba8cb/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2467]
 first need to check whether the path exists, which is a *GET* operation, and 
obtain its FileStatus to determine whether it is a directory or a file. The 
deletion is then handled differently for each case. For a directory, they have 
to *LIST* all the files under it and send collections of individual file 
objects in DeleteObjects calls.
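For illustration, the delete path described above can be modeled as a toy, in-memory sketch. It is not the S3A or EmrFS source: FakeBucket, getFileStatus and deleteRecursive are hypothetical names, and real implementations handle many more cases. The one real-world detail assumed is S3's limit of 1000 keys per DeleteObjects request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of an S3-backed FileSystem delete(path, recursive = true).
// All names are hypothetical; this is not the S3A or EmrFS implementation.
final class S3DeleteSketch {
  // S3's DeleteObjects API accepts at most 1000 keys per request.
  static final int MAX_KEYS_PER_DELETE = 1000;

  enum Kind { FILE, DIRECTORY, MISSING }

  // Flat key space standing in for an S3 bucket; counts the requests issued.
  static final class FakeBucket {
    final NavigableMap<String, byte[]> objects = new TreeMap<>();
    int headCalls, listCalls, deleteObjectsCalls;

    boolean headObject(String key) {
      headCalls++;
      return objects.containsKey(key);
    }

    List<String> listPrefix(String prefix) {
      listCalls++;
      List<String> keys = new ArrayList<>();
      for (String k : objects.tailMap(prefix, true).keySet()) {
        if (!k.startsWith(prefix)) {
          break; // keys are sorted, so all keys with this prefix are contiguous
        }
        keys.add(k);
      }
      return keys;
    }

    void deleteObjects(List<String> keys) {
      deleteObjectsCalls++;
      for (String k : keys) {
        objects.remove(k);
      }
    }
  }

  // "getFileStatus": a GET/HEAD on the exact key, then a LIST to detect a directory.
  static Kind getFileStatus(FakeBucket b, String path) {
    if (b.headObject(path)) {
      return Kind.FILE;
    }
    return b.listPrefix(path + "/").isEmpty() ? Kind.MISSING : Kind.DIRECTORY;
  }

  // Returns false if nothing exists at path.
  static boolean deleteRecursive(FakeBucket b, String path) {
    switch (getFileStatus(b, path)) {
      case MISSING:
        return false;
      case FILE:
        b.deleteObjects(List.of(path));
        return true;
      default:
        // Directory: LIST every object under it, then delete in batches.
        List<String> children = b.listPrefix(path + "/");
        for (int i = 0; i < children.size(); i += MAX_KEYS_PER_DELETE) {
          int end = Math.min(i + MAX_KEYS_PER_DELETE, children.size());
          b.deleteObjects(children.subList(i, end));
        }
        return true;
    }
  }
}
```

In this model, deleting a marker directory that holds 2500 objects costs one GET, two LISTs and three DeleteObjects calls. If either LIST were served from a stale view, markers would be silently left behind, which is why the existence check and the listing need to be consistent before the delete runs.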

So we do need a short wait before we trigger the *DELETE* operation, to make 
sure that the *marker directory* path exists and that the *LIST* operation, 
which obtains the files under the directory to delete, returns consistent 
results. In other words, we do need *Step 1*, but we do not need *Step 3* as 
long as the delete operation succeeds, i.e. it neither throws an exception nor 
returns false.

 

> Marker file finalizing may block on a data file that was never written
> ----------------------------------------------------------------------
>
>                 Key: HUDI-1098
>                 URL: https://issues.apache.org/jira/browse/HUDI-1098
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.6.0
>
>
> {code:java}
> // Ensure all files in the delete list are actually present. This is mandatory
> // for an eventually consistent FS. Otherwise, we may miss deleting such files.
> // If files are not found even after retries, fail the commit.
> if (consistencyCheckEnabled) {
>   // This ensures that all files to be deleted are present.
>   waitForAllFiles(jsc, groupByPartition, FileVisibility.APPEAR);
> }
> {code}
> We need to handle the case where the marker file was created, but the writer 
> crashed before the data file was created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
