[ 
https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169431#comment-17169431
 ] 

sivabalan narayanan commented on HUDI-1098:
-------------------------------------------

IIUC, deletes are always consistent in S3; it is only listing (HEAD and GET) 
that is eventually consistent. If that's the case, then we don't need the 
consistencyCheck in either of the two cases (before or after the deletes). 

Step 1: list the files to ensure that every file to be deleted is present. 

Step 2: perform the deletes.

Step 3: after deletion, wait until none of the deleted files are visible (a 
listing should no longer show them).

Both validations can be removed if deletes are consistent. But I do see that we 
have retries during the validation checks, so I'm not sure whether they were 
added after encountering some issues w/ S3 or were more of a proactive measure. 

In other words, if deletes after puts are consistent, we can remove Step 1 and 
Step 3 above. 
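
The three-step flow above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Hudi code: `SimpleFs`, `InMemoryFs`, `deleteFiles`, the retry count, and the sleep interval are hypothetical, and the signature of `waitForAllFiles` plus the `DISAPPEAR` value are assumptions modelled on the snippet quoted in the issue. Turning `consistencyCheckEnabled` off corresponds to dropping Step 1 and Step 3.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ConsistentDeleteSketch {

  /** Assumed to mirror the enum referenced in the quoted snippet. */
  enum FileVisibility { APPEAR, DISAPPEAR }

  /** Hypothetical minimal file-system view; not the Hadoop or Hudi API. */
  interface SimpleFs {
    boolean exists(String path);
    void delete(String path);
  }

  /** In-memory stand-in used only to exercise the sketch. */
  static final class InMemoryFs implements SimpleFs {
    final Set<String> files = new HashSet<>();
    @Override public boolean exists(String path) { return files.contains(path); }
    @Override public void delete(String path) { files.remove(path); }
  }

  /** Polls until every path has the wanted visibility or the retries run out. */
  static boolean waitForAllFiles(SimpleFs fs, List<String> paths, FileVisibility target,
                                 int maxRetries, long sleepMs) {
    boolean wantPresent = (target == FileVisibility.APPEAR);
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      boolean allMatch = true;
      for (String path : paths) {
        if (fs.exists(path) != wantPresent) {
          allMatch = false;
          break;
        }
      }
      if (allMatch) {
        return true;
      }
      if (attempt < maxRetries) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }

  static void deleteFiles(SimpleFs fs, List<String> paths, boolean consistencyCheckEnabled) {
    // Step 1: make sure every file to be deleted is visible, else fail.
    if (consistencyCheckEnabled
        && !waitForAllFiles(fs, paths, FileVisibility.APPEAR, 3, 10L)) {
      throw new IllegalStateException("Files to delete never became visible: " + paths);
    }
    // Step 2: perform the deletes.
    for (String path : paths) {
      fs.delete(path);
    }
    // Step 3: wait until none of the deleted files are visible any more.
    if (consistencyCheckEnabled
        && !waitForAllFiles(fs, paths, FileVisibility.DISAPPEAR, 3, 10L)) {
      throw new IllegalStateException("Deleted files are still visible: " + paths);
    }
  }
}
```

In this sketch, a path that was never written fails Step 1 once the retries are exhausted when the check is enabled, which is the situation the issue title describes. With the check disabled, the same path simply passes through Step 2 as a no-op delete.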

[~uditme]: Let us know your thoughts. 

 

 

> Marker file finalizing may block on a data file that was never written
> ----------------------------------------------------------------------
>
>                 Key: HUDI-1098
>                 URL: https://issues.apache.org/jira/browse/HUDI-1098
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Vinoth Chandar
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.6.0
>
>
> {code:java}
> // Ensure all files in delete list is actually present. This is mandatory for
> // an eventually consistent FS. Otherwise, we may miss deleting such files.
> // If files are not found even after retries, fail the commit
> if (consistencyCheckEnabled) {
>   // This will either ensure all files to be deleted are present.
>   waitForAllFiles(jsc, groupByPartition, FileVisibility.APPEAR);
> }
> {code}
> We need to handle the case where marker file was created, but we crashed 
> before the data file was created. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
