[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169034#comment-17169034 ]
Udit Mehrotra commented on HUDI-1098:
-------------------------------------
[~shivnarayan] [~vinoth] thanks for prioritizing this issue.
Currently that is the behavior: the job fails if not all data files show up
(during write finalization), even when a data file was never created in the
first place because of S3 throttling or internal errors. S3 is supposed to
become consistent within a few hundred milliseconds, so I think we can use a
reasonable configurable timeout. If the file still does not show up after the
timeout, we can assume it was never created and at least not fail the job.
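A minimal sketch of the configurable-timeout idea, in Java to match the snippet in the issue. It assumes the visibility check can be passed in as a plain `Predicate<String>` (in Hudi this would wrap a file-system existence check); `TimedVisibilityCheck`, `waitForFiles`, `timeoutMs`, and `pollMs` are hypothetical names, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not a Hudi API): waits up to a configurable timeout for
 * expected data files to become visible and reports the ones that never did,
 * instead of failing the job.
 */
final class TimedVisibilityCheck {

  /** Files that became visible before the deadline, and files that did not. */
  static final class Result {
    final List<String> visible = new ArrayList<>();
    final List<String> missing = new ArrayList<>();
  }

  private TimedVisibilityCheck() {}

  /**
   * @param paths     data file paths expected from marker files
   * @param exists    visibility check; assumed to wrap a file-system "exists" call
   * @param timeoutMs configurable upper bound on the total wait
   * @param pollMs    delay between successive checks
   */
  static Result waitForFiles(Collection<String> paths, Predicate<String> exists,
                             long timeoutMs, long pollMs) {
    Result result = new Result();
    List<String> pending = new ArrayList<>(paths);
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      // Move every file that is now visible out of the pending list.
      for (Iterator<String> it = pending.iterator(); it.hasNext(); ) {
        String path = it.next();
        if (exists.test(path)) {
          result.visible.add(path);
          it.remove();
        }
      }
      // Stop once everything is visible or the (overflow-safe) deadline has passed.
      if (pending.isEmpty() || System.nanoTime() - deadline >= 0) {
        break;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        break;
      }
    }
    // Whatever is still pending after the timeout is assumed never to have been written.
    result.missing.addAll(pending);
    return result;
  }
}
```

With a timeout comfortably above the expected consistency lag, files that merely appear late still end up in `visible`, while files whose writers failed end up in `missing`, where they could be logged and skipped rather than failing the commit.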
> Marker file finalizing may block on a data file that was never written
> ----------------------------------------------------------------------
>
> Key: HUDI-1098
> URL: https://issues.apache.org/jira/browse/HUDI-1098
> Project: Apache Hudi
> Issue Type: Bug
> Components: Writer Core
> Reporter: Vinoth Chandar
> Assignee: sivabalan narayanan
> Priority: Blocker
> Fix For: 0.6.0
>
>
> {code:java}
> // Ensure all files in the delete list are actually present. This is mandatory for an eventually consistent FS.
> // Otherwise, we may miss deleting such files. If files are not found even after retries, fail the commit.
> if (consistencyCheckEnabled) {
>   // This will either ensure all files to be deleted are present, or fail the commit.
>   waitForAllFiles(jsc, groupByPartition, FileVisibility.APPEAR);
> }
> {code}
> We need to handle the case where the marker file was created, but the writer
> crashed before the data file was created.
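A minimal sketch of how a marker reconciliation step could tolerate the crash case described in the issue, where a marker exists but its data file was never written. `MarkerReconciler` and `filesToDelete` are hypothetical names, and the visibility predicate is assumed to already include any bounded wait for an eventually consistent store; this is not Hudi's actual reconciliation code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/**
 * Hypothetical reconciliation step (not Hudi's actual code): decides which
 * data files left behind by markers should be deleted at commit time.
 */
final class MarkerReconciler {

  private MarkerReconciler() {}

  /**
   * @param markerDataPaths data file paths recorded by marker files
   * @param committedPaths  data file paths reported in the commit's write stats
   * @param visible         visibility check, assumed to already include any
   *                        bounded wait for an eventually consistent store
   * @return paths to delete; never-written files are skipped, not treated as fatal
   */
  static List<String> filesToDelete(Set<String> markerDataPaths, Set<String> committedPaths,
                                    Predicate<String> visible) {
    List<String> toDelete = new ArrayList<>();
    // Sorted iteration only keeps the output deterministic.
    for (String path : new TreeSet<>(markerDataPaths)) {
      if (committedPaths.contains(path)) {
        continue; // part of the commit: keep it
      }
      if (visible.test(path)) {
        toDelete.add(path); // leftover file from a failed or retried write
      }
      // Otherwise the marker was written but the writer crashed before the data
      // file existed: there is nothing to delete, so skip it instead of failing.
    }
    return toDelete;
  }
}
```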
--
This message was sent by Atlassian Jira
(v8.3.4#803005)