[
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325952#comment-17325952
]
Vinoth Chandar commented on HUDI-1138:
--------------------------------------
Yes. The basic idea here is to:
0) Maintain the marker file list in a single file called `markers` under
.hoodie/temp/<instant_time>/ (or whatever path we write this to today).
1) Add a new endpoint to the timeline server, `/createMarkerFile`, which
returns 200 only if it successfully reads the `markers` file, adds an entry
to it, and overwrites `markers` on the underlying cloud storage.
2) Employ some batching here, so that all requests that arrive in a
100-500ms window are handled in a single overwrite operation.
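To make the steps above concrete, here is a rough sketch of the batching
logic in Python. Everything in it is an assumption made for illustration, not
Hudi code: the `MarkerBatcher` class, its `create_marker` and `flush` methods,
and the `overwrite_count` counter are hypothetical names, a local file stands
in for cloud storage, and a write-then-rename stands in for the object
overwrite. A real endpoint would return the HTTP status once the future
resolves.

```python
import os
import tempfile
import threading
from concurrent.futures import Future


class MarkerBatcher:
    """Hypothetical sketch: fold many marker requests into one overwrite."""

    def __init__(self, markers_path):
        self.markers_path = markers_path
        self._pending = []                    # (marker_name, Future) pairs
        self._pending_lock = threading.Lock()
        self._flush_lock = threading.Lock()   # one read-modify-overwrite at a time
        self.overwrite_count = 0              # overwrites performed so far

    def create_marker(self, marker_name):
        """Queue a request; the future resolves to 200 after a durable flush."""
        fut = Future()
        with self._pending_lock:
            self._pending.append((marker_name, fut))
        return fut

    def flush(self):
        """Apply every queued request with a single overwrite of `markers`."""
        with self._flush_lock:
            with self._pending_lock:
                batch, self._pending = self._pending, []
            if not batch:
                return
            try:
                entries = self._read()
                for name, _ in batch:
                    if name not in entries:
                        entries.append(name)
                self._overwrite(entries)
            except Exception as exc:
                # Nothing in the batch is acknowledged if the overwrite failed.
                for _, fut in batch:
                    fut.set_exception(exc)
                return
            for _, fut in batch:
                fut.set_result(200)

    def _read(self):
        if not os.path.exists(self.markers_path):
            return []
        with open(self.markers_path) as f:
            return [line.strip() for line in f if line.strip()]

    def _overwrite(self, entries):
        # Local stand-in for overwriting the object on cloud storage.
        tmp_path = self.markers_path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write("\n".join(entries) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.markers_path)
        self.overwrite_count += 1


if __name__ == "__main__":
    batcher = MarkerBatcher(os.path.join(tempfile.mkdtemp(), "markers"))
    futures = [batcher.create_marker(f"file-{i}.marker") for i in range(3)]
    batcher.flush()  # a timer thread would call this every 100-500ms
    print([f.result() for f in futures], batcher.overwrite_count)
```

In this sketch a caller only sees 200 after the overwrite that contains its
entry has completed, which is the property step 1) asks for; the batching
window is simply how often `flush()` is invoked.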
I think this will work really well (based on similar things I have done
before). wdyt?
Before this, we should also study how effective the current parallelization is.
So hacking up a PoC to see the perf gains would be an interesting first step.
> Re-implement marker files via timeline server
> ---------------------------------------------
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Writer Core
> Affects Versions: 0.9.0
> Reporter: Vinoth Chandar
> Priority: Blocker
> Fix For: 0.9.0
>
>
> Even though you can argue that RFC-15/consolidated metadata removes the need
> for deleting partial files written due to Spark task failures/stage retries,
> it will still leave extra files inside the table (and users will pay for them
> every month), so we need the marker mechanism to be able to delete these
> partial files.
> Here we explore whether we can improve the current marker file mechanism,
> which creates one marker file per data file written, by delegating the
> createMarker() call to the driver/timeline server and having it write marker
> metadata into a single file handle that is flushed for durability guarantees.
>
> P.S.: I was tempted to think the Spark listener mechanism can help us deal
> with failed tasks, but it has no guarantees. The writer job could die without
> deleting a partial file, i.e. it can improve things but can't provide
> guarantees.