[
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327052#comment-17327052
]
liwei commented on HUDI-1138:
-----------------------------
[~vinoth] thanks
1. I have an idea: can we write the marker file info to the metadata table via
the timeline server? That way we can unify the meta info into the metadata
table.
2. Rollback is not a frequent action today, so we need to PoC the performance
first.
3. I have also been researching RFC-27 recently. I think we could unify
metadata such as partitions, marker files, statistics, indexes, and others,
just as Delta Lake stores this in the delta log and Snowflake uses a
metaservice. A unified metadata table can address the poor metadata management
of cloud storage and improve compute and storage query performance. I think
RFC-27, RFC-15, and RFC-08 have some overlaps. I'd like to discuss this with
you! Thanks
> Re-implement marker files via timeline server
> ---------------------------------------------
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Writer Core
> Affects Versions: 0.9.0
> Reporter: Vinoth Chandar
> Priority: Blocker
> Fix For: 0.9.0
>
>
> Even if you argue that RFC-15/consolidated metadata removes the need for
> deleting partial files written due to Spark task failures/stage retries, it
> will still leave extra files inside the table (and users will pay for them
> every month), so we need the marker mechanism to be able to delete these
> partial files.
> Here we explore whether we can improve the current marker file mechanism,
> which creates one marker file per data file written, by delegating the
> createMarker() call to the driver/timeline server and having it write marker
> metadata into a single file handle that is flushed for durability
> guarantees.
>
> P.S.: I was tempted to think the Spark listener mechanism could help us deal
> with failed tasks, but it offers no guarantees: the writer job could die
> without deleting a partial file. i.e., it can improve things, but can't
> provide guarantees.
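The single-file marker idea above could be sketched as follows. This is a minimal illustration, not Hudi's actual API: the class name `MarkerBatchWriter` and its methods are hypothetical, standing in for the timeline-server handler that would replace per-data-file marker creation. Each createMarker() call appends one marker name to a single file and syncs it, so rollback can later read the list back to find partial files to delete.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch (names are illustrative, not Hudi's real classes):
// instead of one empty marker file per data file written, the timeline
// server appends each marker entry to a single markers file.
public class MarkerBatchWriter {
    private final Path markersFile;

    public MarkerBatchWriter(Path markersFile) {
        this.markersFile = markersFile;
    }

    // Stand-in for the delegated createMarker() call: append the marker
    // name as one line and sync for the durability guarantee mentioned
    // in the issue description.
    public synchronized void createMarker(String dataFileName) throws IOException {
        Files.write(markersFile,
                (dataFileName + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE,
                StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);
    }

    // Rollback/cleanup would read all markers back to locate partial
    // data files left behind by failed tasks.
    public List<String> readMarkers() throws IOException {
        return Files.readAllLines(markersFile, StandardCharsets.UTF_8);
    }
}
```

With all markers in one synced file, cleanup cost no longer scales with one storage request per data file, which is the win the proposal is after on cloud object stores.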
--
This message was sent by Atlassian Jira
(v8.3.4#803005)