[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327052#comment-17327052
 ] 

liwei commented on HUDI-1138:
-----------------------------

[~vinoth] Thanks.

1. I have an idea: could the timeline server write the marker file 
information into the metatable? That way we could unify the meta info in the 
metatable.

2. Rollback is currently not a frequent action, so we should run a POC to 
measure the performance first.

3. I have also been researching RFC-27 recently. I think we could unify the 
metadata, such as partitions, marker files, statistics, indexes and others, 
much as Delta Lake stores this in its delta log and Snowflake uses a 
metaservice. A unified metatable could address the poor metadata management of 
cloud storage as well as query performance across compute and storage. I think 
RFC-27, RFC-15 and RFC-08 have some overlap. I would like to discuss this with 
you! Thanks

> Re-implement marker files via timeline server
> ---------------------------------------------
>
>                 Key: HUDI-1138
>                 URL: https://issues.apache.org/jira/browse/HUDI-1138
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Writer Core
>    Affects Versions: 0.9.0
>            Reporter: Vinoth Chandar
>            Priority: Blocker
>             Fix For: 0.9.0
>
>
> Even though one can argue that RFC-15/consolidated metadata removes the need 
> for deleting partial files written due to Spark task failures/stage retries, 
> it will still leave extra files inside the table (and users will pay for 
> them every month), so we need the marker mechanism to be able to delete 
> these partial files. 
> Here we explore whether we can improve the current marker file mechanism, 
> which creates one marker file per data file written, by delegating the 
> createMarker() call to the driver/timeline server and having it write marker 
> metadata into a single file handle that is flushed for durability 
> guarantees.
>  
> P.S.: I was tempted to think the Spark listener mechanism could help us deal 
> with failed tasks, but it has no guarantees. The writer job could die 
> without deleting a partial file, i.e. it can improve things, but can't 
> provide guarantees.
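
To make the proposal in the issue description more concrete, here is a minimal 
Python sketch of the idea as I read it: one process appends every marker to a 
single file and flushes after each write. It is not Hudi's implementation. 
The names SingleFileMarkerLog, create_marker, read_markers and partial_files 
are hypothetical, and the one-name-per-line format is an assumption made only 
for illustration.

```python
import os
import threading


class SingleFileMarkerLog:
    """Hypothetical stand-in for a marker store owned by the driver/timeline server."""

    def __init__(self, path):
        self._lock = threading.Lock()
        # A single append-only file handle replaces one marker file per data file.
        self._fh = open(path, "a", encoding="utf-8")

    def create_marker(self, data_file_name):
        # Serialize writers, then flush and fsync so the marker survives a crash.
        with self._lock:
            self._fh.write(data_file_name + "\n")
            self._fh.flush()
            os.fsync(self._fh.fileno())

    def close(self):
        with self._lock:
            self._fh.close()


def read_markers(path):
    """Return the set of data file names recorded in the marker log."""
    with open(path, encoding="utf-8") as fh:
        return {line.rstrip("\n") for line in fh if line.strip()}


def partial_files(marker_path, committed_files):
    """Files that have a marker but were never committed: candidates for deletion."""
    return read_markers(marker_path) - set(committed_files)
```

In this sketch a rollback or cleanup step would call partial_files() with the 
list of committed data files and delete whatever comes back. Whether a flush 
per marker is affordable is exactly the kind of thing a perf POC would have to 
show.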



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
