[ 
https://issues.apache.org/jira/browse/HUDI-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2400:
-----------------------------
    Remaining Estimate: 0.5h
     Original Estimate: 0.5h

> Allow timeline server correctly sync when concurrent write to timeline
> ----------------------------------------------------------------------
>
>                 Key: HUDI-2400
>                 URL: https://issues.apache.org/jira/browse/HUDI-2400
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: compaction
>            Reporter: ZiyueGuan
>            Priority: Major
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> First, assume HUDI-1847 is available, so an ingestion Spark job and a 
> compaction job can run at the same time.
> Also assume each HoodieTimeline object carries a timestamp that represents 
> the time it was loaded from HDFS.
> Consider the following case:
>  1. Ingestion schedules compaction inline. The timeline is now 
> 1.deltaCommit.Completed, 2.Compaction.Requested (TimeStamp: 1L).
>  2. Ingestion then moves on. The ingestion job now has 
> 1.deltaCommit.Completed, 2.Compaction.Requested, 3.deltaCommit.Inflight 
> (TimeStamp: 2L).
>  3. An independent Spark job runs compaction 2. The timeline is now 
> 1.deltaCommit.Completed, 2.Compaction.Inflight, 3.deltaCommit.Inflight 
> (TimeStamp: 3L).
>  4. Executors in the ingestion job send requests to the timeline server 
> while holding the timeline with TimeStamp 2L. The timeline server, however, 
> holds a timeline with TimeStamp 3L, which is later than the client's.
> According to the logic in 
> https://github.com/apache/hudi/blob/master/hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java#L137,
> the timeline server treats its local view of the table's timeline as being 
> behind the client's view whenever the timeline hashes differ. That 
> assumption does not hold in the case above: there the hashes differ because 
> the client's view is behind the local view.
> A simple solution is to add an attribute to the timeline, namely the 
> timestamp used above. 
> The timeline server can then decide whether to sync the fileSystemView by 
> comparing the client and local timestamps rather than by checking whether 
> the timeline hashes differ.
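
The two checks contrasted in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not Hudi code: TimelineSnapshot, serverBehindByHash, and serverBehindByTimestamp are hypothetical names, and the hash strings are placeholders. The hash-based check only reflects the behaviour as the issue describes it.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names are Hudi APIs.
final class TimelineSyncSketch {

    /** Stand-in for a timeline view: a hash of its instants plus the time it was loaded. */
    static final class TimelineSnapshot {
        final String hash;   // placeholder for the timeline hash
        final long loadedAt; // the proposed timestamp attribute

        TimelineSnapshot(String hash, long loadedAt) {
            this.hash = hash;
            this.loadedAt = loadedAt;
        }
    }

    /** Check as described in the issue: any hash mismatch is read as "server is behind". */
    static boolean serverBehindByHash(TimelineSnapshot client, TimelineSnapshot server) {
        return !Objects.equals(client.hash, server.hash);
    }

    /** Proposed check: the server is behind only if the client's view was loaded later. */
    static boolean serverBehindByTimestamp(TimelineSnapshot client, TimelineSnapshot server) {
        return client.loadedAt > server.loadedAt;
    }

    public static void main(String[] args) {
        // Step 4 of the scenario: executors hold the view from step 2,
        // while the timeline server holds the view from step 3.
        TimelineSnapshot client = new TimelineSnapshot("hash-after-step-2", 2L);
        TimelineSnapshot server = new TimelineSnapshot("hash-after-step-3", 3L);

        System.out.println("hash check requests sync: " + serverBehindByHash(client, server));
        System.out.println("timestamp check requests sync: " + serverBehindByTimestamp(client, server));
    }
}
```

In the scenario above the hash check reports the server as behind even though it is ahead, while the timestamp comparison does not. How the timestamp would be generated and carried in requests is left open here, as it is in the description.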



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
