[
https://issues.apache.org/jira/browse/HUDI-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-2400:
-----------------------------
Remaining Estimate: 0.5h
Original Estimate: 0.5h
> Allow timeline server correctly sync when concurrent write to timeline
> ----------------------------------------------------------------------
>
> Key: HUDI-2400
> URL: https://issues.apache.org/jira/browse/HUDI-2400
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: compaction
> Reporter: ZiyueGuan
> Priority: Major
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> First, assume HUDI-1847 is available, so an ingestion Spark job and a
> compaction job can run at the same time.
> Also assume each HoodieTimeLine object carries a timestamp that represents
> the time it was generated from HDFS.
> Consider the following case:
> 1. Ingestion schedules a compaction inline. The timeline is now:
> 1.deltaCommit.Completed, 2.Compaction.Requested (TimeStamp: 1L)
> 2. Ingestion then moves on. The ingestion job now holds
> 1.deltaCommit.Completed, 2.Compaction.Requested, 3.deltaCommit.Inflight
> (TimeStamp: 2L).
> 3. An independent Spark job runs compaction 2. The timeline is now
> 1.deltaCommit.Completed, 2.Compaction.Inflight, 3.deltaCommit.Inflight
> (TimeStamp: 3L)
> 4. Executors in the ingestion job send requests to the timeline server. They
> hold the timeline with TimeStamp 2L, but the timeline server holds the
> timeline with TimeStamp 3L, which is later than the client's.
> According to the logic in
> https://github.com/apache/hudi/blob/master/hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java#L137,
>
> we assume the local view of the table's timeline is behind the client's view
> whenever the timeline hashes differ. However, this may not be true in
> the case described above.
> There, the hashes differ because the client's view is behind the local view.
> A simple solution is to add an attribute to the timeline that holds the
> timestamp used above.
> The timeline server can then decide whether to sync the fileSystemView by
> comparing the client and local timestamps rather than by checking whether
> the timeline hashes differ.
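
The comparison proposed above could be sketched roughly as follows. This is an illustration only, not Hudi code: TimelineSyncSketch, TimelineSnapshot, shouldSyncByHash and shouldSyncByTimestamp are hypothetical names, and the sketch assumes the client and server timestamps come from a comparable source.

```java
// Sketch only: every type and method here is hypothetical, not a Hudi API.
final class TimelineSyncSketch {

    /** Hypothetical summary of a timeline: its hash plus the time it was loaded. */
    static final class TimelineSnapshot {
        final String hash;
        final long loadedAt;

        TimelineSnapshot(String hash, long loadedAt) {
            this.hash = hash;
            this.loadedAt = loadedAt;
        }
    }

    /** Hash-only rule as described in the issue: any mismatch triggers a sync. */
    static boolean shouldSyncByHash(TimelineSnapshot local, TimelineSnapshot client) {
        return !local.hash.equals(client.hash);
    }

    /** Proposed rule: sync only when the server's local view is older than the client's. */
    static boolean shouldSyncByTimestamp(TimelineSnapshot local, TimelineSnapshot client) {
        return local.loadedAt < client.loadedAt;
    }
}
```

In the scenario above, the client holds TimeStamp 2L and the server holds TimeStamp 3L. The hash-only rule asks for a sync because the hashes differ, while the timestamp rule does not, since the server's view is already newer.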
--
This message was sent by Atlassian Jira
(v8.20.1#820001)