[
https://issues.apache.org/jira/browse/HUDI-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916969#comment-17916969
]
Y Ethan Guo commented on HUDI-8916:
-----------------------------------
Actually, after the indexing phase no caller depends on the instant time in
the current location in any material way, except the draft positional merging
change in HUDI-8654. For HUDI-8654 we've decided to use the file system view
directly to get the base file instant time of the latest file slice in a
file group, instead of relying on the instant time of the current location in
the HoodieRecords, which can be hacky. So there is no impact or correctness
issue here. Deprioritizing this to revisit and close later.
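
To illustrate the distinction between the commit time a snapshot read returns
and the base file instant time of the latest file slice, here is a minimal
sketch. It uses hypothetical Python stand-ins (FileSlice, LogRecord,
snapshot_commit_time, latest_base_instant_time), not Hudi's actual classes or
file system view API, and the instant times are made-up values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical stand-in for an update written to a log file."""
    key: str
    commit_time: str


@dataclass
class FileSlice:
    """Hypothetical stand-in: one base file plus its log records."""
    base_instant_time: str
    base_records: dict  # record key -> commit time stored in the base file
    log_records: list = field(default_factory=list)


def snapshot_commit_time(file_slice, key):
    """Commit time a merged (snapshot) read would report for a key.

    Assumes the newest update wins, so a log record can override
    the commit time stored in the base file.
    """
    commit_time = file_slice.base_records[key]
    for rec in file_slice.log_records:
        if rec.key == key and rec.commit_time > commit_time:
            commit_time = rec.commit_time
    return commit_time


def latest_base_instant_time(file_group):
    """Base instant time of the latest file slice in a file group.

    Assumes instant times are fixed-width timestamp strings, so
    string comparison orders them chronologically.
    """
    return max(s.base_instant_time for s in file_group)


# One file slice whose record "k1" was later updated in a log file.
file_slice = FileSlice(
    base_instant_time="20250101000000000",
    base_records={"k1": "20250101000000000"},
    log_records=[LogRecord("k1", "20250102000000000")],
)
file_group = [file_slice]

merged_time = snapshot_commit_time(file_slice, "k1")
base_time = latest_base_instant_time(file_group)
print(merged_time, base_time)
```

In this sketch the snapshot read reports the log file's commit time for "k1",
while the lookup on the file group returns the base instant time, which is the
value the indexing assumption expects.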
> Return base instant time in prepped upsert flow for SQL UPDATE and DELETE
> -------------------------------------------------------------------------
>
> Key: HUDI-8916
> URL: https://issues.apache.org/jira/browse/HUDI-8916
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Y Ethan Guo
> Priority: Blocker
> Fix For: 1.0.1
>
> Original Estimate: 16h
> Remaining Estimate: 16h
>
> When running the prepped upsert flow for SQL UPDATE and DELETE, we use a
> snapshot read to get the meta columns and determine the instant time and
> current location of the records. The instant time stored in the current
> location of a HoodieRecord might not be the base file instant time if the
> record has updates in log files (in a snapshot read, merging can pick the
> record and commit time from the log file). This is contrary to the assumption
> that the instant time stored should be the base file instant time from
> indexing in Spark.