[
https://issues.apache.org/jira/browse/HUDI-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Y Ethan Guo updated HUDI-8654:
------------------------------
Description:
When there is a pending compaction, the new base files to be generated by
compaction are not available during this transaction. Because the log files in
MOR from this transaction can be attached to the base file generated by the
compaction in the latest file slice, accurate record positions may not be
derivable. However, the log files written in later delta commits after the
compaction completes have accurate positions.
Similarly, for NBCC, the compaction can be scheduled during an inflight
deltacommit, and in this case the log file generated by the inflight
deltacommit is associated with the new base file from the compaction, which may
have different positions because of deletes.
We need to make sure that the file group reader with position-based merging
generates the correct results for such a mix of log blocks.
was: When there is a pending compaction, the new base files to be generated by
compaction are not available during this transaction. Because the log files in
MOR from this transaction can be attached to the base file generated by the
compaction in the latest file slice, accurate record positions may not be
derivable. However, the log files written in later delta commits after the
compaction completes have accurate positions. We need to make sure that the
file group reader with position-based merging generates the correct results for
such a mix of log blocks.
> Support correct merging results with record positions in log blocks generated
> during pending compaction
> -------------------------------------------------------------------------------------------------------
>
> Key: HUDI-8654
> URL: https://issues.apache.org/jira/browse/HUDI-8654
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Y Ethan Guo
> Priority: Blocker
> Fix For: 1.0.1
>
>
> When there is a pending compaction, the new base files to be generated by
> compaction are not available during this transaction. Because the log files
> in MOR from this transaction can be attached to the base file generated by
> the compaction in the latest file slice, accurate record positions may not be
> derivable. However, the log files written in later delta commits after the
> compaction completes have accurate positions.
> Similarly, for NBCC, the compaction can be scheduled during an inflight
> deltacommit, and in this case the log file generated by the inflight
> deltacommit is associated with the new base file from the compaction, which
> may have different positions because of deletes.
> We need to make sure that the file group reader with position-based merging
> generates the correct results for such a mix of log blocks.
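
The position mismatch described above can be illustrated with a minimal,
self-contained sketch. It is not Hudi code: the class PositionMergeSketch and
the types BaseFile and DeleteBlock are hypothetical, and the rule it applies
(trust positions only when the log block was written against the base file
being read, otherwise fall back to record keys) is an assumption about one
possible way to get correct results, not a statement of how the file group
reader is or will be implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hudi's API: shows why positions recorded against an
// old base file cannot be applied blindly to the base file from a compaction.
class PositionMergeSketch {

    // Hypothetical base file: a commit instant plus record keys in file order.
    // The position of a record is its index in this list.
    static final class BaseFile {
        final String instant;
        final List<String> keys;
        BaseFile(String instant, List<String> keys) {
            this.instant = instant;
            this.keys = keys;
        }
    }

    // Hypothetical delete log block: it remembers which base file instant its
    // positions refer to, and carries the record keys as a fallback.
    static final class DeleteBlock {
        final String baseInstantOfPositions;
        final List<Integer> positions;
        final List<String> keys;
        DeleteBlock(String baseInstantOfPositions, List<Integer> positions, List<String> keys) {
            this.baseInstantOfPositions = baseInstantOfPositions;
            this.positions = positions;
            this.keys = keys;
        }
    }

    // Returns the keys that survive after applying all delete blocks.
    static List<String> read(BaseFile base, List<DeleteBlock> blocks) {
        boolean[] deleted = new boolean[base.keys.size()];
        for (DeleteBlock block : blocks) {
            if (block.baseInstantOfPositions.equals(base.instant)) {
                // Positions were computed against this base file: use them directly.
                for (int position : block.positions) {
                    deleted[position] = true;
                }
            } else {
                // Positions refer to a different base file: fall back to key lookup.
                for (String key : block.keys) {
                    int position = base.keys.indexOf(key);
                    if (position >= 0) {
                        deleted[position] = true;
                    }
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < deleted.length; i++) {
            if (!deleted[i]) {
                result.add(base.keys.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The old base file (instant 001) held [a, b, c, d]; "b" was deleted
        // earlier, so the compaction (instant 003) produced [a, c, d].
        BaseFile compacted = new BaseFile("003", Arrays.asList("a", "c", "d"));

        // Written while the compaction was pending: deletes "c", which was at
        // position 2 in base file 001. In base file 003, position 2 is "d".
        DeleteBlock duringPendingCompaction =
            new DeleteBlock("001", Arrays.asList(2), Arrays.asList("c"));

        // Written after the compaction completed: position 0 is valid for 003.
        DeleteBlock afterCompaction =
            new DeleteBlock("003", Arrays.asList(0), Arrays.asList("a"));

        System.out.println(read(compacted,
            Arrays.asList(duringPendingCompaction, afterCompaction))); // prints [d]
    }
}
```

In this sketch, applying position 2 from the first block directly to the
compacted base file would drop "d" instead of "c", which is the kind of
incorrect result the issue asks the reader to avoid.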