[ 
https://issues.apache.org/jira/browse/HUDI-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-8654:
------------------------------
    Description: 
When there is a pending compaction, the new base files to be generated by 
the compaction are not yet available during the current transaction. Because 
the log files written in MOR by this transaction can be attached to the base 
file generated by the compaction in the latest file slice, accurate record 
positions may not be derivable.  However, log files written in later delta 
commits, after the compaction completes, have accurate positions.

Similarly, for NBCC (non-blocking concurrency control), a compaction can be 
scheduled during an inflight deltacommit. In this case the log file generated 
by the inflight deltacommit is associated with the new base file from the 
compaction, which may have different record positions because of deletes.
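
To illustrate why deletes change positions, here is a minimal Python sketch. 
It is not Hudi code: the base file is modeled as an ordered list of 
(key, value) rows, and compact() and positions() are hypothetical helpers 
introduced only for this example.

```python
def compact(base_rows, deleted_keys):
    """Hypothetical compaction: rewrite the base file without deleted keys."""
    return [(key, value) for key, value in base_rows if key not in deleted_keys]


def positions(base_rows):
    """Map each record key to its row position in the given base file."""
    return {key: pos for pos, (key, _) in enumerate(base_rows)}


# Base file that the inflight deltacommit could see when it started.
old_base = [("k0", "a"), ("k1", "b"), ("k2", "c"), ("k3", "d")]
# Base file produced by the compaction, which applied a delete of k1.
new_base = compact(old_base, {"k1"})

old_positions = positions(old_base)
new_positions = positions(new_base)

# A position recorded for k2 against the old base file points at a
# different record once it is applied to the new base file.
stale_position = old_positions["k2"]
record_hit_in_new_base = new_base[stale_position][0]
```

In this sketch k2 sits at position 2 in the old base file but at position 1 
in the new one, so a stale position 2 resolves to k3 instead.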

We need to make sure that the file group reader with position-based merging 
generates the correct results for such a mix of log blocks.
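
One possible way to handle such a mix is sketched below in Python. This is an 
assumption for illustration, not the fix adopted in Hudi: each log block is 
assumed to carry the instant time of the base file its positions were 
computed against (the hypothetical field positions_base_instant), and the 
reader falls back to key-based merging when that instant does not match the 
base file of the file slice being read. LogBlock and merge_file_slice are 
hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LogBlock:
    # Hypothetical: instant time of the base file the positions refer to.
    positions_base_instant: str
    # Each entry is (record_key, position, new_value); None value is a delete.
    updates: list


def merge_file_slice(base_instant, base_rows, log_blocks):
    """Merge log blocks onto a base file, trusting positions only when valid."""
    merged = dict(base_rows)  # key -> value, keeps base file order
    keys_by_position = [key for key, _ in base_rows]
    for block in log_blocks:
        positions_valid = block.positions_base_instant == base_instant
        for key, position, value in block.updates:
            if (positions_valid and position is not None
                    and position < len(keys_by_position)):
                target = keys_by_position[position]  # position-based lookup
            else:
                target = key  # fall back to key-based merging
            if value is None:
                merged.pop(target, None)
            else:
                merged[target] = value
    return merged


# Base file written by compaction "003"; k1 was deleted, so k2 and k3 shifted.
new_base = [("k0", "a"), ("k2", "c"), ("k3", "d")]
blocks = [
    # Written by an inflight deltacommit against the old base file "001".
    LogBlock("001", [("k2", 2, "c2")]),
    # Written after the compaction completed, against base file "003".
    LogBlock("003", [("k3", 2, "d2")]),
]
result = merge_file_slice("003", new_base, blocks)
```

Here the first block's position 2 is stale, so its update is applied by key 
to k2, while the second block's position 2 is valid and updates k3. Merging 
by position alone would have applied both updates to k3.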

  was:When there is a pending compaction, the new base files to be generated by 
compaction is not available during this transaction. Given the log files in MOR 
from this transaction can be attached to the base file generated by the 
compaction in the latest file slice, the accurate record positions may not be 
derived.  However, the log files written in later delta commits after completed 
compaction have accurate positions.  We need to make sure that the file group 
reader with position-based merging generate the correct results in such mix of 
log blocks.


> Support correct merging results with record positions in log blocks generated 
> during pending compaction
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-8654
>                 URL: https://issues.apache.org/jira/browse/HUDI-8654
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Y Ethan Guo
>            Priority: Blocker
>             Fix For: 1.0.1
>



