[ 
https://issues.apache.org/jira/browse/HUDI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hd zhou updated HUDI-3644:
--------------------------
    Description: 
The problem is in AbstractHoodieLogRecordReader:

{code:java}
if (!completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime)
    || inflightInstantsTimeline.containsInstant(instantTime)) {
  // hit an uncommitted block possibly from a failed write, move to the next
  // one and skip processing this one
  continue;
}
{code}
 

When completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime) is
true, the log block is not skipped and gets merged. This is not always correct.

Suppose a log block is appended successfully and the deltacommit is then rolled
back. If that instant time is before the start of the active timeline, the
rolled-back log block is still merged, which causes data duplication.
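
The scenario can be reproduced with a small, self-contained sketch. This is a simplified model, not Hudi's actual classes: LogBlockFilterSketch, isSkipped and the list-based timeline are hypothetical names introduced for illustration, and it assumes instant times compare as plain strings and that "before the timeline starts" means "earlier than the first completed instant".

```java
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the timeline check; not Hudi's real classes.
class LogBlockFilterSketch {

    // Assumption: true if the instant is completed, or is older than the
    // first instant still present in the (sorted) active timeline.
    static boolean containsOrBeforeTimelineStarts(List<String> completed, String instantTime) {
        boolean contained = completed.contains(instantTime);
        boolean beforeStart = !completed.isEmpty()
            && instantTime.compareTo(completed.get(0)) < 0;
        return contained || beforeStart;
    }

    // Mirrors the condition quoted above: true means the block is skipped.
    static boolean isSkipped(List<String> completed, Set<String> inflight, String instantTime) {
        return !containsOrBeforeTimelineStarts(completed, instantTime)
            || inflight.contains(instantTime);
    }

    public static void main(String[] args) {
        List<String> completed = List.of("003", "005"); // active timeline after archival
        Set<String> inflight = Set.of("006");

        // "004" was rolled back inside the active timeline: skipped, as intended.
        System.out.println(isSkipped(completed, inflight, "004")); // true
        // "001" was rolled back but is older than "003": NOT skipped, so it is merged.
        System.out.println(isSkipped(completed, inflight, "001")); // false
    }
}
```

In this model, the rolled-back instant "001" cannot be told apart from an instant that was committed and later archived, which is the case described above.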

 

 

  was:
The problem is in AbstractHoodieLogRecordReader:

{code:java}
if (!completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime)
    || inflightInstantsTimeline.containsInstant(instantTime)) {
  // hit an uncommitted block possibly from a failed write, move to the next
  // one and skip processing this one
  continue;
}
{code}
 

When completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime) is
true, the log block is not skipped and gets merged. This is not always correct.

Suppose a log block is appended successfully and the deltacommit is then rolled
back. If that instant time is not before the start of the active timeline, the
log block is merged, which causes data duplication.

 

 


> hoodie log scan bug cause data duplication
> ------------------------------------------
>
>                 Key: HUDI-3644
>                 URL: https://issues.apache.org/jira/browse/HUDI-3644
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: hd zhou
>            Priority: Major
>              Labels: pull-request-available
>
> The problem is in AbstractHoodieLogRecordReader:
>  
> {code:java}
> if (!completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime)
>     || inflightInstantsTimeline.containsInstant(instantTime)) {
>   // hit an uncommitted block possibly from a failed write, move to the next
>   // one and skip processing this one
>   continue;
> }
> {code}
>  
> When completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime) is
> true, the log block is not skipped and gets merged. This is not always correct.
>  
> Suppose a log block is appended successfully and the deltacommit is then rolled
> back. If that instant time is before the start of the active timeline, the
> rolled-back log block is still merged, which causes data duplication.
>  
>  



