[ https://issues.apache.org/jira/browse/HUDI-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan reassigned HUDI-6758:
-----------------------------------------
Assignee: sivabalan narayanan
> Avoid duplicated log blocks on the LogRecordReader
> --------------------------------------------------
>
> Key: HUDI-6758
> URL: https://issues.apache.org/jira/browse/HUDI-6758
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
>
> Due to Spark retries, duplicated log blocks can be added during a write.
> Because we don't delete anything during marker-based reconciliation on the
> writer side, the reader could see those duplicated log blocks. For most
> payload implementations, this should not be an issue. For an expression
> payload, however, it could result in data inconsistency, since an expression
> could be evaluated twice (e.g., colA*2).
>
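The failure mode described above can be illustrated with a small, self-contained sketch. It is not Hudi code: `LogBlock`, `merge`, `dedupe`, and the choice of (instant time, block sequence number) as a block's identity are all hypothetical names and assumptions made for illustration, and the actual reader-side fix may differ. The sketch only shows why a non-idempotent expression such as colA*2 yields a wrong value when a retried block is read twice, and how skipping blocks with an already-seen identity would avoid that.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class LogBlock:
    """Hypothetical stand-in for a log block; not a Hudi class."""
    instant_time: str  # assumed: the commit instant that wrote the block
    block_seq: int     # assumed: sequence number of the block within that instant
    # Each update pairs a record key with an expression applied to the current value.
    updates: Tuple[Tuple[str, Callable[[int], int]], ...]


def merge(base: Dict[str, int], blocks: List[LogBlock]) -> Dict[str, int]:
    """Apply every block's expressions, in order, on top of the base values."""
    merged = dict(base)
    for block in blocks:
        for key, expr in block.updates:
            merged[key] = expr(merged[key])
    return merged


def dedupe(blocks: List[LogBlock]) -> List[LogBlock]:
    """Keep only the first block seen for each (instant_time, block_seq) pair."""
    seen = set()
    unique = []
    for block in blocks:
        ident = (block.instant_time, block.block_seq)
        if ident not in seen:
            seen.add(ident)
            unique.append(block)
    return unique


def double(value: int) -> int:
    """The colA*2 expression from the issue description."""
    return value * 2


base = {"k1": 10}
original = LogBlock("001", 0, (("k1", double),))
retried = LogBlock("001", 0, (("k1", double),))  # same block written again by a retry
blocks = [original, retried]

naive = merge(base, blocks)            # expression evaluated twice
deduped = merge(base, dedupe(blocks))  # expression evaluated once
print(naive, deduped)
```

With an overwrite-style payload the duplicate would be harmless, because applying the same final value twice gives the same result; only payloads whose result depends on the current value are affected.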
--
This message was sent by Atlassian Jira
(v8.20.10#820010)