[
https://issues.apache.org/jira/browse/HUDI-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Linleicheng updated HUDI-6892:
------------------------------
Affects Version/s: (was: 0.14.0)
Priority: Critical (was: Major)
> ExternalSpillableMap may cause data duplication when flink compaction
> ---------------------------------------------------------------------
>
> Key: HUDI-6892
> URL: https://issues.apache.org/jira/browse/HUDI-6892
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Linleicheng
> Priority: Critical
> Labels: pull-request-available
>
> To reproduce:
> 1. Fill the in-memory map with records until this.inMemoryMap.size() %
> NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0.
> 2. Insert a record with key1 into the ExternalSpillableMap. This triggers a
> size estimate; make sure currentInMemoryMapSize is still greater than or
> equal to maxInMemorySizeInBytes afterwards.
> The record is spilled to disk.
> 3. Reduce the size of the key1 record so that currentInMemoryMapSize falls
> below maxInMemorySizeInBytes when the record is put into the
> ExternalSpillableMap again.
> This time the record is put into the in-memory map.
>
> key1 is now held in both the in-memory map and the on-disk map, so the
> iterator finally returns it twice (data duplication).
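
The steps above can be sketched with a small stand-alone model. This is not the Hudi implementation: the class SpillDuplicationSketch, the constant ESTIMATE_EVERY (standing in for NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE), the 100-byte limit, and the "key length + value length" size estimate are all assumptions made for illustration. Only the field names inMemoryMap, currentInMemoryMapSize and maxInMemorySizeInBytes are taken from the description, and a plain map stands in for the disk-backed map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a spillable map; hypothetical, NOT the real ExternalSpillableMap.
class SpillDuplicationSketch {
    // Assumed stand-in for NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE.
    static final int ESTIMATE_EVERY = 4;

    final long maxInMemorySizeInBytes;
    final Map<String, String> inMemoryMap = new LinkedHashMap<>();
    final Map<String, String> diskMap = new LinkedHashMap<>(); // stands in for the on-disk map
    long estimatedPayloadSize = 0;
    long currentInMemoryMapSize = 0;

    SpillDuplicationSketch(long maxInMemorySizeInBytes) {
        this.maxInMemorySizeInBytes = maxInMemorySizeInBytes;
    }

    void put(String key, String value) {
        // Re-estimate when memory looks full or every ESTIMATE_EVERY records.
        // Assumption: the estimate is simply the size of the incoming record.
        if (currentInMemoryMapSize >= maxInMemorySizeInBytes
                || inMemoryMap.size() % ESTIMATE_EVERY == 0) {
            estimatedPayloadSize = key.length() + value.length();
            currentInMemoryMapSize = inMemoryMap.size() * estimatedPayloadSize;
        }
        if (currentInMemoryMapSize < maxInMemorySizeInBytes || inMemoryMap.containsKey(key)) {
            if (!inMemoryMap.containsKey(key)) {
                currentInMemoryMapSize += estimatedPayloadSize;
            }
            // The modelled flaw: a copy of the key already spilled to disk is not checked.
            inMemoryMap.put(key, value);
        } else {
            diskMap.put(key, value);
        }
    }

    // Models an iterator that walks the in-memory entries, then the spilled ones.
    List<String> keysSeenByIterator() {
        List<String> keys = new ArrayList<>(inMemoryMap.keySet());
        keys.addAll(diskMap.keySet());
        return keys;
    }

    static String filler(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'x');
        return new String(chars);
    }

    static SpillDuplicationSketch reproduce() {
        SpillDuplicationSketch map = new SpillDuplicationSketch(100);
        // Step 1: fill the in-memory map until size % ESTIMATE_EVERY == 0.
        for (int i = 0; i < ESTIMATE_EVERY; i++) {
            map.put("a" + i, "v");
        }
        // Step 2: a large key1 record pushes the estimate to 4 * 44 = 176 >= 100, so it spills.
        map.put("key1", filler(40));
        // Step 3: a small key1 record pulls the estimate down to 4 * 5 = 20 < 100, so it stays in memory.
        map.put("key1", "v");
        return map;
    }

    public static void main(String[] args) {
        SpillDuplicationSketch map = reproduce();
        System.out.println("in memory: " + map.inMemoryMap.keySet());
        System.out.println("on disk:   " + map.diskMap.keySet());
        System.out.println("iterated:  " + map.keysSeenByIterator());
    }
}
```

In this model key1 ends up in both maps, so the combined iteration lists it twice. One possible guard (again only an assumption, not necessarily the change in the linked pull request) would be to remove or overwrite the spilled copy of a key before inserting it into the in-memory map.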
--
This message was sent by Atlassian Jira
(v8.20.10#820010)