[ 
https://issues.apache.org/jira/browse/HUDI-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linleicheng updated HUDI-6892:
------------------------------
    Description: 
Steps to reproduce:

1. Fill the in-memory map with records until this.inMemoryMap.size() % 
NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0.

2. Insert a record with key1 into ExternalSpillableMap. This triggers a size 
estimate; make sure currentInMemoryMapSize is still greater than or equal to 
maxInMemorySizeInBytes afterwards.
   The record is spilled to disk. 

3. Reduce the size of the record for key1 so that currentInMemoryMapSize is 
less than maxInMemorySizeInBytes when it is put into ExternalSpillableMap again.
   The record is put into the in-memory map.
   
Result: key1 now exists both on disk and in memory, so the data is duplicated 
when iterating over the map.

  was:
Steps to reproduce:

1. Fill the in-memory map with records until this.inMemoryMap.size() % 
NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0.

2. Insert a record with key1 into ExternalSpillableMap. This triggers a size 
estimate; make sure currentInMemoryMapSize is still greater than or equal to 
maxInMemorySizeInBytes afterwards.
   The record is spilled to disk. 

3. Reduce the size of the record for key1 so that currentInMemoryMapSize is 
less than maxInMemorySizeInBytes when it is put into ExternalSpillableMap again.
   The record is put into the in-memory map.
   
Result: the data ends up duplicated.


> ExternalSpillableMap may cause data duplication when flink compaction
> ---------------------------------------------------------------------
>
>                 Key: HUDI-6892
>                 URL: https://issues.apache.org/jira/browse/HUDI-6892
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Linleicheng
>            Priority: Major
>              Labels: pull-request-available
>
> Steps to reproduce:
> 1. Fill the in-memory map with records until this.inMemoryMap.size() % 
> NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0.
> 2. Insert a record with key1 into ExternalSpillableMap. This triggers a size 
> estimate; make sure currentInMemoryMapSize is still greater than or equal to 
> maxInMemorySizeInBytes afterwards.
>    The record is spilled to disk. 
> 3. Reduce the size of the record for key1 so that currentInMemoryMapSize is 
> less than maxInMemorySizeInBytes when it is put into ExternalSpillableMap 
> again.
>    The record is put into the in-memory map.
>    
> Result: key1 now exists both on disk and in memory, so the data is duplicated 
> when iterating over the map.
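
The reproduction steps can be illustrated with a small toy model. This is only
a sketch under stated assumptions, not Hudi's ExternalSpillableMap: the class
ToySpillableMap, its fields, the 100-byte budget, and the removeStaleDiskCopy
flag are all hypothetical, and the size estimate is simplified to "number of
in-memory entries times the size of the incoming record". The flag shows one
possible way to avoid the duplicate, which is not necessarily the fix adopted
for this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Hudi's ExternalSpillableMap. All names are hypothetical.
class ToySpillableMap {
    private final Map<String, String> inMemory = new HashMap<>();
    private final Map<String, String> onDisk = new HashMap<>(); // stand-in for the disk-backed map
    private final long maxInMemoryBytes;
    private final boolean removeStaleDiskCopy;

    ToySpillableMap(long maxInMemoryBytes, boolean removeStaleDiskCopy) {
        this.maxInMemoryBytes = maxInMemoryBytes;
        this.removeStaleDiskCopy = removeStaleDiskCopy;
    }

    void put(String key, String value) {
        // Simplification: estimate every entry from the incoming record's size.
        long estimatedEntrySize = key.length() + value.length();
        long currentInMemorySize = inMemory.size() * estimatedEntrySize;
        if (currentInMemorySize < maxInMemoryBytes || inMemory.containsKey(key)) {
            if (removeStaleDiskCopy) {
                onDisk.remove(key); // assumed fix: drop the older spilled copy
            }
            inMemory.put(key, value);
        } else {
            onDisk.put(key, value); // over budget: spill to "disk"
        }
    }

    // Lists the in-memory keys first, then the spilled ones.
    List<String> keys() {
        List<String> all = new ArrayList<>(inMemory.keySet());
        all.addAll(onDisk.keySet());
        return all;
    }

    private static String filler(int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Replays steps 1 to 3 and counts how often key1 shows up during iteration.
    static long occurrencesOfKey1(boolean removeStaleDiskCopy) {
        ToySpillableMap map = new ToySpillableMap(100, removeStaleDiskCopy);
        for (int i = 0; i < 4; i++) {
            map.put("k" + i, filler(20)); // step 1: 22-byte entries, all fit in memory
        }
        map.put("key1", filler(30)); // step 2: estimate 4 * 34 = 136 >= 100, spilled
        map.put("key1", filler(1));  // step 3: estimate 4 * 5 = 20 < 100, kept in memory
        return map.keys().stream().filter("key1"::equals).count();
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + occurrencesOfKey1(false));
        System.out.println("with cleanup:    " + occurrencesOfKey1(true));
    }
}
```

In this model key1 is listed twice when the stale disk copy is left in place and
once when it is removed before the in-memory put.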



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
