llincc commented on code in PR #9778:
URL: https://github.com/apache/hudi/pull/9778#discussion_r1335802341
##########
hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java:
##########
@@ -213,6 +213,8 @@ public R put(T key, R value) {
     if (this.inMemoryMap.containsKey(key)) {
       this.inMemoryMap.put(key, value);
+    } else if (inDiskContainsKey(key)) {
+      getDiskBasedMap().put(key, value);
     } else if (this.currentInMemoryMapSize < this.maxInMemorySizeInBytes) {
       this.currentInMemoryMapSize += this.estimatedPayloadSize;
       this.inMemoryMap.put(key, value);
Review Comment:
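Thank you for your comment. The currentInMemoryMapSize may be **reduced** by a **size estimation**, so a key that was already spilled to disk can later pass the `currentInMemoryMapSize < maxInMemorySizeInBytes` check. You can see this in the unit test (UT) added in this PR.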
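A minimal sketch of why the counter can shrink, assuming (as a simplification, not taken from the Hudi source) that a re-estimation overwrites currentInMemoryMapSize with entry count × new per-entry estimate rather than adding to the running total. `EstimateDemo` and `reestimate` are hypothetical names:

```java
// Hypothetical illustration, not Hudi code. Assumption: re-estimation replaces the
// running counter with (number of in-memory entries) x (new per-entry estimate).
public class EstimateDemo {
    static long reestimate(int entryCount, long newPayloadEstimate) {
        return (long) entryCount * newPayloadEstimate;
    }

    public static void main(String[] args) {
        long maxInMemorySizeInBytes = 20;
        // Two entries estimated at 10 bytes each: the map is at its limit.
        long before = reestimate(2, 10);
        // Same two entries, but a much smaller incoming record lowers the estimate.
        long after = reestimate(2, 1);
        System.out.println(before + " -> " + after
                + ", below limit: " + (after < maxInMemorySizeInBytes));
    }
}
```

Running `main` prints `20 -> 2, below limit: true`: no entries were removed, yet the map now reports room for more.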
The reproduction steps are as follows:
1. Fill the in-memory map with records (**so that currentInMemoryMapSize >= maxInMemorySizeInBytes**), and **ensure that this.inMemoryMap.size() % NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0**.
2. Insert a record with key1 into the ExternalSpillableMap. This triggers a size estimation; make sure currentInMemoryMapSize is still greater than or equal to maxInMemorySizeInBytes afterwards. **The record is spilled to disk.**
3. Upsert key1 into the ExternalSpillableMap again with a smaller record, **so that the re-estimated currentInMemoryMapSize drops below maxInMemorySizeInBytes**. **The record is then put into the in-memory map**, while the old copy of key1 is still in the disk-based map, so key1 now exists in both.
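The three steps above can be replayed against a toy model. This is a minimal sketch under stated assumptions, not Hudi's implementation: `ToySpillableMap`, its `checkDiskFirst` flag, and `reproduce` are hypothetical names; a `HashMap` stands in for the disk-based map; the "size estimate" is simply the length of the latest value; and re-estimation is assumed to run when the counter is at or over the limit, or when the in-memory entry count is a multiple of `NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE` (here 2).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model loosely patterned on the put() branches in the diff; NOT the Hudi class.
// Assumptions (hypothetical): HashMap stands in for the disk-based map, the size
// estimate is the latest value's length, re-estimation rules as described above.
public class ToySpillableMap {
    static final int NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE = 2;

    final Map<String, String> inMemory = new HashMap<>();
    final Map<String, String> onDisk = new HashMap<>();
    final long maxInMemorySizeInBytes;
    final boolean checkDiskFirst; // true = with the inDiskContainsKey-style branch
    long currentInMemoryMapSize = 0;
    long estimatedPayloadSize = 0;

    ToySpillableMap(long maxInMemorySizeInBytes, boolean checkDiskFirst) {
        this.maxInMemorySizeInBytes = maxInMemorySizeInBytes;
        this.checkDiskFirst = checkDiskFirst;
    }

    void put(String key, String value) {
        if (currentInMemoryMapSize >= maxInMemorySizeInBytes
                || inMemory.size() % NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0) {
            // Re-estimation recomputes the counter from scratch, so it can go DOWN.
            estimatedPayloadSize = value.length();
            currentInMemoryMapSize = (long) inMemory.size() * estimatedPayloadSize;
        }
        if (inMemory.containsKey(key)) {
            inMemory.put(key, value);
        } else if (checkDiskFirst && onDisk.containsKey(key)) {
            onDisk.put(key, value); // keep an already-spilled key on disk
        } else if (currentInMemoryMapSize < maxInMemorySizeInBytes) {
            currentInMemoryMapSize += estimatedPayloadSize;
            inMemory.put(key, value);
        } else {
            onDisk.put(key, value);
        }
    }

    // Replays the three reproduction steps from the comment.
    static ToySpillableMap reproduce(boolean checkDiskFirst) {
        ToySpillableMap map = new ToySpillableMap(20, checkDiskFirst);
        // Step 1: fill memory; counter reaches the limit with size() % N == 0.
        map.put("a", "aaaaaaaaaa");
        map.put("b", "bbbbbbbbbb");
        // Step 2: key1 triggers re-estimation, counter stays >= limit, so it spills.
        map.put("key1", "kkkkkkkkkk");
        // Step 3: a much smaller update for key1 shrinks the re-estimated counter.
        map.put("key1", "x");
        return map;
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            ToySpillableMap m = reproduce(fix);
            System.out.println("checkDiskFirst=" + fix
                    + " inMemory has key1=" + m.inMemory.containsKey("key1")
                    + " onDisk has key1=" + m.onDisk.containsKey("key1"));
        }
    }
}
```

Under these assumptions, `main` prints `checkDiskFirst=false inMemory has key1=true onDisk has key1=true` (key1 is duplicated across both tiers) and `checkDiskFirst=true inMemory has key1=false onDisk has key1=true`, where the update stays in the disk-based map as in the diff's new branch.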
This bug also caused data duplication in our production environment.