codope commented on code in PR #12982:
URL: https://github.com/apache/hudi/pull/12982#discussion_r2023200015


##########
hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java:
##########
@@ -92,7 +92,7 @@ public class ExternalSpillableMap<T extends Serializable, R> implements Map<T, R
   public ExternalSpillableMap(long maxInMemorySizeInBytes, String baseFilePath, SizeEstimator<T> keySizeEstimator,
                               SizeEstimator<R> valueSizeEstimator, DiskMapType diskMapType, CustomSerializer<R> valueSerializer,
                               boolean isCompressionEnabled, String loggingContext) throws IOException {
-    this.inMemoryMap = new HashMap<>();
+    this.inMemoryMap = new ConcurrentHashMap<>();

Review Comment:
   I agree. The FSView should control the concurrent modification. We should 
fix it at the FSView level itself.
   
   As @zhangyue19921010 has pointed out, `put()` has shared mutable state: 
`currentInMemoryMapSize` and `estimatedPayloadSize` are updated during `put()` 
without any atomicity or locking, which can lead to race conditions. Another 
concern is `diskBasedMap`: it is lazily initialized, and although that 
initialization is partially synchronized, subsequent access might not be safe 
under concurrent modification if the underlying DiskMap isn't designed for 
concurrent access.
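   
   To illustrate the `put()` concern, here is a minimal sketch, not Hudi's 
actual code. `SizeTrackingMap` and `SizeTrackingDemo` are hypothetical names, 
and a fixed per-entry size estimate is assumed. The point is that a thread-safe 
map alone does not protect a separate size counter; the map update and the 
counter update have to sit in one critical section.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a size-tracking map; NOT Hudi's ExternalSpillableMap.
class SizeTrackingMap<K, V> {
  private final Map<K, V> inMemoryMap = new HashMap<>();
  private final long estimatedEntrySize; // assumption: fixed estimate per entry
  private long currentInMemoryMapSize = 0L;

  SizeTrackingMap(long estimatedEntrySize) {
    this.estimatedEntrySize = estimatedEntrySize;
  }

  // The map update and the size update form one critical section, so the
  // counter cannot drift from the map contents under concurrent writers.
  // Assumes non-null values, so a null return means the key was new.
  synchronized V put(K key, V value) {
    V previous = inMemoryMap.put(key, value);
    if (previous == null) {
      currentInMemoryMapSize += estimatedEntrySize;
    }
    return previous;
  }

  synchronized long currentInMemoryMapSize() {
    return currentInMemoryMapSize;
  }

  synchronized int size() {
    return inMemoryMap.size();
  }
}

class SizeTrackingDemo {
  // Starts `threads` writers, each inserting `perThread` distinct keys.
  static SizeTrackingMap<String, Integer> fill(int threads, int perThread) {
    SizeTrackingMap<String, Integer> map = new SizeTrackingMap<>(8L);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final int id = t;
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          map.put(id + "-" + i, i);
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return map;
  }

  public static void main(String[] args) {
    SizeTrackingMap<String, Integer> map = fill(4, 1000);
    System.out.println(map.size() + " entries, "
        + map.currentInMemoryMapSize() + " bytes tracked");
  }
}
```

   Swapping the inner `HashMap` for a `ConcurrentHashMap` while dropping 
`synchronized` would keep the map itself consistent, but the `+=` on the 
counter would still be an unguarded read-modify-write.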
   
   Coming back to FSView: even the spillable-disk versions of FSView extend 
`AbstractTableFileSystemView`, which already takes a reentrant read/write 
lock. Did we miss any API? Anyway, I think FSView is the right place to fix 
this. cc @yihua @danny0405 @nsivabalan 
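
   For reference, the view-level guarding described above could look roughly 
like the sketch below. `GuardedView`, its methods, and `GuardedViewDemo` are 
hypothetical and are not the `AbstractTableFileSystemView` API; the sketch only 
shows the pattern of wrapping a non-thread-safe map in a 
`ReentrantReadWriteLock`. A "missed API" would be any method that touches the 
map without taking one of the two locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical view class; names are illustrative, not Hudi's API.
class GuardedView {
  private final ReentrantReadWriteLock globalLock = new ReentrantReadWriteLock();
  // Deliberately a plain HashMap: safety comes from the lock, not the map.
  private final Map<String, String> fileGroups = new HashMap<>();

  // Readers share the read lock, so lookups can run in parallel.
  String getFileGroup(String key) {
    globalLock.readLock().lock();
    try {
      return fileGroups.get(key);
    } finally {
      globalLock.readLock().unlock();
    }
  }

  // Writers take the exclusive write lock for every mutation.
  void addFileGroup(String key, String fileGroup) {
    globalLock.writeLock().lock();
    try {
      fileGroups.put(key, fileGroup);
    } finally {
      globalLock.writeLock().unlock();
    }
  }

  int size() {
    globalLock.readLock().lock();
    try {
      return fileGroups.size();
    } finally {
      globalLock.readLock().unlock();
    }
  }
}

class GuardedViewDemo {
  // Starts `writers` threads that each add and then read back `perWriter` entries.
  static GuardedView run(int writers, int perWriter) {
    GuardedView view = new GuardedView();
    Thread[] workers = new Thread[writers];
    for (int w = 0; w < writers; w++) {
      final int id = w;
      workers[w] = new Thread(() -> {
        for (int i = 0; i < perWriter; i++) {
          view.addFileGroup("p" + id + "/" + i, "fg-" + id + "-" + i);
          view.getFileGroup("p" + id + "/" + i); // read path takes the read lock
        }
      });
      workers[w].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return view;
  }
}
```

   With this shape the underlying map can stay a plain `HashMap`, because 
every access path is serialized against writers by the view.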


