the-other-tim-brown commented on code in PR #10344:
URL: https://github.com/apache/hudi/pull/10344#discussion_r1432733293


##########
hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java:
##########
@@ -78,41 +78,49 @@ public class ExternalSpillableMap<T extends Serializable, R 
extends Serializable
   // Enables compression of values stored in disc
   private final boolean isCompressionEnabled;
   // current space occupied by this map in-memory
-  private Long currentInMemoryMapSize;
+  private long currentInMemoryMapSize;
   // An estimate of the size of each payload written to this map
   private volatile long estimatedPayloadSize = 0;
   // Base File Path
   private final String baseFilePath;
 
-  public ExternalSpillableMap(Long maxInMemorySizeInBytes, String 
baseFilePath, SizeEstimator<T> keySizeEstimator,
+  public ExternalSpillableMap(long maxInMemorySizeInBytes, String 
baseFilePath, SizeEstimator<T> keySizeEstimator,
                               SizeEstimator<R> valueSizeEstimator) throws 
IOException {
     this(maxInMemorySizeInBytes, baseFilePath, keySizeEstimator, 
valueSizeEstimator, DiskMapType.BITCASK);
   }
 
-  public ExternalSpillableMap(Long maxInMemorySizeInBytes, String 
baseFilePath, SizeEstimator<T> keySizeEstimator,
+  public ExternalSpillableMap(long maxInMemorySizeInBytes, String 
baseFilePath, SizeEstimator<T> keySizeEstimator,
                               SizeEstimator<R> valueSizeEstimator, DiskMapType 
diskMapType) throws IOException {
     this(maxInMemorySizeInBytes, baseFilePath, keySizeEstimator, 
valueSizeEstimator, diskMapType, false);
   }
 
-  public ExternalSpillableMap(Long maxInMemorySizeInBytes, String 
baseFilePath, SizeEstimator<T> keySizeEstimator,
+  public ExternalSpillableMap(long maxInMemorySizeInBytes, String 
baseFilePath, SizeEstimator<T> keySizeEstimator,
                               SizeEstimator<R> valueSizeEstimator, DiskMapType 
diskMapType, boolean isCompressionEnabled) throws IOException {
     this.inMemoryMap = new HashMap<>();
     this.baseFilePath = baseFilePath;
-    this.maxInMemorySizeInBytes = (long) Math.floor(maxInMemorySizeInBytes * 
sizingFactorForInMemoryMap);
+    this.maxInMemorySizeInBytes = (long) Math.floor(maxInMemorySizeInBytes * 
SIZING_FACTOR_FOR_IN_MEMORY_MAP);
     this.currentInMemoryMapSize = 0L;
     this.keySizeEstimator = keySizeEstimator;
     this.valueSizeEstimator = valueSizeEstimator;
     this.diskMapType = diskMapType;
     this.isCompressionEnabled = isCompressionEnabled;
   }
 
+  private DiskMap<T, R> getDiskBasedMap() {
+    return getDiskBasedMap(false);
+  }
+
+  private DiskMap<T, R> getOrCreateDiskBasedMap() {
+    return getDiskBasedMap(true);
+  }
+
   private DiskMap<T, R> getDiskBasedMap(boolean forceInitialization) {
     if (null == diskBasedMap) {
-      if (!forceInitialization) {
-        return DiskMap.empty();
-      }
       synchronized (this) {
         if (null == diskBasedMap) {
+          if (!forceInitialization) {
+            return DiskMap.empty();

Review Comment:
   Ok this conversation is going in circles. Can you and @nsivabalan align on 
the semantics you want? Previously I made all the calls explicit about whether 
they wanted to force initialization of the map and then I was asked to change 
that. I think it makes more sense to make it explicit personally since these 
are all internal methods so there doesn't need to be another layer of 
abstraction here that can cause confusion.
   
   @danny0405 I think you are missing some of the details here. The outside 
caller does not need to worry since all the implementation details are hidden 
from them as they should be. The calls in `put` will only instantiate the map 
when required. You don't need to instantiate a map to check if a key exists for 
example. If the map is not instantiated previously, the key will not be present 
in it.
   
   The cost of instantiating a map on every use is much more expensive. You are 
doing IO and allocating more objects. If you use RocksDB as your map type, 
you're going to create column family and all sorts of resources 
([source](https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java#L91)).
 If you use BitCaskDiskMap you are creating file handles and more objects which 
is lighter weight but still non-zero impact.
   
   I believe this code was intended to have a lazy initialization but was just 
not implemented correctly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to