the-other-tim-brown commented on code in PR #10344:
URL: https://github.com/apache/hudi/pull/10344#discussion_r1433396298
##########
hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java:
##########
@@ -78,41 +78,49 @@ public class ExternalSpillableMap<T extends Serializable, R
extends Serializable
// Enables compression of values stored in disc
private final boolean isCompressionEnabled;
// current space occupied by this map in-memory
- private Long currentInMemoryMapSize;
+ private long currentInMemoryMapSize;
// An estimate of the size of each payload written to this map
private volatile long estimatedPayloadSize = 0;
// Base File Path
private final String baseFilePath;
- public ExternalSpillableMap(Long maxInMemorySizeInBytes, String
baseFilePath, SizeEstimator<T> keySizeEstimator,
+ public ExternalSpillableMap(long maxInMemorySizeInBytes, String
baseFilePath, SizeEstimator<T> keySizeEstimator,
SizeEstimator<R> valueSizeEstimator) throws
IOException {
this(maxInMemorySizeInBytes, baseFilePath, keySizeEstimator,
valueSizeEstimator, DiskMapType.BITCASK);
}
- public ExternalSpillableMap(Long maxInMemorySizeInBytes, String
baseFilePath, SizeEstimator<T> keySizeEstimator,
+ public ExternalSpillableMap(long maxInMemorySizeInBytes, String
baseFilePath, SizeEstimator<T> keySizeEstimator,
SizeEstimator<R> valueSizeEstimator, DiskMapType
diskMapType) throws IOException {
this(maxInMemorySizeInBytes, baseFilePath, keySizeEstimator,
valueSizeEstimator, diskMapType, false);
}
- public ExternalSpillableMap(Long maxInMemorySizeInBytes, String
baseFilePath, SizeEstimator<T> keySizeEstimator,
+ public ExternalSpillableMap(long maxInMemorySizeInBytes, String
baseFilePath, SizeEstimator<T> keySizeEstimator,
SizeEstimator<R> valueSizeEstimator, DiskMapType
diskMapType, boolean isCompressionEnabled) throws IOException {
this.inMemoryMap = new HashMap<>();
this.baseFilePath = baseFilePath;
- this.maxInMemorySizeInBytes = (long) Math.floor(maxInMemorySizeInBytes *
sizingFactorForInMemoryMap);
+ this.maxInMemorySizeInBytes = (long) Math.floor(maxInMemorySizeInBytes *
SIZING_FACTOR_FOR_IN_MEMORY_MAP);
this.currentInMemoryMapSize = 0L;
this.keySizeEstimator = keySizeEstimator;
this.valueSizeEstimator = valueSizeEstimator;
this.diskMapType = diskMapType;
this.isCompressionEnabled = isCompressionEnabled;
}
+ private DiskMap<T, R> getDiskBasedMap() {
+ return getDiskBasedMap(false);
+ }
+
+ private DiskMap<T, R> getOrCreateDiskBasedMap() {
+ return getDiskBasedMap(true);
+ }
+
private DiskMap<T, R> getDiskBasedMap(boolean forceInitialization) {
if (null == diskBasedMap) {
- if (!forceInitialization) {
- return DiskMap.empty();
- }
synchronized (this) {
if (null == diskBasedMap) {
+ if (!forceInitialization) {
+ return DiskMap.empty();
Review Comment:
The `put` operation first checks whether the external map contains the key.
If it does, the external map must already exist and be non-empty. Later I
use the `getOrCreate` variant which, as the name implies, returns the existing
`DiskMap` or creates one.
https://github.com/apache/hudi/pull/10344/files#diff-90c0ac84504ff5e04c9de021e9b4f3d14ba092c1e315143764c058c46af21052R236
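To make the get vs. getOrCreate split concrete, here is a minimal, self-contained sketch of the pattern. It is not the code in this PR: `LazySpillStore`, `getDiskStore`, `getOrCreateDiskStore`, and the entry-count limit are hypothetical names chosen for illustration, and a plain `HashMap` stands in for the expensive disk-backed store.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a spillable map; not Hudi's actual API. */
class LazySpillStore<K, V> {
  private final int maxInMemoryEntries;
  private final Map<K, V> inMemory = new HashMap<>();
  // volatile so that double-checked locking publishes the store safely
  private volatile Map<K, V> diskStore;
  // Counts how often the "expensive" store was created (for illustration only)
  private int diskStoreCreations = 0;

  LazySpillStore(int maxInMemoryEntries) {
    this.maxInMemoryEntries = maxInMemoryEntries;
  }

  // Read path: never creates the disk store, returns an empty view instead.
  private Map<K, V> getDiskStore() {
    Map<K, V> store = diskStore;
    return store != null ? store : Collections.emptyMap();
  }

  // Write path: creates the disk store on first use, at most once.
  private Map<K, V> getOrCreateDiskStore() {
    Map<K, V> store = diskStore;
    if (store == null) {
      synchronized (this) {
        store = diskStore;
        if (store == null) {
          store = new HashMap<>(); // assumption: stands in for opening a disk-backed DB
          diskStoreCreations++;
          diskStore = store;
        }
      }
    }
    return store;
  }

  V get(K key) {
    V value = inMemory.get(key);
    return value != null ? value : getDiskStore().get(key);
  }

  V put(K key, V value) {
    if (getDiskStore().containsKey(key)) {
      // The key was found on disk, so the disk store already exists.
      return getOrCreateDiskStore().put(key, value);
    }
    if (inMemory.containsKey(key) || inMemory.size() < maxInMemoryEntries) {
      return inMemory.put(key, value);
    }
    // In-memory budget exhausted: spill, creating the disk store if needed.
    return getOrCreateDiskStore().put(key, value);
  }

  int diskStoreCreations() {
    return diskStoreCreations;
  }
}
```

In this sketch, lookups and small maps never touch the disk store, and the first spill is the only place it gets created.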
If the concern is that someone changing the internals of this class later might
break this logic, then I would suggest my initial approach, where the caller
must state explicitly whether to force initialization. It is more verbose, but
it makes the next developer acknowledge that they need to make a choice.
This code also has unit tests which will catch regressions in the logic.
Given how widely this class is used, I think it makes sense to make it
lightweight. For example, the code will set up RocksDB for every instance of
`HoodieMergedLogRecordScanner` regardless of the number of records. In my
opinion, this small logic update is worth the CPU cycles saved by not creating
and closing these DBs.