nsivabalan commented on code in PR #9188:
URL: https://github.com/apache/hudi/pull/9188#discussion_r1264147849


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##########
@@ -103,11 +102,6 @@ record -> new ImmutablePair<>(record.getPartitionPath(), 
record.getRecordKey()))
     // Step 3: Tag the incoming records, as inserts or updates, by joining 
with existing record keys
     HoodieData<HoodieRecord<R>> taggedRecords = 
tagLocationBacktoRecords(keyFilenamePairs, records, hoodieTable);
 
-    if (config.getBloomIndexUseCaching()) {

Review Comment:
   I guess this was intentional. After this, taggedRecords is what is getting 
used. and we do cache that in BaseSparkCommitActionExecutor.execute
   ```
    @Override
     public HoodieWriteMetadata<HoodieData<WriteStatus>> 
execute(HoodieData<HoodieRecord<T>> inputRecords) {
       // Cache the tagged records, so we don't end up computing both
       JavaRDD<HoodieRecord<T>> inputRDD = 
HoodieJavaRDD.getJavaRDD(inputRecords);
       if (inputRDD.getStorageLevel() == StorageLevel.NONE()) {
         
HoodieJavaRDD.of(inputRDD).persist(config.getTaggedRecordStorageLevel(),
             context, HoodieDataCacheKey.of(config.getBasePath(), instantTime));
       } else {
         LOG.info("RDD PreppedRecords was persisted at: " + 
inputRDD.getStorageLevel());
       }
   .
   .
   ```
   
   So, not sure if we want to keep the persistance until the very end for these 
rdds which may not be used only. 
   
   



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java:
##########
@@ -80,7 +81,7 @@ public O updateLocation(O writeStatuses, HoodieEngineContext 
context,
   @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
   public abstract <R> HoodieData<HoodieRecord<R>> tagLocation(
       HoodieData<HoodieRecord<R>> records, HoodieEngineContext context,
-      HoodieTable hoodieTable) throws HoodieIndexException;
+      HoodieTable hoodieTable, Option<String> instantTime) throws 
HoodieIndexException;

Review Comment:
   this is a public api. we might have to deprecate and add a new one if we 
wish to change the signature



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##########
@@ -75,11 +74,11 @@ public HoodieBloomIndex(HoodieWriteConfig config, 
BaseHoodieBloomIndexHelper blo
   @Override
   public <R> HoodieData<HoodieRecord<R>> tagLocation(
       HoodieData<HoodieRecord<R>> records, HoodieEngineContext context,
-      HoodieTable hoodieTable) {
+      HoodieTable hoodieTable, Option<String> instantTime) {
     // Step 0: cache the input records if needed
-    if (config.getBloomIndexUseCaching()) {
-      records.persist(new HoodieConfig(config.getProps())
-          .getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE));
+    if (config.getBloomIndexUseCaching() && instantTime.isPresent()) {
+      String storageLevel = 
config.getString(HoodieIndexConfig.BLOOM_INDEX_INPUT_STORAGE_LEVEL_VALUE);

Review Comment:
   can we move this to constructor and use it everywhere instead of parsing 
multiple times?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to