bvaradar commented on a change in pull request #738: Reading baseCommitTime from the latest file slice as opposed to the tagged record value
URL: https://github.com/apache/incubator-hudi/pull/738#discussion_r294057391
 
 

 ##########
 File path: hoodie-client/src/main/java/com/uber/hoodie/table/HoodieMergeOnReadTable.java
 ##########
 @@ -489,24 +489,21 @@ public long convertLogFilesSizeToExpectedParquetSize(List<HoodieLogFile> hoodieL
   private HoodieRollbackStat rollback(HoodieIndex hoodieIndex, String partitionPath, String commit,
       HoodieCommitMetadata commitMetadata, final Map<FileStatus, Boolean> filesToDeletedStatus,
       Map<FileStatus, Long> filesToNumBlocksRollback, Set<String> deletedFiles) {
-    // The following needs to be done since GlobalIndex at the moment does not store the latest commit time.
-    // Also, wStat.getPrevCommit() might not give the right commit time in the following
+    // wStat.getPrevCommit() might not give the right commit time in the following
     // scenario : If a compaction was scheduled, the new commitTime associated with the requested compaction will be
     // used to write the new log files. In this case, the commit time for the log file is the compaction requested time.
-    Map<String, String> fileIdToBaseCommitTimeForLogMap =
-        hoodieIndex.isGlobal() ? this.getRTFileSystemView().getLatestFileSlices(partitionPath)
-            .collect(Collectors.toMap(FileSlice::getFileId, FileSlice::getBaseInstantTime)) : null;
+    // But the index (global) might store the baseCommit of the parquet and not the requested, hence get the
+    // baseCommit always by listing the file slice
+    Map<String, String> fileIdToBaseCommitTimeForLogMap = this.getRTFileSystemView().getLatestFileSlices(partitionPath)
+            .collect(Collectors.toMap(FileSlice::getFileId, FileSlice::getBaseInstantTime));
     commitMetadata.getPartitionToWriteStats().get(partitionPath).stream()
         .filter(wStat -> {
           // Filter out stats without prevCommit since they are all inserts
 
 Review comment:
   This will be simplified further once inserts to log files are allowed.
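
   For readers without the code base at hand, the lookup introduced by this
   change can be sketched roughly as follows. This is a minimal,
   self-contained illustration, not the Hudi implementation: the nested
   FileSlice class is a hypothetical stand-in that models only the two
   getters used in the diff, and the file ids and instant times are made up.
   It also assumes that the stream holds at most one (latest) slice per file
   id, since Collectors.toMap throws on duplicate keys.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BaseCommitLookupSketch {

  // Hypothetical stand-in for Hudi's FileSlice; only the getters used in the diff are modeled.
  static final class FileSlice {
    private final String fileId;
    private final String baseInstantTime;

    FileSlice(String fileId, String baseInstantTime) {
      this.fileId = fileId;
      this.baseInstantTime = baseInstantTime;
    }

    String getFileId() {
      return fileId;
    }

    String getBaseInstantTime() {
      return baseInstantTime;
    }
  }

  // Same collect(...) shape as in the diff: map each file id to the base
  // instant time of its latest file slice, with no isGlobal() branch.
  static Map<String, String> fileIdToBaseCommitTime(Stream<FileSlice> latestFileSlices) {
    return latestFileSlices.collect(
        Collectors.toMap(FileSlice::getFileId, FileSlice::getBaseInstantTime));
  }

  public static void main(String[] args) {
    // Made-up scenario: a compaction was requested at instant "003" for file "f1",
    // so its latest slice carries "003" even if an index still holds the older
    // parquet commit time for that file.
    Map<String, String> lookup = fileIdToBaseCommitTime(Stream.of(
        new FileSlice("f1", "003"),
        new FileSlice("f2", "002")));

    lookup.forEach((fileId, baseCommit) ->
        System.out.println(fileId + " -> " + baseCommit));
  }
}
```

   Under these assumptions, the rollback path would read the base commit for
   a log file from this map regardless of the index type, rather than
   relying on the value tagged on the record.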

