bvaradar commented on a change in pull request #738: Reading baseCommitTime from the latest file slice as opposed to the tagged record value
URL: https://github.com/apache/incubator-hudi/pull/738#discussion_r294057391
##########
File path: hoodie-client/src/main/java/com/uber/hoodie/table/HoodieMergeOnReadTable.java
##########
@@ -489,24 +489,21 @@ public long convertLogFilesSizeToExpectedParquetSize(List<HoodieLogFile> hoodieL
private HoodieRollbackStat rollback(HoodieIndex hoodieIndex, String partitionPath, String commit,
HoodieCommitMetadata commitMetadata, final Map<FileStatus, Boolean> filesToDeletedStatus,
Map<FileStatus, Long> filesToNumBlocksRollback, Set<String> deletedFiles) {
- // The following needs to be done since GlobalIndex at the moment does not store the latest commit time.
- // Also, wStat.getPrevCommit() might not give the right commit time in the following
+ // wStat.getPrevCommit() might not give the right commit time in the following
// scenario : If a compaction was scheduled, the new commitTime associated with the requested compaction will be
// used to write the new log files. In this case, the commit time for the log file is the compaction requested time.
- Map<String, String> fileIdToBaseCommitTimeForLogMap =
- hoodieIndex.isGlobal() ? this.getRTFileSystemView().getLatestFileSlices(partitionPath)
- .collect(Collectors.toMap(FileSlice::getFileId, FileSlice::getBaseInstantTime)) : null;
+ // But the index (global) might store the baseCommit of the parquet and not the requested, hence get the
+ // baseCommit always by listing the file slice
+ Map<String, String> fileIdToBaseCommitTimeForLogMap = this.getRTFileSystemView().getLatestFileSlices(partitionPath)
+ .collect(Collectors.toMap(FileSlice::getFileId, FileSlice::getBaseInstantTime));
commitMetadata.getPartitionToWriteStats().get(partitionPath).stream()
.filter(wStat -> {
// Filter out stats without prevCommit since they are all inserts
Review comment:
This will be simplified further once inserts to log files are allowed.
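For readers following the hunk, here is a minimal, self-contained sketch of the lookup the new code builds: a map from file ID to base instant time, always derived from the latest file slices rather than only when the index is global. The `FileSlice` class below is a hypothetical stand-in that models only the two getters referenced in the diff; it is not Hudi's actual class, and the file IDs and instant times are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BaseCommitLookupSketch {

  // Hypothetical stand-in for Hudi's FileSlice; only the getters used in the diff are modeled.
  static final class FileSlice {
    private final String fileId;
    private final String baseInstantTime;

    FileSlice(String fileId, String baseInstantTime) {
      this.fileId = fileId;
      this.baseInstantTime = baseInstantTime;
    }

    String getFileId() {
      return fileId;
    }

    String getBaseInstantTime() {
      return baseInstantTime;
    }
  }

  // Builds the fileId -> baseInstantTime map unconditionally, mirroring the shape of the "+" lines.
  // Assumes at most one latest slice per file ID; Collectors.toMap throws
  // IllegalStateException if two slices share a key.
  static Map<String, String> fileIdToBaseCommitTime(List<FileSlice> latestSlices) {
    return latestSlices.stream()
        .collect(Collectors.toMap(FileSlice::getFileId, FileSlice::getBaseInstantTime));
  }

  public static void main(String[] args) {
    // Made-up sample data: two file groups with different base instants.
    List<FileSlice> slices = Arrays.asList(
        new FileSlice("file-1", "001"),
        new FileSlice("file-2", "003"));

    Map<String, String> lookup = fileIdToBaseCommitTime(slices);
    System.out.println(lookup.get("file-1")); // prints 001
    System.out.println(lookup.get("file-2")); // prints 003
  }
}
```

Because the map is no longer conditionally `null`, callers in this sketch can look up the base instant for any file ID without first checking whether the index is global.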