nsivabalan commented on pull request #2168:
URL: https://github.com/apache/hudi/pull/2168#issuecomment-742791285
> @nsivabalan The use-case you described seems to be intentional but the
behavior is not correct. If the number of records to update is explicitly asked
by the dag, then `Option<Long> numRecordsToUpdate` should be set. In this case,
the code path assumes it's not set. Maybe there's a bug in setting that
variable?
> In general, left 2 comments, after they are addressed, can merge this.
I don't think so. This code path assumes `numFiles` is not set, not
`numRecordsToUpdate`:
```
if (!numFiles.isPresent() || numFiles.get() == 0) {
  // If num files are not passed, find the number of files to update based on total records to update and records
  // per file
  numFilesToUpdate = (int) Math.ceil((double) numRecordsToUpdate.get() / recordsInSingleFile);
  // recordsInSingleFile is not average so we still need to account for bias is records distribution
  // in the files. Limit to the maximum number of files available.
  int totalExistingFilesCount = partitionToFileIdCountMap.values().stream().reduce((a, b) -> a + b).get();
  numFilesToUpdate = Math.min(numFilesToUpdate, totalExistingFilesCount);
  log.warn("aaa Files to update {}, numRecords toUpdate {}, records in Single file {} ", numFilesToUpdate, numRecordsToUpdate, recordsInSingleFile);
  numRecordsToUpdatePerFile = recordsInSingleFile;
}
```
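
To make the point concrete, here is a minimal standalone sketch of that branch. The class name `FilesToUpdateSketch` and the method `filesToUpdate` are made up for illustration and are not part of Hudi. Returning `numFiles` unchanged when it is set is an assumption, since the snippet above does not show that branch. The sketch also guards the two places where the snippet would throw (an empty `numRecordsToUpdate` and an empty file-count map).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Hudi.
final class FilesToUpdateSketch {

  static int filesToUpdate(Optional<Integer> numFiles,
                           Optional<Long> numRecordsToUpdate,
                           long recordsInSingleFile,
                           Map<String, Integer> partitionToFileIdCountMap) {
    // Assumption: when numFiles is set and non-zero, it is used as-is.
    if (numFiles.isPresent() && numFiles.get() != 0) {
      return numFiles.get();
    }
    if (recordsInSingleFile <= 0) {
      throw new IllegalArgumentException("recordsInSingleFile must be positive");
    }
    // This branch is only reached when numFiles is NOT set, so it relies on
    // numRecordsToUpdate being present.
    long records = numRecordsToUpdate.orElseThrow(
        () -> new IllegalStateException("numRecordsToUpdate must be set when numFiles is not"));
    // Derive the file count from total records and records per file, rounding up.
    int wanted = (int) Math.ceil((double) records / recordsInSingleFile);
    // Cap at the number of existing files; an empty map sums to 0 instead of throwing.
    int existing = partitionToFileIdCountMap.values().stream()
        .mapToInt(Integer::intValue)
        .sum();
    return Math.min(wanted, existing);
  }
}
```

For example, with `numFiles` empty, 250 records to update, 100 records per file and 5 existing files, the sketch returns 3; with only 2 existing files it is capped at 2.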