n3nash commented on pull request #2168:
URL: https://github.com/apache/hudi/pull/2168#issuecomment-744642087
@nsivabalan Yes, please go ahead and change the DAG to the new format. If
there is an easy way for DagUtils to keep the new and old DAG formats
backwards compatible (by providing defaults), that would be great.
For the following code, see my inline comments:
if (!numFiles.isPresent() || numFiles.get() == 0) {
  // Nishith - This line ensures we update only the number of files that we
  // should, based on this math. I think the bug happens when there is only
  // 1 file to update (numRecordsToUpdate < recordsInSingleFile). We should
  // add a check for that case and set numRecordsToUpdatePerFile to
  // numRecordsToUpdate.
  numFilesToUpdate = (int) Math.ceil((double) numRecordsToUpdate.get() / recordsInSingleFile);
  int totalExistingFilesCount =
      partitionToFileIdCountMap.values().stream().reduce((a, b) -> a + b).get();
  numFilesToUpdate = Math.min(numFilesToUpdate, totalExistingFilesCount);
  // This line ignores the numRecordsToUpdate passed in and sets the value to
  // the total records in one single file slice.
  numRecordsToUpdatePerFile = recordsInSingleFile;
}
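
A rough sketch of the check suggested above, isolated from the surrounding class so it can be run on its own. The class name `UpdatePlanSketch`, the `Plan` holder, and the method `planWhenNumFilesUnset` are hypothetical names for illustration only and are not part of the PR; the field types (`long` for record counts) are also an assumption. The only behavioral change from the quoted snippet is that `numRecordsToUpdatePerFile` falls back to `numRecordsToUpdate` when fewer records are requested than one file holds.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical, standalone sketch; not the actual Hudi class.
class UpdatePlanSketch {

  // Simple holder for the two values computed in the quoted branch.
  static final class Plan {
    final int numFilesToUpdate;
    final long numRecordsToUpdatePerFile;

    Plan(int numFilesToUpdate, long numRecordsToUpdatePerFile) {
      this.numFilesToUpdate = numFilesToUpdate;
      this.numRecordsToUpdatePerFile = numRecordsToUpdatePerFile;
    }
  }

  // Mirrors the branch taken when numFiles is absent or 0.
  static Plan planWhenNumFilesUnset(long numRecordsToUpdate,
                                    long recordsInSingleFile,
                                    Map<String, Integer> partitionToFileIdCountMap) {
    if (recordsInSingleFile <= 0) {
      throw new IllegalArgumentException("recordsInSingleFile must be positive");
    }
    // Same math as the original: enough files to cover the requested records.
    int numFilesToUpdate = (int) Math.ceil((double) numRecordsToUpdate / recordsInSingleFile);
    // Never plan to touch more files than actually exist.
    int totalExistingFilesCount =
        partitionToFileIdCountMap.values().stream().mapToInt(Integer::intValue).sum();
    numFilesToUpdate = Math.min(numFilesToUpdate, totalExistingFilesCount);

    // Suggested check: if the request fits in a single file, update only the
    // requested number of records instead of the whole file.
    long numRecordsToUpdatePerFile = numRecordsToUpdate < recordsInSingleFile
        ? numRecordsToUpdate
        : recordsInSingleFile;
    return new Plan(numFilesToUpdate, numRecordsToUpdatePerFile);
  }

  public static void main(String[] args) {
    Map<String, Integer> files = Collections.singletonMap("partition-1", 5);
    Plan plan = planWhenNumFilesUnset(50, 200, files);
    System.out.println("files=" + plan.numFilesToUpdate
        + ", recordsPerFile=" + plan.numRecordsToUpdatePerFile);
  }
}
```

With 50 records requested and 200 records per file, this plans 1 file with 50 records, where the quoted code would have planned 200. Requests that span more than one file still use `recordsInSingleFile` per file, as before.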