n3nash commented on pull request #2168:
URL: https://github.com/apache/hudi/pull/2168#issuecomment-744642087


   @nsivabalan Yes, please go ahead and change the DAG to the new format. If 
there is an easy way for DagUtils to keep the new and old DAG formats backwards 
compatible (by providing defaults), that would be great. 
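One possible way to get that backwards compatibility is to merge each parsed DAG node's config over a table of defaults, so old-format nodes that lack the new keys still resolve to sensible values. The sketch below assumes nodes are parsed into a `Map<String, Object>`. The class name, key names, and default values are hypothetical and are not taken from DagUtils.

```java
import java.util.HashMap;
import java.util.Map;

public class DagDefaultsSketch {

  // Hypothetical defaults for keys that only exist in the new DAG format.
  // Key names and values are illustrative, not Hudi's actual schema.
  private static final Map<String, Object> NEW_FORMAT_DEFAULTS = Map.of(
      "num_records_upsert", 0,
      "num_upsert_partitions", 1,
      "repeat_count", 1);

  // Returns a copy of the node config where every key missing from an
  // old-format node is filled in with its default; explicit values win.
  static Map<String, Object> withDefaults(Map<String, Object> nodeConfig) {
    Map<String, Object> merged = new HashMap<>(NEW_FORMAT_DEFAULTS);
    merged.putAll(nodeConfig);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> oldNode = new HashMap<>();
    oldNode.put("repeat_count", 3); // an old-format node that sets only this key
    Map<String, Object> resolved = withDefaults(oldNode);
    System.out.println(resolved.get("repeat_count"));          // 3
    System.out.println(resolved.get("num_upsert_partitions")); // 1
  }
}
```

Because defaults are applied first and the node's own values are copied over them, a new-format DAG is unaffected while an old-format DAG picks up the defaults.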
   
   For the following code, see my inline comments:
   
   if (!numFiles.isPresent() || numFiles.get() == 0) {
         // Nishith - The next line ensures we update only as many files as this
         // math says we should. I think the bug happens when there is only 1 file
         // to update (numRecordsToUpdate < recordsInSingleFile). We should add a
         // check for that case and set numRecordsToUpdatePerFile to
         // numRecordsToUpdate.
         numFilesToUpdate = (int) Math.ceil((double) numRecordsToUpdate.get() / recordsInSingleFile);
   
         int totalExistingFilesCount = partitionToFileIdCountMap.values().stream().reduce((a, b) -> a + b).get();
         numFilesToUpdate = Math.min(numFilesToUpdate, totalExistingFilesCount);
         // The next line ignores the numRecordsToUpdate passed in and sets the
         // per-file count to the total number of records in a single file slice.
         numRecordsToUpdatePerFile = recordsInSingleFile;
       }
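A minimal sketch of the check suggested in the first comment, assuming numRecordsToUpdate is a plain `long` (the snippet unwraps it from an Option) and using a hypothetical `UpdatePlan` holder; it is not the code in the PR. When the requested updates fit in a single file, the per-file count is capped at numRecordsToUpdate instead of a full file slice.

```java
public class UpdatePlanSketch {

  // Hypothetical holder for the two values the snippet above computes.
  static final class UpdatePlan {
    final int numFilesToUpdate;
    final int numRecordsToUpdatePerFile;

    UpdatePlan(int numFilesToUpdate, int numRecordsToUpdatePerFile) {
      this.numFilesToUpdate = numFilesToUpdate;
      this.numRecordsToUpdatePerFile = numRecordsToUpdatePerFile;
    }
  }

  // Mirrors the snippet's math, plus the suggested single-file check.
  static UpdatePlan plan(long numRecordsToUpdate, int recordsInSingleFile,
                         int totalExistingFilesCount) {
    int numFilesToUpdate =
        (int) Math.ceil((double) numRecordsToUpdate / recordsInSingleFile);
    numFilesToUpdate = Math.min(numFilesToUpdate, totalExistingFilesCount);
    int numRecordsToUpdatePerFile = recordsInSingleFile;
    if (numRecordsToUpdate < recordsInSingleFile) {
      // Only one file is touched: update exactly the requested number of
      // records instead of a full file slice.
      numRecordsToUpdatePerFile = (int) numRecordsToUpdate;
    }
    return new UpdatePlan(numFilesToUpdate, numRecordsToUpdatePerFile);
  }

  public static void main(String[] args) {
    UpdatePlan small = plan(50, 100, 10);
    UpdatePlan large = plan(250, 100, 10);
    System.out.println(small.numFilesToUpdate + " " + small.numRecordsToUpdatePerFile); // 1 50
    System.out.println(large.numFilesToUpdate + " " + large.numRecordsToUpdatePerFile); // 3 100
  }
}
```

With 100 records per file, a request for 50 updates now touches 1 file and 50 records, while a request for 250 still spreads across 3 files of 100 records each.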

