nsivabalan commented on pull request #2168:
URL: https://github.com/apache/hudi/pull/2168#issuecomment-742791285


   > @nsivabalan The use case you described seems to be intentional, but the
   > behavior is not correct. If the number of records to update is explicitly
   > requested by the dag, then `Option<Long> numRecordsToUpdate` should be set.
   > In this case, the code path assumes it's not set. Maybe there's a bug in
   > setting that variable?
   > In general, I left 2 comments; after they are addressed, this can be merged.
   
   I don't think so. This code path assumes `numFiles` is not set, not
   `numRecordsToUpdate`.
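
To make the branch easier to reason about in isolation, here is a minimal standalone sketch of the same calculation. It is not the Hudi code: the class name, the method `computeNumFilesToUpdate`, and the use of `java.util.Optional` in place of Hudi's `Option` are assumptions for illustration, and returning `numFiles` unchanged when it is set is also assumed, since that part is not in the snippet below.

```java
import java.util.Map;
import java.util.Optional;

public class FilesToUpdateSketch {

  // Hypothetical helper mirroring the quoted branch; not part of Hudi.
  static int computeNumFilesToUpdate(Optional<Integer> numFiles,
                                     Optional<Long> numRecordsToUpdate,
                                     long recordsInSingleFile,
                                     Map<String, Integer> partitionToFileIdCountMap) {
    // Assumption: when numFiles is set and non-zero, it is used as-is.
    if (numFiles.isPresent() && numFiles.get() != 0) {
      return numFiles.get();
    }
    // This branch only checks numFiles, so it relies on numRecordsToUpdate being set.
    long recordsToUpdate = numRecordsToUpdate.orElseThrow(
        () -> new IllegalStateException("numRecordsToUpdate must be set when numFiles is not"));
    // Round up so that a partially filled file still counts as one file.
    int wanted = (int) Math.ceil((double) recordsToUpdate / recordsInSingleFile);
    // Never ask for more files than exist across all partitions.
    int totalExistingFilesCount = partitionToFileIdCountMap.values().stream()
        .mapToInt(Integer::intValue)
        .sum();
    return Math.min(wanted, totalExistingFilesCount);
  }

  public static void main(String[] args) {
    Map<String, Integer> files = Map.of("p1", 3, "p2", 2);
    // numFiles absent, 250 records, 100 records per file: ceil(2.5) = 3 files.
    System.out.println(computeNumFilesToUpdate(Optional.empty(), Optional.of(250L), 100, files));
    // 1000 records would need 10 files, but only 5 exist, so the result is capped.
    System.out.println(computeNumFilesToUpdate(Optional.empty(), Optional.of(1000L), 100, files));
  }
}
```

In this sketch, an unset `numRecordsToUpdate` fails with an explicit exception rather than on a bare `get()`, which makes the dependency of this branch on that value visible.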
   
   
   ```
   if (!numFiles.isPresent() || numFiles.get() == 0) {
     // If num files are not passed, find the number of files to update based on total records to update and records
     // per file
     numFilesToUpdate = (int) Math.ceil((double) numRecordsToUpdate.get() / recordsInSingleFile);
     // recordsInSingleFile is not average so we still need to account for bias is records distribution
     // in the files. Limit to the maximum number of files available.
     int totalExistingFilesCount = partitionToFileIdCountMap.values().stream().reduce((a, b) -> a + b).get();
     numFilesToUpdate = Math.min(numFilesToUpdate, totalExistingFilesCount);
     log.warn("aaa Files to update {}, numRecords toUpdate {}, records in Single file {} ", numFilesToUpdate, numRecordsToUpdate, recordsInSingleFile);
     numRecordsToUpdatePerFile = recordsInSingleFile;
   }
   ```

