SteNicholas commented on a change in pull request #2111:
URL: https://github.com/apache/hudi/pull/2111#discussion_r505682259
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -160,11 +174,15 @@ private void assignInserts(WorkloadProfile profile, HoodieEngineContext context)
         if (recordsToAppend > 0 && totalUnassignedInserts > 0) {
           // create a new bucket or re-use an existing bucket
           int bucket;
-          if (updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
+          // insert new records regardless of small file when using insert operation
+          if (isChangingRecords(profile.getOperationType())
+              && updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
             bucket = updateLocationToBucket.get(smallFile.location.getFileId());
             LOG.info("Assigning " + recordsToAppend + " inserts to existing update bucket " + bucket);
           } else {
-            bucket = addUpdateBucket(partitionPath, smallFile.location.getFileId());
+            bucket = profile.getOperationType() == null || isChangingRecords(profile.getOperationType())
+                ? addUpdateBucket(partitionPath, smallFile.location.getFileId())
+                : addInsertBucket(partitionPath, smallFile.location.getFileId());
Review comment:
@bvaradar @vinothchandar WDYT? I thought re-using the existing small file id would be better.
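
The branching the diff introduces can be sketched as a standalone snippet. This is a minimal illustration, not the real Hudi code: `WriteOperationType`, `isChangingRecords`, and `bucketKind` here are simplified stand-ins for the classes and helpers the patch touches, and the real `isChangingRecords` may cover additional operation types.

```java
// Sketch of the small-file bucket-selection logic from the patch above.
// All names below are simplified assumptions, not the actual Hudi API.
enum WriteOperationType { INSERT, UPSERT, DELETE }

public class BucketChoiceSketch {

  // Stand-in for UpsertPartitioner#isChangingRecords: operations that
  // rewrite existing records (upsert/delete), as opposed to plain inserts.
  static boolean isChangingRecords(WriteOperationType op) {
    return op == WriteOperationType.UPSERT || op == WriteOperationType.DELETE;
  }

  // Which kind of bucket the patched code would assign inserts to for a
  // small file: an update bucket for changing (or unknown) operations,
  // an insert bucket for plain inserts.
  static String bucketKind(WriteOperationType op) {
    // A null operation type keeps the pre-patch behavior: update bucket.
    return (op == null || isChangingRecords(op)) ? "UPDATE" : "INSERT";
  }

  public static void main(String[] args) {
    System.out.println(bucketKind(WriteOperationType.UPSERT)); // UPDATE
    System.out.println(bucketKind(WriteOperationType.INSERT)); // INSERT
    System.out.println(bucketKind(null));                      // UPDATE
  }
}
```

The point of the change: with a plain insert operation, new records should go to a fresh insert bucket even when a small file exists, instead of always being routed into an update bucket for that file id.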
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]