leesf commented on a change in pull request #2111:
URL: https://github.com/apache/hudi/pull/2111#discussion_r505542832
##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -160,11 +174,15 @@ private void assignInserts(WorkloadProfile profile, HoodieEngineContext context)
         if (recordsToAppend > 0 && totalUnassignedInserts > 0) {
           // create a new bucket or re-use an existing bucket
           int bucket;
-          if (updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
+          // insert new records regardless of small file when using insert operation
+          if (isChangingRecords(profile.getOperationType())
+              && updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
             bucket = updateLocationToBucket.get(smallFile.location.getFileId());
             LOG.info("Assigning " + recordsToAppend + " inserts to existing update bucket " + bucket);
           } else {
-            bucket = addUpdateBucket(partitionPath, smallFile.location.getFileId());
+            bucket = profile.getOperationType() == null || isChangingRecords(profile.getOperationType())
+                ? addUpdateBucket(partitionPath, smallFile.location.getFileId())
+                : addInsertBucket(partitionPath, smallFile.location.getFileId());
Review comment:
Maybe we should create a new fileId instead of reusing the existing small file id? @bvaradar WDYT?
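To make the branch being discussed easier to reason about, here is a minimal, self-contained sketch of the decision in isolation. All names here are illustrative, not Hudi's actual API: `assignBucket`, the string return values, and the `OperationType` enum are simplifications of the real `UpsertPartitioner` code, which returns bucket indices and mutates partitioner state.

```java
import java.util.Map;

// Hypothetical, simplified sketch of the bucket-assignment decision in the
// diff above: for "changing records" operations (upsert-like), inserts that
// fit into a small file reuse that file's existing update bucket; for plain
// INSERT, a separate insert bucket is used instead.
public class BucketAssignmentSketch {

  enum OperationType { UPSERT, INSERT }

  // Mirrors the role of isChangingRecords(...) in the patch: true for
  // operations that rewrite existing records.
  static boolean isChangingRecords(OperationType op) {
    return op == OperationType.UPSERT;
  }

  // Returns "update:<fileId>" when an update bucket is (re)used for the
  // small file, or "insert:<fileId>" when a dedicated insert bucket is used.
  static String assignBucket(OperationType op,
                             String smallFileId,
                             Map<String, Integer> updateLocationToBucket) {
    if (isChangingRecords(op) && updateLocationToBucket.containsKey(smallFileId)) {
      return "update:" + smallFileId;   // reuse the existing update bucket
    }
    return (op == null || isChangingRecords(op))
        ? "update:" + smallFileId       // create an update bucket
        : "insert:" + smallFileId;      // create an insert bucket
  }
}
```

Under this simplification, the review question amounts to: in the `insert:` branch, should the new bucket keep `smallFileId` or be given a freshly generated fileId?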
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]