leesf commented on a change in pull request #2111:
URL: https://github.com/apache/hudi/pull/2111#discussion_r505542832
##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -160,11 +174,15 @@ private void assignInserts(WorkloadProfile profile, HoodieEngineContext context)
         if (recordsToAppend > 0 && totalUnassignedInserts > 0) {
           // create a new bucket or re-use an existing bucket
           int bucket;
-          if (updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
+          // insert new records regardless of small file when using insert operation
+          if (isChangingRecords(profile.getOperationType())
+              && updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
             bucket = updateLocationToBucket.get(smallFile.location.getFileId());
             LOG.info("Assigning " + recordsToAppend + " inserts to existing update bucket " + bucket);
           } else {
-            bucket = addUpdateBucket(partitionPath, smallFile.location.getFileId());
+            bucket = profile.getOperationType() == null || isChangingRecords(profile.getOperationType())
+                ? addUpdateBucket(partitionPath, smallFile.location.getFileId())
+                : addInsertBucket(partitionPath, smallFile.location.getFileId());
Review comment:
Maybe we should create a new fileId instead of reusing the existing small file id? @bvaradar WDYT?
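To make the branch being discussed easier to reason about, here is a minimal, self-contained sketch of the decision in isolation. All names here are illustrative, not Hudi's actual API: `assignBucket`, the string return values, and the `OperationType` enum are simplifications of the real `UpsertPartitioner` code, which returns bucket indices and mutates partitioner state.

```java
import java.util.Map;

// Hypothetical, simplified sketch of the bucket-assignment decision in the
// diff above: for "changing records" operations (upsert-like), inserts that
// fit into a small file reuse that file's existing update bucket; for plain
// INSERT, a separate insert bucket is used instead.
public class BucketAssignmentSketch {

  enum OperationType { UPSERT, INSERT }

  // Mirrors the role of isChangingRecords(...) in the patch: true for
  // operations that rewrite existing records.
  static boolean isChangingRecords(OperationType op) {
    return op == OperationType.UPSERT;
  }

  // Returns "update:<fileId>" when an update bucket is (re)used for the
  // small file, or "insert:<fileId>" when a dedicated insert bucket is used.
  static String assignBucket(OperationType op,
                             String smallFileId,
                             Map<String, Integer> updateLocationToBucket) {
    if (isChangingRecords(op) && updateLocationToBucket.containsKey(smallFileId)) {
      return "update:" + smallFileId;   // reuse the existing update bucket
    }
    return (op == null || isChangingRecords(op))
        ? "update:" + smallFileId       // create an update bucket
        : "insert:" + smallFileId;      // create an insert bucket
  }
}
```

Under this simplification, the review question amounts to: in the `insert:` branch, should the new bucket keep `smallFileId` or be given a freshly generated fileId?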
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]