SteNicholas commented on a change in pull request #2111:
URL: https://github.com/apache/hudi/pull/2111#discussion_r505682259
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -160,11 +174,15 @@ private void assignInserts(WorkloadProfile profile, HoodieEngineContext context)
         if (recordsToAppend > 0 && totalUnassignedInserts > 0) {
           // create a new bucket or re-use an existing bucket
           int bucket;
-          if (updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
+          // insert new records regardless of small file when using insert operation
+          if (isChangingRecords(profile.getOperationType())
+              && updateLocationToBucket.containsKey(smallFile.location.getFileId())) {
             bucket = updateLocationToBucket.get(smallFile.location.getFileId());
             LOG.info("Assigning " + recordsToAppend + " inserts to existing update bucket " + bucket);
           } else {
-            bucket = addUpdateBucket(partitionPath, smallFile.location.getFileId());
+            bucket = profile.getOperationType() == null || isChangingRecords(profile.getOperationType())
+                ? addUpdateBucket(partitionPath, smallFile.location.getFileId())
+                : addInsertBucket(partitionPath, smallFile.location.getFileId());
Review comment:
@bvaradar @vinothchandar WDYT? I thought re-using the existing small file id would be better.
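
The branching the diff introduces can be sketched as a standalone snippet. This is a minimal illustration, not the real Hudi code: `WriteOperationType`, `isChangingRecords`, and `bucketKind` here are simplified stand-ins for the classes and helpers the patch touches, and the real `isChangingRecords` may cover additional operation types.

```java
// Sketch of the small-file bucket-selection logic from the patch above.
// All names below are simplified assumptions, not the actual Hudi API.
enum WriteOperationType { INSERT, UPSERT, DELETE }

public class BucketChoiceSketch {

  // Stand-in for UpsertPartitioner#isChangingRecords: operations that
  // rewrite existing records (upsert/delete), as opposed to plain inserts.
  static boolean isChangingRecords(WriteOperationType op) {
    return op == WriteOperationType.UPSERT || op == WriteOperationType.DELETE;
  }

  // Which kind of bucket the patched code would assign inserts to for a
  // small file: an update bucket for changing (or unknown) operations,
  // an insert bucket for plain inserts.
  static String bucketKind(WriteOperationType op) {
    // A null operation type keeps the pre-patch behavior: update bucket.
    return (op == null || isChangingRecords(op)) ? "UPDATE" : "INSERT";
  }

  public static void main(String[] args) {
    System.out.println(bucketKind(WriteOperationType.UPSERT)); // UPDATE
    System.out.println(bucketKind(WriteOperationType.INSERT)); // INSERT
    System.out.println(bucketKind(null));                      // UPDATE
  }
}
```

The point of the change: with a plain insert operation, new records should go to a fresh insert bucket even when a small file exists, instead of always being routed into an update bucket for that file id.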
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]