yihua commented on a change in pull request #5090:
URL: https://github.com/apache/hudi/pull/5090#discussion_r832525843



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/ConcurrentOperation.java
##########
@@ -116,14 +116,27 @@ private void init(HoodieInstant instant) {
                 this.metadataWrapper.getMetadataFromTimeline().getHoodieReplaceCommitMetadata().getPartitionToWriteStats()).keySet();
             this.operationType = WriteOperationType.fromValue(this.metadataWrapper.getMetadataFromTimeline().getHoodieReplaceCommitMetadata().getOperationType());
           } else {
-            HoodieRequestedReplaceMetadata requestedReplaceMetadata = this.metadataWrapper.getMetadataFromTimeline().getHoodieRequestedReplaceMetadata();
-            this.mutatedFileIds = requestedReplaceMetadata
-                .getClusteringPlan().getInputGroups()
-                .stream()
-                .flatMap(ig -> ig.getSlices().stream())
-                .map(file -> file.getFileId())
-                .collect(Collectors.toSet());
-            this.operationType = WriteOperationType.CLUSTER;
+            // we need to different handling for requested and inflight replacecommit because
+            // for requested replacecommit, clustering will generate a plan and HoodieRequestedReplaceMetadata will not be empty, but insert_overwrite/insert_overwrite_table could have empty content
+            // for inflight replacecommit, clustering will have no content in metadata, but insert_overwrite/insert_overwrite_table will have some commit metadata
+            if (instant.isRequested()) {
+              HoodieRequestedReplaceMetadata requestedReplaceMetadata = this.metadataWrapper.getMetadataFromTimeline().getHoodieRequestedReplaceMetadata();
+              if (requestedReplaceMetadata != null) {
+                this.mutatedFileIds = requestedReplaceMetadata
+                    .getClusteringPlan().getInputGroups()
+                    .stream()
+                    .flatMap(ig -> ig.getSlices().stream())
+                    .map(file -> file.getFileId())
+                    .collect(Collectors.toSet());
+                this.operationType = WriteOperationType.CLUSTER;
+              }
+            } else {

Review comment:
       If the replacecommit from clustering is inflight, should we still read
the HoodieRequestedReplaceMetadata from the requested instant?  Otherwise, we
miss the file IDs stored in the clustering plan.
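
       A rough sketch of the idea, not the actual Hudi code: every type and
name below (`State`, `Instant`, `mutatedFileIds`, the plan map) is a
hypothetical stand-in. It assumes the inflight instant shares its instant time
with the requested one, so the plan can be looked up by time regardless of
state, and that a missing plan (e.g. insert_overwrite) yields no file IDs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-ins only; none of these are Hudi classes.
final class InflightClusteringSketch {
  enum State { REQUESTED, INFLIGHT }

  static final class Instant {
    final String time;
    final State state;

    Instant(String time, State state) {
      this.time = time;
      this.state = state;
    }
  }

  // requestedPlans maps an instant time to the input groups of the plan that
  // was written when the instant was requested; each group is a list of file IDs.
  static Set<String> mutatedFileIds(Instant instant, Map<String, List<List<String>>> requestedPlans) {
    // Look up by instant time only, so an INFLIGHT instant still sees the plan
    // written for the REQUESTED instant with the same time.
    List<List<String>> inputGroups = requestedPlans.get(instant.time);
    if (inputGroups == null) {
      // No plan stored for this instant: nothing to report.
      return Set.of();
    }
    return inputGroups.stream()
        .flatMap(List::stream)
        .collect(Collectors.toSet());
  }
}
```

       With this shape, requested and inflight clustering instants report the
same set of file IDs to conflict resolution.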

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/ConcurrentOperation.java
##########
@@ -116,14 +116,27 @@ private void init(HoodieInstant instant) {
                 this.metadataWrapper.getMetadataFromTimeline().getHoodieReplaceCommitMetadata().getPartitionToWriteStats()).keySet();
             this.operationType = WriteOperationType.fromValue(this.metadataWrapper.getMetadataFromTimeline().getHoodieReplaceCommitMetadata().getOperationType());
           } else {
-            HoodieRequestedReplaceMetadata requestedReplaceMetadata = this.metadataWrapper.getMetadataFromTimeline().getHoodieRequestedReplaceMetadata();
-            this.mutatedFileIds = requestedReplaceMetadata
-                .getClusteringPlan().getInputGroups()
-                .stream()
-                .flatMap(ig -> ig.getSlices().stream())
-                .map(file -> file.getFileId())
-                .collect(Collectors.toSet());
-            this.operationType = WriteOperationType.CLUSTER;
+            // we need to different handling for requested and inflight replacecommit because

Review comment:
       nit: `we need to different handling` -> `we need to have different 
handling`



