danny0405 commented on code in PR #18016:
URL: https://github.com/apache/hudi/pull/18016#discussion_r2748958073


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java:
##########
@@ -490,6 +491,18 @@ public HashSet<String> getWritePartitionPaths() {
     return new HashSet<>(partitionToWriteStats.keySet());
   }
 
+  public Set<String> getWritePartitionPathsWithExistingFileGroupsModified() {
+    return getPartitionToWriteStats()
+        .entrySet()
+        .stream()
+        .filter(partitionAndWriteStats -> partitionAndWriteStats
+            .getValue()
+            .stream()
+            .anyMatch(writeStat -> !Option.ofNullable(writeStat.getPrevCommit()).orElse("null").equalsIgnoreCase("null")))

Review Comment:
   Filtering out the delta write stats will resolve `#2`. For `#1`, we 
probably need to check whether the commit metadata comes from a compaction: 
for a compaction, all the partitions it writes into should be included.
   
   Wondering if it makes sense to check the commit type first for each 
table type:
   
   1. for MOR: only compaction and replace commits need cleaning;
   2. for COW: the changes in the PR should work;
   3. drop partition works for both table types.
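
   A rough sketch of the dispatch I have in mind, not tied to the real Hudi 
classes. `TableType`, `CommitKind`, and `partitionsToClean` are hypothetical 
names, and each partition is reduced to the list of `prevCommit` values of its 
write stats. Treating replace and drop-partition commits as "include every 
partition written" is an assumption on my side:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CleanCandidateSketch {

  // Hypothetical stand-ins for illustration; these are not Hudi's own enums.
  enum TableType { COPY_ON_WRITE, MERGE_ON_READ }
  enum CommitKind { COMMIT, DELTA_COMMIT, COMPACTION, REPLACE, DROP_PARTITION }

  // Same check as in the PR: a missing or "null" prevCommit means the write
  // stat created a new file group rather than modifying an existing one.
  static boolean modifiesExistingFileGroup(String prevCommit) {
    return prevCommit != null && !prevCommit.equalsIgnoreCase("null");
  }

  // partitionToPrevCommits maps a partition path to the prevCommit value of
  // each write stat in that partition (entries may be null).
  static Set<String> partitionsToClean(TableType tableType,
                                       CommitKind kind,
                                       Map<String, List<String>> partitionToPrevCommits) {
    // Assumption: drop partition is handled the same way for both table types
    // and covers every partition the commit touched.
    if (kind == CommitKind.DROP_PARTITION) {
      return new HashSet<>(partitionToPrevCommits.keySet());
    }
    if (tableType == TableType.MERGE_ON_READ) {
      // MOR: only compaction and replace commits need cleaning, and every
      // partition they write into is included; delta commits are skipped.
      if (kind == CommitKind.COMPACTION || kind == CommitKind.REPLACE) {
        return new HashSet<>(partitionToPrevCommits.keySet());
      }
      return Collections.emptySet();
    }
    // COW: keep partitions where at least one write stat has a prevCommit.
    return partitionToPrevCommits.entrySet().stream()
        .filter(e -> e.getValue().stream()
            .anyMatch(CleanCandidateSketch::modifiesExistingFileGroup))
        .map(Map.Entry::getKey)
        .collect(Collectors.toSet());
  }
}
```

   With this shape, a MOR delta commit contributes no partitions, while a 
compaction on the same table contributes every partition it wrote into.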



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
