kbuci commented on code in PR #18016:
URL: https://github.com/apache/hudi/pull/18016#discussion_r2748516730


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieReplaceCommitMetadata.java:
##########
@@ -82,6 +85,12 @@ public String toJsonString() throws IOException {
     return 
JsonUtils.getObjectMapper().writerWithDefaultPrettyPrinter().writeValueAsString(this);
   }
 
+  @Override
+  public Set<String> getWritePartitionPathsWithExistingFileGroupsModified() {
+    return 
Stream.concat(super.getWritePartitionPathsWithExistingFileGroupsModified().stream(),

Review Comment:
   In practice what you're suggesting should work for most workloads, but my 
concern is that
   - I wanted to keep existing behavior in 
https://github.com/apache/hudi/pull/18016/changes#diff-f3f9696ca040bd9f581a1122b1c5cf0409b3ba4525672fbf3328e9f2c40eec66L240
  - I felt it was safer to keep the logic of considering both replaced 
partitions and those that appear in write stats 
   - As far as I am aware, it's technically legal for the write stats in a 
replacecommit to have a `non-null` prev commit. So its better to be on the 
safer side and check that. In fact we use this behavior internally right now, 
we have a custom clustering strategy we use (for pruning columns) that updates 
each file but doesn't replace, so for that strategy we update the write stats 
to indicate the `prevCommit` of each pruned file group



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to