kbuci commented on code in PR #18016:
URL: https://github.com/apache/hudi/pull/18016#discussion_r2748516730
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieReplaceCommitMetadata.java:
##########
@@ -82,6 +85,12 @@ public String toJsonString() throws IOException {
return
JsonUtils.getObjectMapper().writerWithDefaultPrettyPrinter().writeValueAsString(this);
}
+ @Override
+ public Set<String> getWritePartitionPathsWithExistingFileGroupsModified() {
+ return
Stream.concat(super.getWritePartitionPathsWithExistingFileGroupsModified().stream(),
Review Comment:
In practice what you're suggesting should work for most workloads, but my
concern is that
- I wanted to keep existing behavior in
https://github.com/apache/hudi/pull/18016/changes#diff-f3f9696ca040bd9f581a1122b1c5cf0409b3ba4525672fbf3328e9f2c40eec66L240
- I felt it was safer to keep the logic of considering both replaced
partitions and those that appear in write stats
- As far as I am aware, it's technically legal for the write stats in a
replacecommit to have a `non-null` prev commit. So its better to be on the
safer side and check that. In fact we use this behavior internally right now,
we have a custom clustering strategy we use (for pruning columns) that updates
each file but doesn't replace, so for that strategy we update the write stats
to indicate the `prevCommit` of each pruned file group
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]