kbuci commented on code in PR #18016:
URL: https://github.com/apache/hudi/pull/18016#discussion_r2898526076
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java:
##########
@@ -499,4 +500,18 @@ public Pair<Option<Long>, Option<Long>> getMinAndMaxEventTime() {
   public HashSet<String> getWritePartitionPaths() {
     return new HashSet<>(partitionToWriteStats.keySet());
   }
+
+  public Set<String> getWritePartitionPathsWithUpdatedFileGroups() {
+    return getPartitionToWriteStats()
+        .entrySet()
+        .stream()
+        .filter(partitionAndWriteStats -> partitionAndWriteStats
+            .getValue()
+            .stream()
+            .anyMatch(writeStat -> !Option.ofNullable(writeStat.getPrevCommit())
Review Comment:
Hmm, I think we would also have to check `numDeletes`, right? Maybe a check like
`numWrites != numInserts || numDeletes > 0 || numUpdateWrites > 0`
On paper that should cover small file handling (`numWrites != numInserts`),
deletes (`numDeletes > 0`), and updates (`numUpdateWrites > 0`), I think.
Ideally, though, it would be nicer to use either the `getPrevCommit()` check
here or `prevBaseFile`, since that code would be easier to maintain. Even while
writing the suggestion above, I initially mixed up `numUpdateWrites` and
`numUpdates` by mistake.
Let me know your thoughts.
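
For reference, here is a rough, self-contained sketch of what the counter-based check could look like. `WriteStat` is a hypothetical stand-in for `HoodieWriteStat` that carries only the four counters, and `UpdatedFileGroupCheck`, `touchesExistingFileGroup`, and `partitionsWithUpdatedFileGroups` are made-up names for illustration, not code from the PR. It also assumes the counters behave as described above (`numWrites` exceeds `numInserts` when records are carried over from an existing file), which I have not verified against every write path.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class UpdatedFileGroupCheck {

  // Hypothetical stand-in for HoodieWriteStat: only the counters the check reads.
  static final class WriteStat {
    final long numWrites;
    final long numInserts;
    final long numDeletes;
    final long numUpdateWrites;

    WriteStat(long numWrites, long numInserts, long numDeletes, long numUpdateWrites) {
      this.numWrites = numWrites;
      this.numInserts = numInserts;
      this.numDeletes = numDeletes;
      this.numUpdateWrites = numUpdateWrites;
    }
  }

  // The suggested predicate: true if the write looks like it modified an
  // existing file group (small file handling, deletes, or updates).
  static boolean touchesExistingFileGroup(WriteStat stat) {
    return stat.numWrites != stat.numInserts
        || stat.numDeletes > 0
        || stat.numUpdateWrites > 0;
  }

  // Keeps only the partitions where at least one write stat matches the predicate.
  static Set<String> partitionsWithUpdatedFileGroups(
      Map<String, List<WriteStat>> partitionToWriteStats) {
    return partitionToWriteStats.entrySet().stream()
        .filter(entry -> entry.getValue().stream()
            .anyMatch(UpdatedFileGroupCheck::touchesExistingFileGroup))
        .map(Map.Entry::getKey)
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    Map<String, List<WriteStat>> stats = Map.of(
        "2024/01/01", List.of(new WriteStat(100, 100, 0, 0)), // pure inserts into a new file
        "2024/01/02", List.of(new WriteStat(150, 50, 0, 0)),  // inserts packed into a small file
        "2024/01/03", List.of(new WriteStat(90, 0, 10, 0)));  // deletes only
    System.out.println(partitionsWithUpdatedFileGroups(stats));
  }
}
```

The three clauses have to be kept in sync with how each writer populates the counters, which is the maintenance concern with this approach compared to a single `prevCommit` or `prevBaseFile` check.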