pratyakshsharma commented on code in PR #6662:
URL: https://github.com/apache/hudi/pull/6662#discussion_r972062851
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -83,18 +87,24 @@ public boolean isBootstrap() {
return metaClient.getTableConfig().getBootstrapBasePath().isPresent();
}
- public boolean isDropPartition() {
+ /**
+ * Get the set of dropped partitions based on the latest commit metadata.
+ * Returns empty set if the latest commit was not due to DELETE_PARTITION operation.
+ */
+ public Set<String> getDroppedPartitions() {
try {
- Option<HoodieCommitMetadata> hoodieCommitMetadata = HoodieTableMetadataUtil.getLatestCommitMetadata(metaClient);
+ Option<HoodieCommitMetadata> hoodieCommitMetadata = getLatestCommitMetadata(metaClient);
Review Comment:
I believe this is still a problem. Consider a scenario where three commits
happen in the following order, with no sync to the metastore in between:
1. upsert
2. drop_partition
3. drop_partition
If we only look at the latest commit metadata here, we will miss the
partitions dropped in commit 2. I think we should check the metadata of every
commit since the last sync time with the metastore and collect the dropped
partitions from all of them.
It would also be good to add a test case simulating this scenario so that
this behavior stays intact in the future.
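A minimal sketch of the idea, not Hudi code: `CommitInfo` and `droppedPartitionsSince` are hypothetical names used only for illustration, and the sketch assumes that commits arrive in timeline order and that instant times compare lexicographically. It accumulates the dropped partitions from every DELETE_PARTITION commit after the last sync time instead of reading only the latest commit.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DroppedPartitionsSketch {

  // Hypothetical stand-in for one commit on the timeline; not a Hudi class.
  static final class CommitInfo {
    final String instantTime;
    final String operation;
    final Set<String> partitions;

    CommitInfo(String instantTime, String operation, String... partitions) {
      this.instantTime = instantTime;
      this.operation = operation;
      this.partitions = new LinkedHashSet<>(Arrays.asList(partitions));
    }
  }

  // Collects partitions dropped by every DELETE_PARTITION commit that is
  // strictly newer than lastSyncTime. A null lastSyncTime means "never synced".
  static Set<String> droppedPartitionsSince(List<CommitInfo> commits, String lastSyncTime) {
    Set<String> dropped = new LinkedHashSet<>();
    for (CommitInfo commit : commits) {
      boolean afterLastSync =
          lastSyncTime == null || commit.instantTime.compareTo(lastSyncTime) > 0;
      if (afterLastSync && "DELETE_PARTITION".equals(commit.operation)) {
        dropped.addAll(commit.partitions);
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    // The scenario from the comment: upsert, then two partition drops, no sync in between.
    List<CommitInfo> commits = Arrays.asList(
        new CommitInfo("001", "UPSERT", "2022/01/01", "2022/01/02"),
        new CommitInfo("002", "DELETE_PARTITION", "2022/01/01"),
        new CommitInfo("003", "DELETE_PARTITION", "2022/01/02"));

    // Both dropped partitions are reported, not only the one from commit 003.
    System.out.println(droppedPartitionsSince(commits, null));
  }
}
```

A test for the real change could follow the same shape as `main`: write the three commits without syncing, then assert that both partitions are returned. A real implementation would probably also need to decide what to do when a later write recreates a partition that an earlier commit dropped; the sketch does not handle that case.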