xushiyan commented on a change in pull request #4489:
URL: https://github.com/apache/hudi/pull/4489#discussion_r825286337
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkCopyOnWriteTable.java
##########
@@ -18,6 +18,8 @@
package org.apache.hudi.table;
+import org.apache.avro.Schema;
Review comment:
there is no other code change in this file. should avoid re-ordering
imports here
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -472,4 +499,21 @@ private boolean
isFileSliceNeededForPendingCompaction(FileSlice fileSlice) {
private boolean isFileGroupInPendingCompaction(HoodieFileGroup fg) {
return fgIdToPendingCompactionOperations.containsKey(fg.getFileGroupId());
}
+
+ public boolean isDeletePartitionOperation(Option<HoodieInstant>
instantToRetain) {
Review comment:
it actually looks better to be in some timeline utils class instead of
in `CleanPlanner`
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -206,7 +212,13 @@ public CleanPlanner(HoodieEngineContext context,
HoodieTable<T, I, K, O> hoodieT
*/
private List<String> getPartitionPathsForFullCleaning() {
// Go to brute force mode of scanning all partitions
- return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(),
config.getBasePath());
+ try {
+ FileSystemBackedTableMetadata fsBackedTableMetadata = new
FileSystemBackedTableMetadata(context,
+ context.getHadoopConf(), config.getBasePath(),
config.shouldAssumeDatePartitioning());
+ return fsBackedTableMetadata.getAllPartitionPaths();
+ } catch (IOException e) {
+ return Collections.emptyList();
Review comment:
can you clarify why we have to use the concrete implementation
`FileSystemBackedTableMetadata` here? the existing code uses its interface
`HoodieTableMetadata` as an abstraction.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -159,12 +163,14 @@ public CleanPlanner(HoodieEngineContext context,
HoodieTable<T, I, K, O> hoodieT
.deserializeHoodieCleanMetadata(hoodieTable.getActiveTimeline().getInstantDetails(lastClean.get()).get());
if ((cleanMetadata.getEarliestCommitToRetain() != null)
&& (cleanMetadata.getEarliestCommitToRetain().length() > 0))
{
- return getPartitionPathsForIncrementalCleaning(cleanMetadata,
instantToRetain);
+ return getPartitionPathsForIncrementalCleaning(cleanMetadata,
instantToRetain).stream()
+ .distinct().collect(Collectors.toList());
}
}
}
}
- return getPartitionPathsForFullCleaning();
+
+ return
getPartitionPathsForFullCleaning().stream().distinct().collect(Collectors.toList());
Review comment:
why we need the deduplication here?
`getPartitionPathsForIncrementalCleaning` already calls `distinct()` plus for
both methods, it shouldn't be the caller's responsibility to deduplicate; the
APIs should take care of it internally
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -157,10 +157,13 @@ public static void deleteMetadataTable(String basePath,
HoodieEngineContext cont
String instantTime) {
List<HoodieRecord> records = new
ArrayList<>(commitMetadata.getPartitionToWriteStats().size());
- // Add record bearing partitions list
- ArrayList<String> partitionsList = new
ArrayList<>(commitMetadata.getPartitionToWriteStats().keySet());
+ // Add record bearing added partitions list
+ ArrayList<String> partitionsAdded = new
ArrayList<>(commitMetadata.getPartitionToWriteStats().keySet());
+
+ // Add record bearing deleted partitions list
+ ArrayList<String> partitionsDeleted = getPartitionsDeleted(commitMetadata);
Review comment:
should just declare as `List<String>`
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java
##########
@@ -542,8 +542,6 @@ private void transitionState(HoodieInstant fromInstant,
HoodieInstant toInstant,
} else {
// Ensures old state exists in timeline
LOG.info("Checking for file exists ?" + new
Path(metaClient.getMetaPath(), fromInstant.getFileName()));
- ValidationUtils.checkArgument(metaClient.getFs().exists(new
Path(metaClient.getMetaPath(),
- fromInstant.getFileName())));
Review comment:
why this not needed any more?
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -472,4 +499,21 @@ private boolean
isFileSliceNeededForPendingCompaction(FileSlice fileSlice) {
private boolean isFileGroupInPendingCompaction(HoodieFileGroup fg) {
return fgIdToPendingCompactionOperations.containsKey(fg.getFileGroupId());
}
+
+ public boolean isDeletePartitionOperation(Option<HoodieInstant>
instantToRetain) {
Review comment:
can we make it private if not intend to use else where? also prefer to
have a static helper for this, instead of couple it with CleanPlanner
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]