prashantwason commented on a change in pull request #2064:
URL: https://github.com/apache/hudi/pull/2064#discussion_r491219275
##########
File path:
hudi-client/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -180,14 +181,14 @@ public CleanPlanner(HoodieTable<T> hoodieTable,
HoodieWriteConfig config) {
}
/**
- * Scan and list all paritions for cleaning.
+ * Scan and list all partitions for cleaning.
* @return all partitions paths for the dataset.
* @throws IOException
*/
private List<String> getPartitionPathsForFullCleaning() throws IOException {
// Go to brute force mode of scanning all partitions
- return FSUtils.getAllPartitionPaths(hoodieTable.getMetaClient().getFs(), hoodieTable.getMetaClient().getBasePath(),
- config.shouldAssumeDatePartitioning());
+ return HoodieMetadata.getAllPartitionPaths(hoodieTable.getMetaClient().getFs(),
Review comment:
With flags for various operations there is a greater chance of eventual
inconsistency: async operations may have created or deleted files that the
metadata does not know about yet.
If certain operations really need to skip the metadata, it would be
cleaner to change the API to reflect that. Example:
HoodieMetadata.getAllPartitionPaths(...., boolean shouldValidate);
When shouldValidate is true, metadata validation is forced, so the file
listing is used to return the results.
This way all listing operations are forced through a single code path, which
can be optimized later on.
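
A rough sketch of the shape I have in mind. The class name, the `Supplier` parameters and the fallback logic are all hypothetical stand-ins for illustration, not the actual HoodieMetadata implementation; the only part taken from the suggestion above is the `shouldValidate` flag.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: one entry point for partition listing, with a flag
// that forces the file-system listing instead of the metadata-backed one.
public class PartitionListingSketch {

    /**
     * @param metadataListing   assumed supplier of partitions known to the metadata
     * @param fileSystemListing assumed supplier of partitions found by listing files
     * @param shouldValidate    when true, the file listing is used for the result
     */
    public static List<String> getAllPartitionPaths(
            Supplier<List<String>> metadataListing,
            Supplier<List<String>> fileSystemListing,
            boolean shouldValidate) {
        if (shouldValidate) {
            // Validation forced: fall back to the file listing, which also
            // sees files created or deleted by async operations.
            return fileSystemListing.get();
        }
        // Default path: trust the metadata and avoid the expensive listing.
        return metadataListing.get();
    }
}
```

Callers such as the cleaner would then pass `true` only where they cannot tolerate stale metadata, and every other caller shares the same default path.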