pratyakshsharma commented on a change in pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#discussion_r736235448



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
##########
@@ -1240,6 +1244,154 @@ public void testKeepLatestCommits(boolean simulateFailureRetry, boolean enableIn
     assertTrue(testTable.baseFileExists(p0, "00000000000005", file3P0C2));
   }
 
+  /**
+   * Test cleaning policy based on number of hours retained policy. This test case covers the case when files will not be cleaned.
+   */
+  @ParameterizedTest
+  @MethodSource("argumentsForTestKeepLatestCommits")
+  public void testKeepXHoursNoCleaning(boolean simulateFailureRetry, boolean enableIncrementalClean, boolean enableBootstrapSourceClean) throws Exception {

Review comment:
       I don't think we can reuse the existing tests, because the two
policies, KEEP_LATEST_COMMITS and KEEP_LATEST_BY_HOURS, work in fundamentally
different ways. The former, if configured with 2 commits, will not clean
anything until 2 commits exist, whereas the latter starts cleaning as soon as
the configured number of hours is exceeded. I think it is better to keep the
tests separate so that each one stays clean and easy to read.
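
To make the difference concrete, here is a minimal sketch of the two retention rules. It is a simplified model, not the actual CleanPlanner logic: it assumes a single file group in which every commit writes one new file version, and the class and method names (`RetentionSketch`, `cleanableByCommits`, `cleanableByHours`) are hypothetical and exist only for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; this is NOT Hudi's CleanPlanner. */
final class RetentionSketch {

  /**
   * Count-based rule (simplified KEEP_LATEST_COMMITS): everything older than
   * the newest {@code commitsRetained} commits is eligible for cleaning.
   */
  static List<Instant> cleanableByCommits(List<Instant> commitTimes, int commitsRetained) {
    List<Instant> sorted = new ArrayList<>(commitTimes);
    sorted.sort(Comparator.naturalOrder());
    // Nothing is eligible until more commits exist than are retained.
    int cutoff = Math.max(0, sorted.size() - commitsRetained);
    return new ArrayList<>(sorted.subList(0, cutoff));
  }

  /**
   * Time-based rule (simplified KEEP_LATEST_BY_HOURS): every version committed
   * more than {@code hoursRetained} hours before {@code now} is eligible,
   * except the latest version, which is always retained.
   */
  static List<Instant> cleanableByHours(List<Instant> commitTimes, int hoursRetained, Instant now) {
    List<Instant> sorted = new ArrayList<>(commitTimes);
    sorted.sort(Comparator.naturalOrder());
    Instant earliestToRetain = now.minus(Duration.ofHours(hoursRetained));
    List<Instant> cleanable = new ArrayList<>();
    // Stop before the last element so the latest version is never cleaned.
    for (int i = 0; i < sorted.size() - 1; i++) {
      if (sorted.get(i).isBefore(earliestToRetain)) {
        cleanable.add(sorted.get(i));
      }
    }
    return cleanable;
  }
}
```

Under this model, with two commits made 6 hours and 1 hour ago, `cleanableByCommits(commits, 2)` returns nothing, while `cleanableByHours(commits, 5, now)` already returns the 6-hour-old version. That is why the same test setup cannot assert the same outcome for both policies.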

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -330,6 +336,74 @@ public CleanPlanner(HoodieEngineContext context, HoodieTable<T, I, K, O> hoodieT
     }
     return deletePaths;
   }
+
+  /**
+   * This method finds the files to be cleaned based on the number of hours. If {@code config.getCleanerHoursRetained()} is set to 5,
+   * all the files with commit time earlier than 5 hours will be removed. Also the latest file for any file group is retained.
+   * This policy gives much more flexibility to users for retaining data for running incremental queries as compared to
+   * KEEP_LATEST_COMMITS cleaning policy. The default number of hours is 5.
+   * @param partitionPath partition path to check
+   * @return list of files to clean
+   */
+  private List<CleanFileInfo> getFilesToCleanKeepingLatestHours(String partitionPath) {

Review comment:
       done.





