pratyakshsharma commented on code in PR #9699:
URL: https://github.com/apache/hudi/pull/9699#discussion_r1325451523


##########
website/docs/procedures.md:
##########
@@ -1458,6 +1463,47 @@ call show_compaction(table => 'test_hudi_table', limit => 1);
 |-------------------|------------|---------|
 | 20220408153707928 | compaction | 10      |
 
+### run_clean
+
+Run cleaner on a hoodie table.
+
+**Input**
+
+| Parameter Name | Type | Required | Default Value | Description |
+|----------------|------|----------|---------------|-------------|
+| table | String | Y | None | Name of table to be cleaned |
+| schedule_in_line | Boolean | N | true | Set "true" if you want to schedule and run a clean. Set false if you have already scheduled a clean and want to run that. |
+| [clean_policy](/docs/next/configurations#hoodiecleanerpolicy) | String | N | None | org.apache.hudi.common.model.HoodieCleaningPolicy: Cleaning policy to be used. The cleaner service deletes older file slices files to re-claim space. Long running query plans may often refer to older file slices and will break if those are cleaned, before the query has had a chance to run. So, it is good to make sure that the data is retained for more than the maximum query execution time. By default, the cleaning policy is determined based on one of the following configs explicitly set by the user (at most one of them can be set; otherwise, KEEP_LATEST_COMMITS cleaning policy is used). KEEP_LATEST_FILE_VERSIONS: keeps the last N versions of the file slices written; used when "hoodie.cleaner.fileversions.retained" is explicitly set only. KEEP_LATEST_COMMITS(default): keeps the file slices written by the last N commits; used when "hoodie.cleaner.commits.retained" is explicitly set only. KEEP_LATEST_BY_HOURS: keeps the file slices written in the last N hours based on the commit time; used when "hoodie.cleaner.hours.retained" is explicitly set only. |
+| [retain_commits](/docs/next/configurations#hoodiecleanercommitsretained) | Int | N | None | Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much data retention the table supports for incremental queries. |

Review Comment:
   I think it would be good to mention here that this property is used when
KEEP_LATEST_COMMITS is set as the cleaner policy.
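
   To illustrate the pairing, a call might look like the sketch below. It
   uses only parameter names from the table in this diff; the table name
   and the value 10 are placeholders, and it assumes the parameters behave
   as the descriptions say.

   ```sql
   -- Sketch only: 'test_hudi_table' and 10 are placeholder values.
   -- retain_commits is paired with the KEEP_LATEST_COMMITS policy.
   call run_clean(
     table => 'test_hudi_table',
     clean_policy => 'KEEP_LATEST_COMMITS',
     retain_commits => 10
   );
   ```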



##########
website/docs/procedures.md:
##########
@@ -1458,6 +1463,47 @@ call show_compaction(table => 'test_hudi_table', limit => 1);
 |-------------------|------------|---------|
 | 20220408153707928 | compaction | 10      |
 
+### run_clean
+
+Run cleaner on a hoodie table.
+
+**Input**
+
+| Parameter Name | Type | Required | Default Value | Description |
+|----------------|------|----------|---------------|-------------|
+| table | String | Y | None | Name of table to be cleaned |
+| schedule_in_line | Boolean | N | true | Set "true" if you want to schedule and run a clean. Set false if you have already scheduled a clean and want to run that. |
+| [clean_policy](/docs/next/configurations#hoodiecleanerpolicy) | String | N | None | org.apache.hudi.common.model.HoodieCleaningPolicy: Cleaning policy to be used. The cleaner service deletes older file slices files to re-claim space. Long running query plans may often refer to older file slices and will break if those are cleaned, before the query has had a chance to run. So, it is good to make sure that the data is retained for more than the maximum query execution time. By default, the cleaning policy is determined based on one of the following configs explicitly set by the user (at most one of them can be set; otherwise, KEEP_LATEST_COMMITS cleaning policy is used). KEEP_LATEST_FILE_VERSIONS: keeps the last N versions of the file slices written; used when "hoodie.cleaner.fileversions.retained" is explicitly set only. KEEP_LATEST_COMMITS(default): keeps the file slices written by the last N commits; used when "hoodie.cleaner.commits.retained" is explicitly set only. KEEP_LATEST_BY_HOURS: keeps the file slices written in the last N hours based on the commit time; used when "hoodie.cleaner.hours.retained" is explicitly set only. |
+| [retain_commits](/docs/next/configurations#hoodiecleanercommitsretained) | Int | N | None | Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits (scheduled). This also directly translates into how much data retention the table supports for incremental queries. |
+| [hours_retained](/docs/next/configurations#hoodiecleanerhoursretained) | Int | N | None | Number of hours for which commits need to be retained. This config provides a more flexible option ascompared to number of commits retained for cleaning service. Setting this property ensures all the files, but the latest in a file group, corresponding to commits with commit times older than the configured number of hours to be retained are cleaned. |

Review Comment:
   nit: ascompared -> as compared




Review Comment:
   Again, let's mention that this is to be used with the KEEP_LATEST_BY_HOURS policy.
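
   As with retain_commits, a call showing the pairing might look like the
   sketch below. It uses only parameter names from the table in this diff;
   the table name and the value 24 are placeholders, and it assumes the
   parameters behave as the descriptions say.

   ```sql
   -- Sketch only: 'test_hudi_table' and 24 are placeholder values.
   -- hours_retained is paired with the KEEP_LATEST_BY_HOURS policy.
   call run_clean(
     table => 'test_hudi_table',
     clean_policy => 'KEEP_LATEST_BY_HOURS',
     hours_retained => 24
   );
   ```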



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
