potiuk commented on a change in pull request #11079:
URL: https://github.com/apache/airflow/pull/11079#discussion_r492667994



##########
File path: dev/remove_artifacts.sh
##########
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash

Review comment:
       I know we could have written it in Python/Octokit, but I found a 
bash script that worked out of the box. I improved it slightly, refactored it 
according to the Google Shell Style Guide, and it seems to do its job (it is 
running now in the background).
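For readers who haven't seen the script, here is a minimal sketch of the general approach it describes. It is a hypothetical illustration, not the contents of `dev/remove_artifacts.sh`. It pages through the GitHub REST endpoint `GET /repos/{owner}/{repo}/actions/artifacts`, selects artifacts older than a retention cutoff, and calls `DELETE /repos/{owner}/{repo}/actions/artifacts/{artifact_id}` with a pause between deletions so a run stays under an hourly API budget. The function names (`is_older_than`, `delay_for_budget`, `list_artifacts`, `remove_old_artifacts`) and the `GITHUB_TOKEN` variable are assumptions. The sketch also assumes `curl`, `jq` and GNU `date` are installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the contents of dev/remove_artifacts.sh: delete
# GitHub Actions artifacts older than a retention period, pacing the DELETE
# calls so a run stays under an hourly API budget.
# Assumes curl, jq and GNU date are installed and GITHUB_TOKEN is set.
set -euo pipefail

readonly API_URL="https://api.github.com"

# Succeeds if created_at is older than cutoff. Both must be UTC ISO-8601
# timestamps in GitHub's format (e.g. 2020-09-21T10:00:00Z), which sort
# lexicographically in chronological order.
is_older_than() {
  local created_at="$1" cutoff="$2"
  [[ "${created_at}" < "${cutoff}" ]]
}

# Whole seconds to sleep between calls to spend at most calls_per_hour.
delay_for_budget() {
  local calls_per_hour="$1"
  echo $(( (3600 + calls_per_hour - 1) / calls_per_hour ))
}

# Prints "id created_at" for every artifact in the repo, one per line.
list_artifacts() {
  local repo="$1" page=1 response
  while :; do
    response=$(curl -fsS \
      -H "Authorization: token ${GITHUB_TOKEN}" \
      -H "Accept: application/vnd.github.v3+json" \
      "${API_URL}/repos/${repo}/actions/artifacts?per_page=100&page=${page}")
    # An empty page means we have walked past the last artifact.
    if [[ "$(jq '.artifacts | length' <<< "${response}")" -eq 0 ]]; then
      break
    fi
    jq -r '.artifacts[] | "\(.id) \(.created_at)"' <<< "${response}"
    page=$((page + 1))
  done
}

# Usage: remove_old_artifacts OWNER/REPO RETENTION_DAYS CALLS_PER_HOUR
remove_old_artifacts() {
  local repo="$1" retention_days="$2" calls_per_hour="$3"
  local cutoff delay id created_at
  local ids=()
  cutoff=$(date -u -d "-${retention_days} days" +%Y-%m-%dT%H:%M:%SZ)
  delay=$(delay_for_budget "${calls_per_hour}")
  # Collect ids first: deleting while paginating would shift later pages
  # and make us skip artifacts.
  while read -r id created_at; do
    if is_older_than "${created_at}" "${cutoff}"; then
      ids+=("${id}")
    fi
  done < <(list_artifacts "${repo}")
  # ${ids[@]+...} keeps "set -u" happy on older bash when ids is empty.
  for id in ${ids[@]+"${ids[@]}"}; do
    curl -fsS -X DELETE \
      -H "Authorization: token ${GITHUB_TOKEN}" \
      "${API_URL}/repos/${repo}/actions/artifacts/${id}"
    sleep "${delay}"
  done
}
```

A caller would source the file and run, for example, `remove_old_artifacts apache/airflow 90 1000` with `GITHUB_TOKEN` set. A budget of 1000 calls/hour matches the pace mentioned in a later comment and leaves headroom under the 5,000/hour limit. Collecting ids before deleting costs nothing extra in list calls, but it avoids the skipped items you would get from deleting inside the pagination loop.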

##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
 name: 'Delete old artifacts'
 on:
   schedule:
-    - cron: '0 * * * *' # every hour
+    - cron: '27 */6 * * *' # run every 6 hours

Review comment:
       There is a bit of a problem: we cannot rate-limit the action, and we 
have occasionally exhausted the limit.
   I did some back-of-the-envelope calculations:
   
   The limit is 5,000 API calls/hour. So if we have > 3,000 artifacts to 
delete, we are getting dangerously close to exhausting our API calls within a 
single run.
   
   We have (at most) ~200 builds a day with (at most) ~50 artifacts each 
(assuming an intensive period and a growing number of artifacts), i.e. 
~10,000 artifacts to delete a day. Running it 4 times/day means ~2,500 
artifacts to delete per run.
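The estimate above can be reproduced with plain shell arithmetic. This sketch uses only the rough upper bounds quoted in the comment. It assumes one DELETE call per artifact and ignores the paginated list calls, which cost one call per 100 artifacts listed. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope API budget for the scheduled cleanup, using the rough
# upper bounds quoted in the review comment. Assumes one DELETE call per
# artifact and ignores the paginated list calls (one per 100 artifacts listed).
set -euo pipefail

builds_per_day=200
artifacts_per_build=50
runs_per_day=4          # '27 */6 * * *' fires at 00:27, 06:27, 12:27 and 18:27 UTC
api_limit_per_hour=5000

artifacts_per_day=$(( builds_per_day * artifacts_per_build ))
artifacts_per_run=$(( artifacts_per_day / runs_per_day ))
limit_share_pct=$(( artifacts_per_run * 100 / api_limit_per_hour ))

echo "artifacts/day: ${artifacts_per_day}"
echo "artifacts/run: ${artifacts_per_run}"
echo "share of hourly limit per run: ${limit_share_pct}%"
```

With these inputs a single run uses about half of the hourly limit. That is still below the ~3,000-deletion danger zone described above.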
   

##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
 name: 'Delete old artifacts'
 on:
   schedule:
-    - cron: '0 * * * *' # every hour
+    - cron: '27 */6 * * *' # run every 6 hours

Review comment:
       BTW, I am now running the deletion with the script at ~1,000 
deletions/hour (I also hit the rate limit with my personal token). With those 
assumptions and a 90-day retention period, it will take ~11 days to delete all 
old artifacts (assuming we have 30% of the 10,000/day = 3,000/day; 
3,000/day * 90 days / 1,000 deletions/hour / 24 hours ≈ 11 days :).
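The backlog estimate divides the accumulated artifacts by the deletion rate, then converts hours to days. This sketch spells those steps out using the comment's own assumptions: 30% of ~10,000 artifacts/day, a 90-day retention period, and ~1,000 deletions/hour. With bash integer arithmetic the result is about 11 days.

```shell
#!/usr/bin/env bash
# Rough time to work through the existing backlog at a fixed deletion rate.
# Inputs are the assumptions from the review comment; integer division rounds down.
set -euo pipefail

artifacts_per_day=3000      # ~30% of the ~10,000/day upper bound
retention_days=90
deletions_per_hour=1000

backlog=$(( artifacts_per_day * retention_days ))
hours=$(( backlog / deletions_per_hour ))
days=$(( hours / 24 ))

echo "backlog: ${backlog} artifacts, ~${hours} hours, ~${days} days"
```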

##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
 name: 'Delete old artifacts'
 on:
   schedule:
-    - cron: '0 * * * *' # every hour
+    - cron: '27 */6 * * *' # run every 6 hours

Review comment:
       BTW, I think my calculations were a bit too pessimistic. It seems that 24 
hours were enough to delete the artifacts. The job started to succeed, and it 
takes ~5 minutes when it runs. Also, my script now often finishes after 
some 30-50 artifacts, so it seems we got to a "clean state" much faster than 
I thought.
   
   

##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
 name: 'Delete old artifacts'
 on:
   schedule:
-    - cron: '0 * * * *' # every hour
+    - cron: '27 */6 * * *' # run every 6 hours

Review comment:
       Yep. Now when I run "delete" and no jobs finish in the meantime, there 
are no artifacts to delete :) @dimberman -> artifacts are cleaned up now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

