potiuk commented on a change in pull request #11079:
URL: https://github.com/apache/airflow/pull/11079#discussion_r492667994
##########
File path: dev/remove_artifacts.sh
##########
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
Review comment:
I know we could have written it in python/octokit, but I found a
bash script that worked out of the box. I improved it slightly and refactored it
according to the Google Shell Style Guide, and it seems to do its job (I am running
it now in the background).
##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
name: 'Delete old artifacts'
on:
schedule:
- - cron: '0 * * * *' # every hour
+ - cron: '27 */6 * * *' # run every 6 hours
Review comment:
There is a bit of a problem: we cannot rate-limit the action and we
occasionally exhaust the limit now.
I did some back-of-the-envelope calculations:
The limit is 5000 API calls / hr. So if we have > 3000 artifacts to delete,
we are getting dangerously close to exhausting our API calls within
a single run.
We have (tops) ~ 200 builds a day with (tops) ~ 50 artifacts each (assuming an
intensive period and an increasing number of artifacts), so ~ 10.000 artifacts to
delete a day. Running it 4 times/day is ~ 2.500 artifacts to delete for each
run.
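
The estimate above can be written out as a short calculation. This is only a
sketch of the arithmetic in the comment: the variable and function names are
made up for illustration, the figures are the rough upper bounds quoted above
(not measurements), and it assumes one API call per deleted artifact.

```python
import math

# Figures taken from the comment above; all are rough upper bounds.
API_LIMIT_PER_HOUR = 5000   # API calls allowed per hour
BUILDS_PER_DAY = 200        # "tops" builds per day
ARTIFACTS_PER_BUILD = 50    # "tops" artifacts per build
RUNS_PER_DAY = 4            # cron '27 */6 * * *' fires every 6 hours


def artifacts_per_run(builds_per_day, artifacts_per_build, runs_per_day):
    """Artifacts each scheduled run has to delete, assuming an even spread."""
    return math.ceil(builds_per_day * artifacts_per_build / runs_per_day)


per_day = BUILDS_PER_DAY * ARTIFACTS_PER_BUILD
per_run = artifacts_per_run(BUILDS_PER_DAY, ARTIFACTS_PER_BUILD, RUNS_PER_DAY)

# Assumption: one API call per deletion; calls used for listing are ignored.
headroom = API_LIMIT_PER_HOUR - per_run

print(f"{per_day} artifacts/day, {per_run} per run, {headroom} calls of headroom")
```

Changing the constants shows how quickly the headroom shrinks if the number of
builds or artifacts per build grows.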
##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
name: 'Delete old artifacts'
on:
schedule:
- - cron: '0 * * * *' # every hour
+ - cron: '27 */6 * * *' # run every 6 hours
Review comment:
BTW. I am running the deletion now using the script at ~ 1000
deletions/hour (I also hit the rate limit with my personal token). With those
assumptions and a 90-day retention period, it will take ~ 10 days to delete all
old artifacts (assuming we really have 30% of the 10.000/day = 3000/day: 3000 * 90
= 270.000 artifacts, / 1000 per hour = 270 hours, / 24 ~ 10 days) :).
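
The backlog estimate in this comment can be reproduced with a few lines. Again
this is a sketch under the stated assumptions only: the names are hypothetical,
the 30% figure and the ~ 1000 deletions/hour rate are the ones quoted above,
and the deletion is assumed to run around the clock at a constant rate.

```python
DELETIONS_PER_HOUR = 1000   # rate quoted in the comment
ARTIFACTS_PER_DAY = 10_000  # upper bound from the earlier estimate
ACTUAL_PERCENT = 30         # "30% of the 10.000/day"
RETENTION_DAYS = 90         # retention period


def days_to_clear(artifacts_per_day, percent, retention_days, deletions_per_hour):
    """Days needed to delete the whole backlog at a constant deletion rate."""
    backlog = artifacts_per_day * percent // 100 * retention_days
    hours = backlog / deletions_per_hour
    return hours / 24


backlog_days = days_to_clear(
    ARTIFACTS_PER_DAY, ACTUAL_PERCENT, RETENTION_DAYS, DELETIONS_PER_HOUR
)
print(f"about {backlog_days:.2f} days to clear the backlog")
```

With these inputs the result is a little over 11 days, in line with the
rough "~ 10 days" figure.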
##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
name: 'Delete old artifacts'
on:
schedule:
- - cron: '0 * * * *' # every hour
+ - cron: '27 */6 * * *' # run every 6 hours
Review comment:
BTW. I think my calculations were a bit too pessimistic. It seems that 24
hours were enough to delete the artifacts. The job started to succeed and it
takes ~ 5 minutes when it runs. Also, my script now often finishes
after some 30-50 artifacts, so it seems we got to a "clean state" much faster than
I thought.
##########
File path: .github/workflows/delete_old_artifacts.yml
##########
@@ -1,11 +1,12 @@
name: 'Delete old artifacts'
on:
schedule:
- - cron: '0 * * * *' # every hour
+ - cron: '27 */6 * * *' # run every 6 hours
Review comment:
Yep. Now when I run "delete" and no jobs finish in the meantime, there
are no artifacts to delete :) @dimberman -> artifacts are cleaned up now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]