GitHub user aeroyorch added a comment to the discussion: Add CronJob which cleans up the Airflow database in airflow helm chart
I’m also interested in implementing the ability to clean up the Airflow metadata DB through the Helm chart, and I’d like to propose an initial approach for it. Here’s a first draft of the values for the CronJob configuration:

```yaml
dbCleanup:
  enabled: false
  schedule: "0 0 * * *"
  # Archive purged records into *_archive tables (set to true to skip archiving
  # and delete the records outright)
  skipArchive: false
  # Table names to perform maintenance on. Supported values are listed at:
  # https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#clean
  tables: []
  # Maximum number of rows to delete or archive in a single transaction
  batchSize: ~
  # Data older than this duration, relative to the current time, is purged.
  # Must follow Go duration format (see https://pkg.go.dev/time#ParseDuration).
  cleanBefore: "720h" # 30 days
  # Make logging output more verbose
  verbose: false
```

Additionally, though this might deserve a separate issue, I’d also find it very valuable from a production perspective to have an option to export the archived data to an external storage backend such as S3, for example:

```yaml
dbCleanup:
  ...
  exportArchivedTo:
    enabled: false
    # Compress the exported files with gzip
    compress: true
    backend:
      s3:
        bucketName: my-airflow-archive
        region: us-west-2
        accessKeyId: YOUR_ACCESS_KEY_ID
        secretAccessKey: YOUR_SECRET_ACCESS_KEY # or accessKeySecretName: ...
      # gcs:
      #   ...
```

I’m not sure yet whether it would make more sense to handle this export through a small Python snippet (e.g. using the `boto3` library) or by extending the Airflow CLI itself with something like `airflow db export-archived --export-to s3://...`. Rough sketches of both the rendered cleanup CronJob and one possible export step are included below.

GitHub link: https://github.com/apache/airflow/discussions/32419#discussioncomment-15045581
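To make the first block above more concrete, here is a rough sketch of the kind of CronJob the chart could render from those values. It is only meant to illustrate the mapping onto `airflow db clean` flags; the resource name, image, and the way the metadata DB connection is injected are placeholders, and computing the cutoff timestamp at runtime from the Go-style `cleanBefore` duration is just one possible approach:

```yaml
# Hypothetical rendering for: enabled: true, schedule: "0 0 * * *",
# tables: [task_instance, dag_run], cleanBefore: "720h" (30 days).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: airflow-db-cleanup              # placeholder name
spec:
  schedule: "0 0 * * *"                 # .Values.dbCleanup.schedule
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: db-cleanup
              image: apache/airflow:2.9.3   # would reuse the chart's image settings
              # env/volumes for the metadata DB connection are omitted here; the chart
              # would inject the SQLAlchemy connection the same way it does for other jobs
              args:
                - bash
                - -c
                # The cutoff is computed at runtime so the CronJob keeps a rolling window;
                # --skip-archive, --verbose and a batch-size flag would only be appended
                # when the corresponding values are set.
                - >-
                  airflow db clean
                  --clean-before-timestamp "$(date -u -d '-30 days' '+%Y-%m-%dT%H:%M:%SZ')"
                  --tables task_instance,dag_run
                  --yes
```

Computing the cutoff inside the container (rather than at template-render time with Sprig's `dateModify`) keeps the retention window rolling instead of freezing it at the moment of `helm install`/`upgrade`.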
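For the export part, and assuming the upload is handled by the job itself rather than by a new CLI flag, one very rough sketch could be to grow the same container's command and environment when `exportArchivedTo.enabled` is true. The secret and bucket names below are placeholders, and the AWS CLI is used purely for illustration (I'm not sure it ships in the default image); a small `boto3` snippet would do the same job:

```yaml
# Hypothetical shape of the cleanup container (same place as in the sketch above)
# when exportArchivedTo.enabled is true.
- name: db-cleanup
  image: apache/airflow:2.9.3
  env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: airflow-archive-s3        # placeholder; or the user-provided accessKeySecretName
          key: accessKeyId
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: airflow-archive-s3
          key: secretAccessKey
    - name: AWS_DEFAULT_REGION
      value: us-west-2                    # .Values.dbCleanup.exportArchivedTo.backend.s3.region
  args:
    - bash
    - -c
    # clean first, then export the *_archive tables to CSV, gzip them (compress: true)
    # and upload the result to the configured bucket
    - >-
      airflow db clean --clean-before-timestamp "$(date -u -d '-30 days' '+%Y-%m-%dT%H:%M:%SZ')" --yes &&
      mkdir -p /tmp/archive &&
      airflow db export-archived --export-format csv --output-path /tmp/archive &&
      gzip /tmp/archive/*.csv &&
      aws s3 cp /tmp/archive "s3://my-airflow-archive/$(date -u +%Y-%m-%d)/" --recursive
```

Swapping the last step for a `boto3`-based upload would remove the AWS CLI dependency, and a `gcs` backend could follow the same pattern.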
