GitHub user aeroyorch added a comment to the discussion: Add CronJob which cleans up the Airflow database in airflow helm chart

I’m also interested in adding this Airflow metadata DB cleanup capability to the Helm chart, and I’d like to propose an initial approach for it. Here’s a first draft of the values for the CronJob configuration:

```yaml
dbCleanup:
  enabled: false
  schedule: "0 0 * * *"
  # Whether to skip archiving purged records into *_archive tables before deletion
  skipArchive: false
  # Table names to perform maintenance on. Supported values listed at:
  # https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#clean
  tables: []
  # Maximum number of rows to delete or archive in a single transaction
  batchSize: ~
  # The duration before which data should be purged relative to the current time.
  # Must follow Go duration format (see https://pkg.go.dev/time#ParseDuration).
  cleanBefore: "720h" # 30 days
  # Make logging output more verbose
  verbose: false
```
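
To make the values above concrete, the rendered CronJob could look roughly like the sketch below. This is only an illustration: the resource name, the image tag, and the runtime date conversion are placeholders, and the actual template would of course take everything from `.Values.dbCleanup`.

```yaml
# Rough sketch of the manifest the chart could render from the values above.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: airflow-db-cleanup
spec:
  schedule: "0 0 * * *"          # .Values.dbCleanup.schedule
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          # (metadata DB connection env vars from the chart are omitted here)
          containers:
            - name: db-cleanup
              image: apache/airflow:2.9.3   # would reuse the chart's Airflow image
              # `airflow db clean --clean-before-timestamp` expects an absolute
              # timestamp, so the Go-style cleanBefore duration would need to be
              # converted at run time (hard-coded to 30 days here for illustration).
              # --skip-archive, --tables and --verbose would only be appended when
              # the corresponding values are set.
              command: ["bash", "-c"]
              args:
                - >-
                  airflow db clean --yes
                  --clean-before-timestamp "$(date -u -d '30 days ago' '+%Y-%m-%d %H:%M:%S')"
```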


Additionally, though this might deserve a separate issue, I’d also find it very valuable from a production perspective to have an option to export the archived data to a remote storage backend such as S3, for example:

```yaml
dbCleanup:
  ...
  exportArchivedTo:
    enabled: false
    # Compress the exported data with gzip
    compress: true
    backend:
      s3:
        bucketName: my-airflow-archive
        region: us-west-2
        accessKeyId: YOUR_ACCESS_KEY_ID
        secretAccessKey: YOUR_SECRET_ACCESS_KEY
        # or accessKeySecretName: ...
      # gcs:
      # ...
```

I’m not sure yet whether it would make more sense to handle this export through a small Python snippet (e.g. using the `boto3` library) or by extending the Airflow CLI itself with something like `airflow db export-archived --export-to s3://...`.
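
If the snippet route wins out, one way to picture it (building on the CronJob sketch above) is to chain `airflow db export-archived` (which in recent Airflow versions can dump the *_archive tables to local files via `--output-path`) with a small upload step. In the sketch below, `upload_archive.py` is a purely hypothetical helper that would gzip the exported files and push them to the configured backend via `boto3`; the paths and bucket name are placeholders as well.

```yaml
              # Continuation of the container args from the CronJob sketch above.
              args:
                - >-
                  airflow db clean --yes
                  --clean-before-timestamp "$(date -u -d '30 days ago' '+%Y-%m-%d %H:%M:%S')"
                  && airflow db export-archived --output-path /tmp/airflow-archive
                  && python /opt/airflow/scripts/upload_archive.py
                  --source /tmp/airflow-archive --bucket my-airflow-archive
```

Extending the CLI with an `--export-to` option would collapse the last two steps into one, at the cost of first having to land that change in Airflow itself.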

GitHub link: https://github.com/apache/airflow/discussions/32419#discussioncomment-15045581
