Have you looked at doing manual deletions, i.e. writing your own SQL?

The db clean command is probably not "optimal" for all scenarios.

So for example, if the main problem table for you is task_instance, you
could periodically delete TI records in smaller batches, keyed on an
appropriate index (one that exists now or one you add). Then maybe you
would not stress the db as hard.
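
Roughly something like this, as a sketch -- assuming PostgreSQL, a recent
Airflow schema where task_instance's primary key is (dag_id, task_id,
run_id, map_index), and a 90-day cutoff on start_date; adjust the key
columns, cutoff, and batch size to your actual schema and needs:

    -- delete old TI rows in bounded batches; run this on a schedule
    -- (or loop it) until nothing old is left
    DELETE FROM task_instance
    WHERE (dag_id, task_id, run_id, map_index) IN (
        SELECT dag_id, task_id, run_id, map_index
        FROM task_instance
        WHERE start_date < now() - interval '90 days'
        ORDER BY start_date
        LIMIT 10000
    );

You would also need to handle the tables that reference task_instance
(xcom, task_reschedule, etc.), either the same way or via ON DELETE
CASCADE if your schema has it, and an index on start_date helps keep each
batch cheap.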

Airflow isn't designed to use partitions, so you may not get good results
with that approach. (A rough sketch of the partition-key constraint you
describe is at the bottom of this mail, below the quote.)



On Mon, Dec 29, 2025 at 7:32 AM Natanel <[email protected]> wrote:

> Hello everyone. We have been having issues with the 'airflow db clean'
> command. Because of the number of DAGs and tasks running every day in our
> deployments, a lot of new data is written to the database each day. When
> we delete that data, the way PostgreSQL works means the WALs get
> replicated to both the archive storage and the main data storage of the
> db instance, which causes a significant jump in CPU, RAM, and disk usage
> whenever we run the command. This causes all kinds of issues; once, after
> we had not run the command for a few months due to human error, the
> cleanup filled up the db storage and made the database unresponsive,
> forcing us to fail over to our backup database.
>
> As of now, I know that this is the accepted and widely used way of
> managing the Airflow database's size. However, we noticed that it can
> cause issues in certain cases, as it did in ours: if the db has not been
> cleaned up for a while, cleaning it can be problematic.
>
> We decided to try partitioning the table and using PostgreSQL's built-in
> partition retention (dropping old partitions), which does not issue a
> DELETE query and is lighter, faster, and simpler to use. However, we ran
> into issues because some tables have foreign key constraints: the
> partition key must be part of the primary key, so any foreign key
> referencing the table must include it as well, which forces us to
> duplicate keys and make code changes. On top of that, SQLAlchemy breaks
> once we change the primary key, and the constraints on the primary key
> break as well.
>
> And in MySQL, due to the foreign keys, it is not possible to partition
> tables that include them at all, as this is not yet supported (according
> to
> https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html
> ).
>
> Has anyone else tried to use the database's built-in partition retention
> system instead of the 'airflow db clean' command?
>
> Thanks, Natanel.
>
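
For reference, the constraint described above -- that in PostgreSQL a
primary key on a partitioned table must include the partition key, which
then ripples into any foreign keys pointing at it -- looks roughly like
this. This is a simplified sketch with placeholder columns, not the actual
Airflow task_instance schema:

    -- declarative range partitioning on start_date; note the PK is
    -- forced to include the partition column, which is what breaks the
    -- existing FK/PK layout
    CREATE TABLE task_instance_part (
        dag_id     VARCHAR(250) NOT NULL,
        task_id    VARCHAR(250) NOT NULL,
        run_id     VARCHAR(250) NOT NULL,
        start_date TIMESTAMPTZ  NOT NULL,
        PRIMARY KEY (dag_id, task_id, run_id, start_date)
    ) PARTITION BY RANGE (start_date);

    CREATE TABLE task_instance_2025_12 PARTITION OF task_instance_part
        FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');

    -- retention then becomes a cheap metadata operation, not a DELETE
    DROP TABLE task_instance_2025_12;

Dropping a partition avoids the big DELETE and the WAL churn, but as you
found, every table with a foreign key to task_instance would have to carry
the partition column too, and SQLAlchemy's mapping of the primary key would
have to change with it.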
