Yes, the problem is the manual deletions, we have tried it, it resulted in the same exact issue, as the scheduled db procedures which clean up the rows marked as deleted actually get deleted, and so it takes up storage, yet it does not solve the WAL problem, the problematic table is actually not task_instance, it is relatively small, the log and job tables are the biggest tables (the problem with them is the primary key change required), by a multiple of 10 (or more, cluster dependant).
The smaller batches might solve the issue, however, it seems to just delay the problem a little rather than solve it, as deleting data with a delete query (especially a lot of data) is not a very light operation, and so I think that this is the main issue. It would be nice if we could use partitions instead, as it is a lighter operation, and does not require us to maintain a query and manage our db, I have thought about changing the models, most of the changes are relatively simple, for some it is just removing the foreign key and relying on ORM level constraints, for others, it requires adding a pre query to have the same constrains but I do not like that idea, maybe there is another way to make airflow "partition-friendly"? I can't think of a nice way to do so, maybe it does not exist, as the db clean is as simple as a delete query gets, yet when there is a lot of data, it is all duplicated in WALs. On Mon, Dec 29, 2025, 19:40 Daniel Standish via dev <[email protected]> wrote: > Have you looked at doing manual deletions? I.e. writing your own sql? > > The db clean command is probably not "optimal" for all scenarios. > > So for example, if the main problem table for you is task_instance, you > could periodically delete TI records in smaller batches using some > appropriate index (whether it exists now or you add it). Then maybe you > would not stress the db as hard. > > Airflow isn't designed to use partitions so, you may not get good results > with that approach. > > > > On Mon, Dec 29, 2025 at 7:32 AM Natanel <[email protected]> wrote: > > > Hello everyone, after having issues with the 'airflow db clean' command, > > where due to the amount of dags and tasks that are running every day in > our > > deployments, we get a lot of new data every day, which is stored in the > > database, and when we delete the data, due to the way PGSQL works, the > > WAL's get replicated to both the archive storage and the main data > storage > > of the db instance, which in turn, causes a significant jump in cpu > usage, > > ram usage and disk usage, whenever we run the command, which causes all > > kinds of issues, we even had it once fill up the db storage, and causing > > the database to be unresponsive, forcing us to move to our backup > database, > > after we haven't ran the command for a few months due to human error. > > > > As of now, I know that this is the accepted and widely used way of > managing > > the airflow database's size, however, we noticed that it may cause issues > > in certain cases, just like in our case, where if the db has not been > > cleaned up for a while, cleaning it can be problematic. > > > > We decided to try and partition the table, and use pgsql's built in > > retention of partitions, which does not issue a DELETE query, and is > > lighter and faster, while being simpler to use, however, we have > > encountered issues due to having Foreign Key constraints in some tables, > > having to duplicate such keys and other than forcing code changes (as the > > foreign key must include the partitioned key, as the partitioned key must > > be part of the primary key), while also having the issue of sqlalchemy > > breaking once we change the primary key, with the addition of the > > constraints on the primary key breaking. > > > > And in Mysql, due to the foreign keys, it is not possible to partition > > tables which include them, as it is not supported yet (according to this > > < > > > https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html > > > > > ). > > > > Has anyone else tried to use the databases built in partition retention > > system instead of the 'airflow db clean' command? > > > > Thanks, Natanel. > > >
