Another option is that you could make ONLY those two tables partition-friendly, and not partition anything else. I think that **could** be possible - both have datetime fields that could be used as partition keys - you would have to assess whether you can do it as your "custom" solution or whether it would require some changes to the Airflow models. But I can't see foreign key problems if ONLY those two tables are partitioned, so you could likely do it yourself in your DB.
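Roughly, it could look like the sketch below (untested - it assumes the log table's datetime column is dttm, that you are free to recreate the table with a composite primary key, and the names log_partitioned / log_2025_12 are just placeholders; the actual column list would have to come from the real schema):

    -- sketch only: a range-partitioned replacement for the "log" table,
    -- partitioned on its datetime column (assumed to be "dttm" here)
    CREATE TABLE log_partitioned (
        id    BIGSERIAL,
        dttm  TIMESTAMP WITH TIME ZONE NOT NULL,
        -- ... the remaining columns of the existing log table ...
        PRIMARY KEY (id, dttm)  -- the partition key must be part of the PK
    ) PARTITION BY RANGE (dttm);

    CREATE TABLE log_2025_12 PARTITION OF log_partitioned
        FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');

    -- retention then becomes dropping an old partition instead of running
    -- a DELETE, which generates almost no WAL:
    DROP TABLE log_2025_12;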
In this case - maybe "let's solve those tables that are problematic" is easier to do than "let's apply partitioning to everything".

On Mon, Dec 29, 2025 at 7:31 PM Jarek Potiuk <[email protected]> wrote:

> Purely theoretically, you could change the log and job tables to be
> unlogged - and thus avoid WAL for them.
>
> The drawbacks of this:
>
> * the data in those tables will be lost if you pull the plug or kill -9
>   the primary postgres server
> * the tables are not available (at all) in replicas - so in case of a
>   failover you would have to import/export the data for those tables
>   manually, rather than rely on replicas being "ready to fail over
>   immediately". Or accept data loss.
> * the data from those tables will not be present in backups
>
> I am not 100% sure, but I believe losing the data in both tables is not
> really catastrophic for Airflow, so maybe it is an acceptable risk (but you
> should likely do a disaster-recovery test to see what happens and how to
> recover in case, indeed, someone pulls the plug on your postgres server).
>
> J,
>
> On Mon, Dec 29, 2025 at 7:16 PM Natanel <[email protected]> wrote:
>
>> Yes, the problem is the deletions themselves - we have tried manual
>> deletions and ran into exactly the same issue: the scheduled db procedures
>> that clean up the rows marked as deleted do delete them, but the deletes
>> still take up storage in the WAL, so the WAL problem is not solved. The
>> problematic table is actually not task_instance - it is relatively small.
>> The log and job tables are the biggest, by a factor of 10 or more
>> depending on the cluster (and the problem with them is the primary key
>> change required).
>>
>> Smaller batches might help, but they seem to just delay the problem a
>> little rather than solve it, since deleting a lot of data with a DELETE
>> query is not a light operation - and I think that is the main issue.
>>
>> It would be nice if we could use partitions instead, as dropping a
>> partition is a lighter operation and does not require us to maintain a
>> query and manage our db ourselves. I have thought about changing the
>> models; most of the changes are relatively simple - for some it is just
>> removing the foreign key and relying on ORM-level constraints, for others
>> it requires adding a pre-query to keep the same constraints, which I do
>> not like. Maybe there is another way to make Airflow "partition-friendly"?
>>
>> I can't think of a nice way to do it - maybe one does not exist, since db
>> clean is about as simple as a DELETE query gets, yet when there is a lot
>> of data, it is all duplicated in the WAL.
>>
>> On Mon, Dec 29, 2025, 19:40 Daniel Standish via dev <[email protected]> wrote:
>>
>> > Have you looked at doing manual deletions? I.e. writing your own sql?
>> >
>> > The db clean command is probably not "optimal" for all scenarios.
>> >
>> > So for example, if the main problem table for you is task_instance, you
>> > could periodically delete TI records in smaller batches using some
>> > appropriate index (whether it exists now or you add it). Then maybe you
>> > would not stress the db as hard.
>> >
>> > Airflow isn't designed to use partitions, so you may not get good
>> > results with that approach.
>> >
>> > On Mon, Dec 29, 2025 at 7:32 AM Natanel <[email protected]> wrote:
>> >
>> > > Hello everyone. We have been having issues with the 'airflow db clean'
>> > > command. Due to the number of dags and tasks running every day in our
>> > > deployments, a lot of new data is stored in the database every day,
>> > > and when we delete that data, the way PGSQL works means the WALs get
>> > > replicated to both the archive storage and the main data storage of
>> > > the db instance. This causes a significant jump in cpu, ram and disk
>> > > usage whenever we run the command, which causes all kinds of issues -
>> > > once, after we had not run the command for a few months due to human
>> > > error, it even filled up the db storage and made the database
>> > > unresponsive, forcing us to move to our backup database.
>> > >
>> > > I know that this is the accepted and widely used way of managing the
>> > > airflow database's size; however, we noticed that it can cause issues
>> > > in certain cases, just like ours, where the db has not been cleaned up
>> > > for a while and cleaning it becomes problematic.
>> > >
>> > > We decided to try partitioning the tables and using pgsql's built-in
>> > > partition retention, which does not issue a DELETE query and is
>> > > lighter, faster and simpler to use. However, we ran into issues
>> > > because of the Foreign Key constraints on some tables: the partition
>> > > key must be part of the primary key, so the foreign keys must include
>> > > it as well, which forces code changes and duplicated keys; and
>> > > sqlalchemy breaks once we change the primary key, along with the
>> > > constraints defined on it.
>> > >
>> > > And in MySQL it is not possible to partition tables that have foreign
>> > > keys at all, as that is not supported yet (according to
>> > > https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html
>> > > ).
>> > >
>> > > Has anyone else tried to use the database's built-in partition
>> > > retention instead of the 'airflow db clean' command?
>> > >
>> > > Thanks, Natanel.
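To make the unlogged-table idea from my quoted message above concrete, it boils down to the statements below (a sketch only - SET UNLOGGED takes an exclusive lock and may rewrite the tables, so it belongs in a maintenance window, and all the durability/replication caveats listed above still apply):

    -- sketch: stop WAL generation for the two biggest tables
    ALTER TABLE log SET UNLOGGED;
    ALTER TABLE job SET UNLOGGED;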
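And the batched manual deletion Daniel suggests could look roughly like the sketch below - the 90-day cutoff and the batch size are placeholders, and it assumes an index on log.dttm; you would run it repeatedly (e.g. from cron) until nothing is left to delete:

    -- sketch: delete old log rows in small batches so each transaction
    -- (and its burst of WAL) stays bounded, instead of one huge DELETE
    DELETE FROM log
    WHERE id IN (
        SELECT id
        FROM log
        WHERE dttm < now() - INTERVAL '90 days'
        ORDER BY dttm
        LIMIT 10000
    );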
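For anyone hitting the same wall described in the original question: the reason partitioning the FK-heavy tables forces model changes is that PostgreSQL requires the partition key to be part of any primary key or unique constraint, so every foreign key pointing at the table has to carry that column too. A generic illustration (table and column names here are made up, not Airflow's):

    -- hypothetical parent/child pair showing the constraint: the partition
    -- key (run_date) has to be in the parent's primary key, so the child's
    -- foreign key has to duplicate it as well
    CREATE TABLE parent (
        id       BIGINT NOT NULL,
        run_date TIMESTAMPTZ NOT NULL,
        PRIMARY KEY (id, run_date)
    ) PARTITION BY RANGE (run_date);

    CREATE TABLE child (
        parent_id       BIGINT NOT NULL,
        parent_run_date TIMESTAMPTZ NOT NULL,  -- duplicated key column
        FOREIGN KEY (parent_id, parent_run_date)
            REFERENCES parent (id, run_date)
    );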
