Another option is to make ONLY those two tables partition-friendly, and not
partition anything else.  I think that **could** be possible - both have
datetime fields that could be used as partition keys - you would have to
assess whether you can do it as your "custom" solution or whether it would
require some changes to Airflow models. But I can't see foreign key
problems if ONLY those two tables are partitioned, so likely you could do
it yourself in your DB.
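
A rough sketch of what that could look like for the log table on Postgres
(illustration only - the new table names are made up, it assumes the log
table's dttm timestamp as the partition key, and it leaves the one-off
copy/rename of existing data to a maintenance window):

    -- Partitioned copy of the log table; the partition key (dttm) has to
    -- become part of the primary key, which is the "custom" change to assess.
    CREATE TABLE log_partitioned (
        LIKE log INCLUDING DEFAULTS,
        PRIMARY KEY (id, dttm)
    ) PARTITION BY RANGE (dttm);

    -- One partition per month, created ahead of time (or via pg_partman).
    CREATE TABLE log_2025_12 PARTITION OF log_partitioned
        FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');

    -- Retention then becomes a cheap metadata operation instead of a DELETE:
    ALTER TABLE log_partitioned DETACH PARTITION log_2025_12;
    DROP TABLE log_2025_12;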

In this case - maybe "let's solve those tables that are problematic" is
easier to do than "let's apply partitioning to everything".

On Mon, Dec 29, 2025 at 7:31 PM Jarek Potiuk <[email protected]> wrote:

> Purely theoretically, you could change the log and job tables to be
> unlogged - and thus avoid WAL for them.
>
> The drawbacks of this:
>
> * the data in those tables will be lost if you pull the plug or kill -9
> the primary postgres server
> * the tables are not available (at all) in replicas - so in case of a
> failover, you would have to do a manual data export/import for those
> tables, rather than rely on replicas being "ready to take over
> immediately". Or accept data loss.
> * the data from those tables will not be present in physical (base) backups
>
> I am not 100% sure, but I believe loss of data in both tables is not
> really catastrophic for Airflow, so maybe it's an acceptable risk (but you
> should likely do a disaster-recovery test to see what happens and how to
> recover if, indeed, someone pulls the plug on your postgres server).
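>
> To be concrete, the switch itself is just one statement per table - a
> sketch, assuming nothing references log/job via foreign keys (Postgres
> won't let a logged table keep a FK pointing at an unlogged one):
>
>     ALTER TABLE log SET UNLOGGED;
>     ALTER TABLE job SET UNLOGGED;
>
>     -- reverting is possible, but it rewrites the tables and this time
>     -- writes all their data to the WAL:
>     ALTER TABLE log SET LOGGED;
>     ALTER TABLE job SET LOGGED;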
>
> J,
>
>
>
>
> On Mon, Dec 29, 2025 at 7:16 PM Natanel <[email protected]> wrote:
>
>> Yes, the problem is there with manual deletions too - we have tried them,
>> and it resulted in the exact same issue: the scheduled db procedures that
>> clean up the rows marked as deleted do delete them, but those deletes
>> still take up storage in the WAL, so it does not solve the WAL problem.
>> The problematic table is actually not task_instance - that one is
>> relatively small. The log and job tables are the biggest, by a multiple
>> of 10 (or more, cluster dependent); the problem with them is the primary
>> key change required.
>>
>> The smaller batches might help; however, it seems they would just delay
>> the problem a little rather than solve it, as deleting data with a DELETE
>> query (especially a lot of data) is not a very light operation, and I
>> think that is the main issue.
>>
>> It would be nice if we could use partitions instead, as dropping a
>> partition is a lighter operation and does not require us to maintain a
>> query and manage our db ourselves. I have thought about changing the
>> models, and most of the changes are relatively simple: for some tables it
>> is just removing the foreign key and relying on ORM-level constraints,
>> for others it requires adding a pre-query to enforce the same constraints
>> - but I do not like that idea. Maybe there is another way to make airflow
>> "partition-friendly"?
>>
>> I can't think of a nice way to do so - maybe it does not exist, as the db
>> clean is about as simple as a delete query gets, yet when there is a lot
>> of data, it is all duplicated in the WALs.
>>
>> On Mon, Dec 29, 2025, 19:40 Daniel Standish via dev <
>> [email protected]>
>> wrote:
>>
>> > Have you looked at doing manual deletions?  I.e. writing your own sql?
>> >
>> > The db clean command is probably not "optimal" for all scenarios.
>> >
>> > So for example, if the main problem table for you is task_instance, you
>> > could periodically delete TI records in smaller batches using some
>> > appropriate index (whether it exists now or you add it).  Then maybe you
>> > would not stress the db as hard.
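>> >
>> > As an illustration only (the cutoff and batch size are arbitrary, it
>> > assumes an index on task_instance.start_date, and anything set to
>> > cascade from task_instance in your Airflow version would be deleted
>> > along with it):
>> >
>> >     -- delete old TIs in small batches; repeat until it removes nothing
>> >     DELETE FROM task_instance
>> >     WHERE ctid IN (
>> >         SELECT ctid FROM task_instance
>> >         WHERE start_date < now() - interval '90 days'
>> >         ORDER BY start_date
>> >         LIMIT 10000
>> >     );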
>> >
>> > Airflow isn't designed to use partitions, so you may not get good
>> > results with that approach.
>> >
>> >
>> >
>> > On Mon, Dec 29, 2025 at 7:32 AM Natanel <[email protected]> wrote:
>> >
>> > > Hello everyone. We have been having issues with the 'airflow db clean'
>> > > command. Due to the number of dags and tasks running every day in our
>> > > deployments, a lot of new data is added to the database daily, and
>> > > when we delete that data, the way PGSQL works means the WALs get
>> > > replicated to both the archive storage and the main data storage of
>> > > the db instance. This in turn causes a significant jump in cpu usage,
>> > > ram usage and disk usage whenever we run the command, which causes all
>> > > kinds of issues - once, after we hadn't run the command for a few
>> > > months due to human error, it even filled up the db storage and made
>> > > the database unresponsive, forcing us to move to our backup database.
>> > >
>> > > As of now, I know that this is the accepted and widely used way of
>> > > managing the airflow database's size; however, we have noticed that it
>> > > can cause issues in certain cases, like ours, where the db has not
>> > > been cleaned up for a while and cleaning it then becomes problematic.
>> > >
>> > > We decided to try partitioning the tables and using pgsql's built-in
>> > > retention of partitions, which does not issue a DELETE query and is
>> > > lighter, faster and simpler to use. However, we ran into issues caused
>> > > by the Foreign Key constraints in some tables: the partition key must
>> > > be part of the primary key, so any foreign key referencing the table
>> > > must include the partition key as well, which means duplicating those
>> > > columns and forcing code changes. On top of that, sqlalchemy breaks
>> > > once we change the primary key, as the constraints defined on the
>> > > primary key break with it.
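>> > >
>> > > To make that constraint concrete, this is roughly what PostgreSQL
>> > > forces on us (illustrative names, not the real airflow schema):
>> > >
>> > >     -- the primary key of a partitioned table must include the
>> > >     -- partition key, so a single-column id PK becomes composite...
>> > >     CREATE TABLE dag_run_p (
>> > >         id           BIGSERIAL,
>> > >         logical_date TIMESTAMPTZ NOT NULL,
>> > >         PRIMARY KEY (id, logical_date)
>> > >     ) PARTITION BY RANGE (logical_date);
>> > >
>> > >     -- ...and any referencing table then has to carry and reference
>> > >     -- the partition column too, which is where the model changes
>> > >     -- start piling up:
>> > >     CREATE TABLE task_instance_p (
>> > >         id                   BIGSERIAL PRIMARY KEY,
>> > >         dag_run_id           BIGINT NOT NULL,
>> > >         dag_run_logical_date TIMESTAMPTZ NOT NULL,
>> > >         FOREIGN KEY (dag_run_id, dag_run_logical_date)
>> > >             REFERENCES dag_run_p (id, logical_date)
>> > >     );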
>> > >
>> > > And in MySQL, due to the foreign keys, it is not possible to partition
>> > > tables which include them, as it is not supported yet (according to
>> > > https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html
>> > > ).
>> > >
>> > > Has anyone else tried to use the database's built-in partition
>> > > retention system instead of the 'airflow db clean' command?
>> > >
>> > > Thanks, Natanel.
>> > >
>> >
>>
>
