Yeah. I am also interested in seeing all kinds of experiments with the
different approaches, and in which one works best - in the end we are all
theorising a bit, and only experimenting is likely to give us good answers.
On Tue, Dec 30, 2025 at 6:10 PM Daniel Standish via dev <[email protected]> wrote:

> Again I would encourage users to *try* simply archiving old rows more
> frequently, with smaller batches, and keeping it under control.
>
> I would be surprised if this doesn't work. If it doesn't work, I would be
> interested to know what volume of tasks you are pushing through. Or is it
> even used for tasks anymore?
>
> Please try this and report back.
>
> If we're thinking outside the box, it's conceivable we could make the jobs
> table hold only active jobs, and immediately archive anything not active
> to a job log table.
>
> On Tue, Dec 30, 2025 at 12:37 AM Jarek Potiuk <[email protected]> wrote:
>
> > *Why not partitions for Job?*
> >
> > I don't think partitioning is a good idea for the Jobs table - simply
> > because the "job" and "partition" logics are incompatible.
> >
> > The problem is that any "reasonable" partitioning you could come up
> > with for jobs to speed up your pruning cannot be "static" (and static
> > assignment is what makes partitioning really work). Partitioning only
> > makes sense when a record can be assigned to a partition at creation
> > time and never changes it. A Job row, by its nature, will have to
> > change partitions if you want to make use of partitioning: something
> > that you cannot prune now, because it is "active" now, should in the
> > future become "completed" and ready to be pruned, so it should move to
> > a partition that can be pruned.
> >
> > So, you logically need:
> >
> > * a "partition" or "partitions" of sorts (or some other bucket of
> > jobs) which keeps the currently active jobs
> > * other "partitions" (or other buckets) that keep only the jobs that
> > are not active any more and can be safely pruned
> >
> > Since each job might take an arbitrarily long time to change from
> > "active" to "completed", what you really need is a way for jobs to
> > move from the "active" partition to a non-active one. This is not
> > really compatible with how partitioning works, because partitioning is
> > static by definition, and moving rows between partitions is
> > effectively the same as deleting them from one table and inserting
> > them into another. Partitions can be thought of as completely separate
> > tables that do not share indexes but have the same structure. Such a
> > "delete"/"insert" pattern for Job rows would completely destroy
> > performance and would be worse than your WAL issue today.
> >
> > *How to possibly achieve what you want with UNLOGGED?*
> >
> > > and if we set it to unlogged, we lose the replication feature
> >
> > Yeah - that's the tradeoff I mentioned before for the JOB table. You
> > would definitely lose the "instant" and easy physical block
> > replication capabilities; however, you could instead implement a way
> > to track "current" jobs and restore them when needed in case of
> > fail-over (which, I understand, would complicate your replication
> > failover mechanism a bit). I did not know how difficult that would be
> > - and I have no "good" experience with managing Postgres - but given
> > Postgres's reputation, I would be surprised if there was no feature
> > you could use. I looked it up a bit and .. I think there is one.
> > Postgres has something that can help you with that - fine-grained,
> > logical replication:
> > https://www.postgresql.org/docs/current/logical-replication.html. From
> > what I understand, logical replication can work in parallel with
> > physical replication. And unlike physical block-based replication, it
> > can be more fine-grained: it can select not only which rows to
> > replicate, but also which operations. For example, you can tell it to
> > only publish jobs in a given state (WHERE), but also not to publish
> > certain operations (say, DELETE).
> >
> > I think your case should be nicely handled for Job by a combination of
> > several things:
> >
> > a) mark the JOB table as UNLOGGED
> > b) enable logical replication for the JOB table from primary to
> > replica, publishing only INSERT and UPDATE, not DELETE and TRUNCATE
> > c) (here I am theorising a bit, because I am not sure how it might
> > work) subscribe on the replica side to receive those INSERTs and
> > UPDATEs and apply them to the replica's JOB table. I **guess** it
> > would work since the JOB table is unlogged (though, reading about the
> > UNLOGGED feature, the table is in a weird Schrödinger state on the
> > replica - it exists, but any access to it on the replica yields an
> > error - so you may instead have to subscribe into a JOB_REPLICA table
> > or something like that)
> > d) run "airflow db clean jobs" (or an equivalent query) periodically
> > on BOTH the Primary and the Replica instances. That would avoid all
> > the WAL / logical replication overload, because the table is UNLOGGED
> > and DELETE operations are not replicated. Effectively you get
> > independent "pruning" of data on both Primary and Replica (and you can
> > scale it easily, with as many replicas as you want)
> > e) if you can subscribe directly to receive the UPDATEs and INSERTs in
> > the replica's JOB table, you are done -> at the moment of failover,
> > all the "current" rows will be in sync (up to logical replication
> > lag). There might be a different set of "completed" jobs on primary
> > and secondary, but that is not an issue
> > f) if you can only subscribe into another table, then failover will
> > need to be paired with renaming the JOB_REPLICA table to JOB, but that
> > is a practically zero-cost operation.
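> >
> > As a rough sketch, the logical-replication half (b and c) could look
> > like the following - names are assumptions, and whether this can be
> > combined with an UNLOGGED table at all is exactly the part to verify,
> > since logical decoding reads the WAL:
> >
> >     -- on the primary: publish only INSERT and UPDATE,
> >     -- never DELETE or TRUNCATE
> >     CREATE PUBLICATION job_pub FOR TABLE job
> >         WITH (publish = 'insert, update');
> >
> >     -- on the subscriber: a table with the same name receives the
> >     -- changes; local DELETEs (pruning) are never replicated
> >     CREATE SUBSCRIPTION job_sub
> >         CONNECTION 'host=primary-host dbname=airflow user=replicator'
> >         PUBLICATION job_pub;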
> > All of that can be done without changing Airflow code.
> >
> > J.
> >
> > On Tue, Dec 30, 2025 at 6:51 AM Natanel <[email protected]> wrote:
> >
> > > That is a good idea; however, the larger and more problematic table
> > > is the job table. We use replication across our DBs for disaster
> > > recovery, and if we set the table to unlogged, we lose the
> > > replication feature. For the log table that is fine, as we do not
> > > mind losing audit logs when we switch the active db; for the job
> > > table, however, it will cause problems.
> > >
> > > The scheduler, dag processor, triggerer and webserver/apiserver
> > > data, along with all the running task data, would be lost, which
> > > might cause those jobs to crash when they try to update the
> > > latest_heartbeat field, or the state.
> > >
> > > I do not think we can solve this issue without some code changes to
> > > make the tables partition friendly.
> > >
> > > Any other suggestion that avoids a code change would help, as it
> > > would allow us to fix the issue faster than upgrading a major
> > > release - but I do think that the code change may benefit some of
> > > the community.
> > >
> > > On Tue, Dec 30, 2025, 07:02 Rahul Vats <[email protected]> wrote:
> > >
> > > > +1 to Jarek's suggestion. The UNLOGGED table approach for the log
> > > > and job tables makes sense for high-volume deployments. Pairing it
> > > > with external log storage provides a safety net for recovery
> > > > scenarios.
> > > >
> > > > Regards,
> > > > Rahul Vats
> > > >
> > > > On Tue, 30 Dec 2025 at 01:50, Jarek Potiuk <[email protected]> wrote:
> > > >
> > > > > Wait.. Actually ... Stupid me!
> > > > >
> > > > > I just realized that setting those two tables to "unlogged"
> > > > > means you do not have to do any partitioning at all .......
> > > > > because you will not have the WAL issues AT ALL.
> > > > >
> > > > > So it's not really about making things "partitioning friendly",
> > > > > but simply "mark those two tables as unlogged" and make sure you
> > > > > handle recovery when the plug is pulled. No partitioning needed
> > > > > at all. It might be combined with the "send audit logs
> > > > > somewhere" idea.
> > > > >
> > > > > So - we might not need **any** change to Airflow, just
> > > > > documentation (maybe even an unofficial blog post of some sort)
> > > > > describing how to mark those two tables as unlogged and how to
> > > > > handle recovery in case of replica failover.
> > > > >
> > > > > On Mon, Dec 29, 2025 at 9:15 PM Jarek Potiuk <[email protected]> wrote:
> > > > >
> > > > > > > But eg lookups to log table might use ti key attrs.
> > > > > >
> > > > > > Yeah - that would be the price to pay. Trade-offs, trade-offs
> > > > > > everywhere.
> > > > > >
> > > > > > Such queries over a multi-partition table would have to do a
> > > > > > full table scan - they would not be able to use indexes
> > > > > > (unless the queries use the partition key). So they would
> > > > > > **work**, but they would be **slow** if the table gets too
> > > > > > big. However, I would argue that anyone who partitions those
> > > > > > tables will prune the old partitions frequently and
> > > > > > aggressively (that is the whole reason to partition), which
> > > > > > will effectively keep those tables relatively small. And even
> > > > > > with a full table scan, an occasional audit-log question about
> > > > > > a task_id would be fine.
> > > > > >
> > > > > > Actually - I strongly believe, and we have had this discussion
> > > > > > in the past, that the log table is a bit of an abomination,
> > > > > > because it is not a "true" audit log if it is kept in a
> > > > > > modifiable database. Anyone who wants to make it a "true"
> > > > > > audit log will effectively have to send those log entries to
> > > > > > "write-only" storage and query the audit logs there. And that
> > > > > > fits nicely into the "unlogged" pattern -> you could add a
> > > > > > trigger in your DB to automatically send audit logs somewhere
> > > > > > "write-only" and aggressively prune the old data (i.e.
> > > > > > partitions).
> > > > > >
> > > > > > So - on the Airflow side that would mean making those tables
> > > > > > "partitioning friendly", but not implementing partitioning
> > > > > > itself - leaving "partitioning", "handling recovery" and
> > > > > > "handling audit logs" up to the Deployment Manager. Which IMHO
> > > > > > should happen anyway for the "log" table, and it would be a
> > > > > > nice pattern to describe as a way to achieve a "true" audit
> > > > > > log.
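> > > > > >
> > > > > > A minimal sketch of such a trigger - names here are
> > > > > > hypothetical, and the "write-only" target is assumed to be a
> > > > > > foreign table (e.g. set up via postgres_fdw), though it could
> > > > > > be anything reachable from the DB:
> > > > > >
> > > > > >     -- forward every new audit row to append-only remote
> > > > > >     -- storage; log_remote is a hypothetical foreign table
> > > > > >     -- with the same columns as log
> > > > > >     CREATE FUNCTION forward_audit_log() RETURNS trigger AS $$
> > > > > >     BEGIN
> > > > > >         INSERT INTO log_remote VALUES (NEW.*);
> > > > > >         RETURN NEW;
> > > > > >     END;
> > > > > >     $$ LANGUAGE plpgsql;
> > > > > >
> > > > > >     CREATE TRIGGER log_forward
> > > > > >         AFTER INSERT ON log
> > > > > >         FOR EACH ROW EXECUTE FUNCTION forward_audit_log();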
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Mon, Dec 29, 2025 at 9:02 PM Daniel Standish via dev <[email protected]> wrote:
> > > > > >
> > > > > > > The problem with partitioning them is that when querying,
> > > > > > > you would need to know the partition. But e.g. lookups to
> > > > > > > the log table might use ti key attrs.
> > > > > > >
> > > > > > > I would try just trimming the log and job tables in small
> > > > > > > batches, frequently, using an appropriate index, before
> > > > > > > taking on the complexity of partitioning.
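> > > > > > >
> > > > > > > A minimal sketch of that kind of batched trim for the log
> > > > > > > table - the cutoff, the batch size and the index on the
> > > > > > > timestamp column are assumptions; rerun until it deletes 0
> > > > > > > rows:
> > > > > > >
> > > > > > >     -- delete at most 10000 old rows per run, keeping each
> > > > > > >     -- transaction (and its burst of WAL) small
> > > > > > >     DELETE FROM log
> > > > > > >     WHERE id IN (
> > > > > > >         SELECT id FROM log
> > > > > > >         WHERE dttm < now() - interval '30 days'
> > > > > > >         ORDER BY dttm
> > > > > > >         LIMIT 10000
> > > > > > >     );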
> > > > > > >
> > > > > > > On Mon, Dec 29, 2025 at 11:24 AM Jarek Potiuk <[email protected]> wrote:
> > > > > > >
> > > > > > > > I have a feeling that making those **two** tables
> > > > > > > > partition-friendly should be easy - and maybe that's all
> > > > > > > > we need. That would make it possible for you to use
> > > > > > > > partitioning (where it would not be necessary for
> > > > > > > > everyone). There might be a few ways of doing it without
> > > > > > > > losing uniqueness - for example, the Job ID and audit log
> > > > > > > > ID could follow certain conventions and always start with
> > > > > > > > a partition key, thus providing the uniqueness we need.
> > > > > > > >
> > > > > > > > Those two tables are pretty specific, and neither the job
> > > > > > > > id nor the log id impacts anything else in our data model.
> > > > > > > >
> > > > > > > > I think such a change to make those two tables "partition
> > > > > > > > friendly" could be accepted, but we could only say that
> > > > > > > > after seeing a POC.
> > > > > > > >
> > > > > > > > J.
> > > > > > > >
> > > > > > > > On Mon, Dec 29, 2025 at 8:01 PM <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > I do think that the second option is best; it is also
> > > > > > > > > what we wanted to do. The only reason we did not do it
> > > > > > > > > is that, in our tests, SQLAlchemy sometimes breaks
> > > > > > > > > because it expects certain constraints which are not
> > > > > > > > > there - mainly for update queries; selects work well. If
> > > > > > > > > I am not mistaken, there are 3 or 4 large tables, job
> > > > > > > > > being the largest (7 times larger than the second
> > > > > > > > > largest). The question is: would such a pull request be
> > > > > > > > > approved?
> > > > > > > > >
> > > > > > > > > We do lose the unique constraint (it becomes
> > > > > > > > > per-partition). The id is a sequence that most likely
> > > > > > > > > will not repeat until the previous values have been
> > > > > > > > > deleted, but if it does, we might query unrelated job or
> > > > > > > > > log data, so changes in the api server are also needed.
> > > > > > > > > Creating the PR is not a problem; the question is how
> > > > > > > > > beneficial it will be, because doing it for those
> > > > > > > > > problematic tables means the preferred way to manage
> > > > > > > > > retention is from the db, and it can be an optional
> > > > > > > > > alembic script.
> > > > > > > > >
> > > > > > > > > I do not want to push a fix that adds more complexity
> > > > > > > > > than the benefit it brings.
> > > > > > > > >
> > > > > > > > > If we do go the PR way, a separate discussion is
> > > > > > > > > probably needed to decide how it should be done (most
> > > > > > > > > likely an additional airflow command to turn the
> > > > > > > > > partitions and retention on or off).
> > > > > > > > >
> > > > > > > > > Doing it as a custom solution has caused problems with
> > > > > > > > > SQLAlchemy, and we do not want to go that way: if a
> > > > > > > > > later alembic script relies on the primary key in some
> > > > > > > > > way, we will need to fix it manually and deal with the
> > > > > > > > > problems it may cause when we update.
> > > > > > > > >
> > > > > > > > > On 29 Dec 2025, at 21:36, Jarek Potiuk <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Another option is that you could make ONLY those two
> > > > > > > > > > tables partition-friendly, and not partition anything
> > > > > > > > > > else. I think that **could** be possible - both have
> > > > > > > > > > datetime fields that could be used as partition keys.
> > > > > > > > > > You would have to assess whether you can do it as your
> > > > > > > > > > "custom" solution or whether it would require some
> > > > > > > > > > changes to the Airflow models. But I can't see foreign
> > > > > > > > > > key problems if ONLY those two tables are partitioned,
> > > > > > > > > > so you could likely do it yourself in your DB.
> > > > > > > > > >
> > > > > > > > > > In this case - maybe "let's solve those tables that
> > > > > > > > > > are problematic" is easier to do than "let's apply
> > > > > > > > > > partitioning to everything".
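> > > > > > > > > >
> > > > > > > > > > For the log table, a sketch of what that could look
> > > > > > > > > > like - column names are from the current schema, the
> > > > > > > > > > rest is assumed; the composite primary key including
> > > > > > > > > > the partition key is exactly the model change being
> > > > > > > > > > discussed:
> > > > > > > > > >
> > > > > > > > > >     -- the partition key (dttm) must be part of the PK
> > > > > > > > > >     CREATE SEQUENCE log_id_seq;
> > > > > > > > > >     CREATE TABLE log (
> > > > > > > > > >         id BIGINT NOT NULL
> > > > > > > > > >             DEFAULT nextval('log_id_seq'),
> > > > > > > > > >         dttm TIMESTAMPTZ NOT NULL,
> > > > > > > > > >         event VARCHAR(60),
> > > > > > > > > >         -- (remaining columns unchanged)
> > > > > > > > > >         PRIMARY KEY (id, dttm)
> > > > > > > > > >     ) PARTITION BY RANGE (dttm);
> > > > > > > > > >
> > > > > > > > > >     CREATE TABLE log_2026_01 PARTITION OF log
> > > > > > > > > >         FOR VALUES FROM ('2026-01-01')
> > > > > > > > > >         TO ('2026-02-01');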
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 29, 2025 at 7:31 PM Jarek Potiuk <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Purely theoretically, you could change the log and
> > > > > > > > > > > job tables to be unlogged - and thus avoid WAL for
> > > > > > > > > > > them.
> > > > > > > > > > >
> > > > > > > > > > > The drawbacks of this:
> > > > > > > > > > >
> > > > > > > > > > > * the data in those tables will be lost if you pull
> > > > > > > > > > > the plug or kill -9 the primary postgres server
> > > > > > > > > > > * the tables are not available (at all) in replicas
> > > > > > > > > > > - so in case of a fallback, you would have to do a
> > > > > > > > > > > manual data import/export for those tables on
> > > > > > > > > > > fail-over, rather than relying on replicas being
> > > > > > > > > > > "ready to fall back immediately". Or accept the data
> > > > > > > > > > > loss.
> > > > > > > > > > > * the data from those tables will not be present in
> > > > > > > > > > > backups
> > > > > > > > > > >
> > > > > > > > > > > I am not 100% sure, but I believe the loss of data
> > > > > > > > > > > in both tables is not really catastrophic for
> > > > > > > > > > > Airflow, so maybe it's an acceptable risk (but you
> > > > > > > > > > > should likely do a disaster-recovery test to see
> > > > > > > > > > > what happens and how to recover if, indeed, someone
> > > > > > > > > > > pulls the plug on your postgres server).
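> > > > > > > > > > >
> > > > > > > > > > > The switch itself is one statement per table (a
> > > > > > > > > > > sketch; note that switching back to LOGGED later
> > > > > > > > > > > rewrites the whole table through the WAL):
> > > > > > > > > > >
> > > > > > > > > > >     -- stop WAL-logging these tables (their data is
> > > > > > > > > > >     -- truncated on crash recovery)
> > > > > > > > > > >     ALTER TABLE log SET UNLOGGED;
> > > > > > > > > > >     ALTER TABLE job SET UNLOGGED;
> > > > > > > > > > >
> > > > > > > > > > >     -- reverting is possible, but rewrites and
> > > > > > > > > > >     -- WAL-logs the entire table
> > > > > > > > > > >     ALTER TABLE log SET LOGGED;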
> > > > > > > > > > >
> > > > > > > > > > > J.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 29, 2025 at 7:16 PM Natanel <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Yes, the problem is the manual deletions - we have
> > > > > > > > > > > > tried them, and they resulted in exactly the same
> > > > > > > > > > > > issue once the scheduled db procedures that clean
> > > > > > > > > > > > up the rows marked as deleted actually delete
> > > > > > > > > > > > them: it takes up storage, and it does not solve
> > > > > > > > > > > > the WAL problem. The problematic table is actually
> > > > > > > > > > > > not task_instance - that one is relatively small.
> > > > > > > > > > > > The log and job tables are the biggest tables, by
> > > > > > > > > > > > a factor of 10 or more (cluster dependent), and
> > > > > > > > > > > > the problem with them is the primary key change
> > > > > > > > > > > > required.
> > > > > > > > > > > >
> > > > > > > > > > > > Smaller batches might help; however, they seem to
> > > > > > > > > > > > just delay the problem a little rather than solve
> > > > > > > > > > > > it, as deleting data with a DELETE query
> > > > > > > > > > > > (especially a lot of data) is not a very light
> > > > > > > > > > > > operation, and I think that this is the main
> > > > > > > > > > > > issue.
> > > > > > > > > > > >
> > > > > > > > > > > > It would be nice if we could use partitions
> > > > > > > > > > > > instead, as dropping a partition is a lighter
> > > > > > > > > > > > operation and does not require us to maintain a
> > > > > > > > > > > > query and manage our db. I have thought about
> > > > > > > > > > > > changing the models; most of the changes are
> > > > > > > > > > > > relatively simple. For some, it is just removing
> > > > > > > > > > > > the foreign key and relying on ORM-level
> > > > > > > > > > > > constraints; for others, it requires adding a
> > > > > > > > > > > > pre-query to keep the same constraints, though I
> > > > > > > > > > > > do not like that idea. Maybe there is another way
> > > > > > > > > > > > to make Airflow "partition-friendly"?
> > > > > > > > > > > >
> > > > > > > > > > > > I can't think of a nice way to do it - maybe one
> > > > > > > > > > > > does not exist, as db clean is as simple as a
> > > > > > > > > > > > DELETE query gets, yet when there is a lot of
> > > > > > > > > > > > data, it is all duplicated in the WALs.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 29, 2025, 19:40 Daniel Standish via dev <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Have you looked at doing manual deletions? I.e.
> > > > > > > > > > > > > writing your own sql?
> > > > > > > > > > > > >
> > > > > > > > > > > > > The db clean command is probably not "optimal"
> > > > > > > > > > > > > for all scenarios.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So for example, if the main problem table for
> > > > > > > > > > > > > you is task_instance, you could periodically
> > > > > > > > > > > > > delete TI records in smaller batches using some
> > > > > > > > > > > > > appropriate index (whether it exists now or you
> > > > > > > > > > > > > add it). Then maybe you would not stress the db
> > > > > > > > > > > > > as hard.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Airflow isn't designed to use partitions, so you
> > > > > > > > > > > > > may not get good results with that approach.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Dec 29, 2025 at 7:32 AM Natanel <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello everyone. We have been having issues
> > > > > > > > > > > > > > with the 'airflow db clean' command. Due to
> > > > > > > > > > > > > > the number of dags and tasks running every day
> > > > > > > > > > > > > > in our deployments, we get a lot of new data
> > > > > > > > > > > > > > every day, which is stored in the database.
> > > > > > > > > > > > > > When we delete that data, due to the way PGSQL
> > > > > > > > > > > > > > works, the WALs get replicated to both the
> > > > > > > > > > > > > > archive storage and the main data storage of
> > > > > > > > > > > > > > the db instance, which causes a significant
> > > > > > > > > > > > > > jump in cpu usage, ram usage and disk usage
> > > > > > > > > > > > > > whenever we run the command. This causes all
> > > > > > > > > > > > > > kinds of issues: we even had it once fill up
> > > > > > > > > > > > > > the db storage and make the database
> > > > > > > > > > > > > > unresponsive, forcing us to move to our backup
> > > > > > > > > > > > > > database, after we had not run the command for
> > > > > > > > > > > > > > a few months due to human error.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As of now, I know that this is the accepted
> > > > > > > > > > > > > > and widely used way of managing the airflow
> > > > > > > > > > > > > > database's size. However, we noticed that it
> > > > > > > > > > > > > > may cause issues in certain cases, just like
> > > > > > > > > > > > > > ours, where if the db has not been cleaned up
> > > > > > > > > > > > > > for a while, cleaning it can be problematic.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We decided to try to partition the tables and
> > > > > > > > > > > > > > use pgsql's built-in retention of partitions,
> > > > > > > > > > > > > > which does not issue a DELETE query, and is
> > > > > > > > > > > > > > lighter and faster while being simpler to use.
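> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (As a sketch of what we mean - partition names
> > > > > > > > > > > > > > assumed - retention becomes dropping a whole
> > > > > > > > > > > > > > partition instead of deleting rows:)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     -- detach first so the drop does not hold
> > > > > > > > > > > > > >     -- a long lock on the parent table
> > > > > > > > > > > > > >     ALTER TABLE log
> > > > > > > > > > > > > >         DETACH PARTITION log_2025_11;
> > > > > > > > > > > > > >     DROP TABLE log_2025_11;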
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, we have encountered issues due to
> > > > > > > > > > > > > > having Foreign Key constraints in some tables:
> > > > > > > > > > > > > > we have to duplicate such keys, and code
> > > > > > > > > > > > > > changes are forced on us (the foreign key must
> > > > > > > > > > > > > > include the partition key, since the partition
> > > > > > > > > > > > > > key must be part of the primary key), while we
> > > > > > > > > > > > > > also hit the issue of sqlalchemy breaking once
> > > > > > > > > > > > > > we change the primary key, along with the
> > > > > > > > > > > > > > constraints on the primary key breaking.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And in Mysql, due to the foreign keys, it is
> > > > > > > > > > > > > > not possible to partition tables which include
> > > > > > > > > > > > > > them, as it is not supported yet (according to
> > > > > > > > > > > > > > https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html
> > > > > > > > > > > > > > ).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Has anyone else tried to use the database's
> > > > > > > > > > > > > > built-in partition retention instead of the
> > > > > > > > > > > > > > 'airflow db clean' command?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks, Natanel.
> > > > > > > > >
> > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > > > > For additional commands, e-mail: [email protected]
