Yeah. I am also interested in seeing all kinds of experiments with the
different approaches, and in which one works best - in the end we are all
theorising a bit, and only experimenting is likely to give us good answers.
On Tue, Dec 30, 2025 at 6:10 PM Daniel Standish via dev <[email protected]> wrote:

> Again I would encourage users to *try* simply archiving old rows more
> frequently, with smaller batches, and keeping it under control.
>
> I would be surprised if this doesn't work. If it doesn't work, I would be
> interested to know what volume of tasks you are pushing through. Or is it
> even used for tasks anymore?
>
> Please try this and report back.
>
> If we're thinking outside the box, it's conceivable we could make the jobs
> table hold only active jobs, and immediately archive anything not active
> to a job log table.
>
> On Tue, Dec 30, 2025 at 12:37 AM Jarek Potiuk <[email protected]> wrote:
>
> > *Why not partitions for Job?*
> >
> > I don't think partitioning is a good idea for the Jobs table - simply
> > because the "job" and "partition" logics are incompatible.
> >
> > The problem is that any "reasonable" partitioning you could come up
> > with for jobs to speed up your pruning cannot be "static" (and static
> > assignment is what makes partitioning really work). Partitioning only
> > makes sense when a record can be assigned to a partition at creation
> > time and never changes it. A Job row, by its nature, will have to
> > change partitions if you want to make use of partitioning: something
> > that you cannot prune now, because it is "active" now, should in the
> > future become "completed" and ready to be pruned, so it should move to
> > a partition that can be pruned.
> >
> > So, you logically need:
> >
> > * a "partition" or "partitions" of sorts (or some other bucket of
> > jobs) which keeps the currently active jobs
> > * other "partitions" (or other buckets) that keep only the jobs that
> > are not active any more and can be safely pruned
> >
> > Since each job might take an arbitrarily long time to change from
> > "active" to "completed", what you really need is a way for jobs to
> > move from the "active" partition to a non-active one. This is not
> > really compatible with how partitioning works, because partitioning is
> > static by definition, and moving rows between partitions is
> > effectively the same as deleting them from one table and inserting
> > them into another. Partitions can be thought of as completely separate
> > tables that do not share indexes but have the same structure. Such a
> > "delete"/"insert" pattern for Job rows would completely destroy
> > performance and would be worse than your WAL issue today.
> >
> > *How to possibly achieve what you want with UNLOGGED?*
> >
> > > and if we set it to unlogged, we lose the replication feature
> >
> > Yeah - that's the tradeoff I mentioned before for the JOB table. You
> > would definitely lose the "instant" and easy physical block
> > replication capabilities; however, you could instead implement a way
> > to track "current" jobs and restore them when needed in case of
> > fail-over (which, I understand, would complicate your replication
> > failover mechanism a bit). I did not know how difficult that would be
> > - and I have no "good" experience with managing Postgres - but given
> > Postgres's reputation, I would be surprised if there was no feature
> > you could use. I looked it up a bit and .. I think there is one.
> > Postgres has something that can help you with that - fine-grained,
> > logical replication:
> > https://www.postgresql.org/docs/current/logical-replication.html. From
> > what I understand, logical replication can work in parallel with
> > physical replication. And unlike physical block-based replication, it
> > can be more fine-grained: it can select not only which rows to
> > replicate, but also which operations. For example, you can tell it to
> > only publish jobs in a given state (WHERE), but also not to publish
> > certain operations (say, DELETE).
> >
> > I think your case should be nicely handled for Job by a combination of
> > several things:
> >
> > a) mark the JOB table as UNLOGGED
> > b) enable logical replication for the JOB table from primary to
> > replica, publishing only INSERT and UPDATE, not DELETE and TRUNCATE
> > c) (here I am theorising a bit, because I am not sure how it might
> > work) subscribe on the replica side to receive those INSERTs and
> > UPDATEs and apply them to the replica's JOB table. I **guess** it
> > would work since the JOB table is unlogged (though, reading about the
> > UNLOGGED feature, the table is in a weird Schrödinger state on the
> > replica - it exists, but any access to it on the replica yields an
> > error - so you may instead have to subscribe into a JOB_REPLICA table
> > or something like that)
> > d) run "airflow db clean jobs" (or an equivalent query) periodically
> > on BOTH the Primary and the Replica instances. That would avoid all
> > the WAL / logical replication overload, because the table is UNLOGGED
> > and DELETE operations are not replicated. Effectively you get
> > independent "pruning" of data on both Primary and Replica (and you can
> > scale it easily, with as many replicas as you want)
> > e) if you can subscribe directly to receive the UPDATEs and INSERTs in
> > the replica's JOB table, you are done -> at the moment of failover,
> > all the "current" rows will be in sync (up to logical replication
> > lag). There might be a different set of "completed" jobs on primary
> > and secondary, but that is not an issue
> > f) if you can only subscribe into another table, then failover will
> > need to be paired with renaming the JOB_REPLICA table to JOB, but that
> > is a practically zero-cost operation.
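> >
> > As a rough sketch, the logical-replication half (b and c) could look
> > like the following - names are assumptions, and whether this can be
> > combined with an UNLOGGED table at all is exactly the part to verify,
> > since logical decoding reads the WAL:
> >
> >     -- on the primary: publish only INSERT and UPDATE,
> >     -- never DELETE or TRUNCATE
> >     CREATE PUBLICATION job_pub FOR TABLE job
> >         WITH (publish = 'insert, update');
> >
> >     -- on the subscriber: a table with the same name receives the
> >     -- changes; local DELETEs (pruning) are never replicated
> >     CREATE SUBSCRIPTION job_sub
> >         CONNECTION 'host=primary-host dbname=airflow user=replicator'
> >         PUBLICATION job_pub;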
> > All of that can be done without changing Airflow code.
> >
> > J.
> >
> > On Tue, Dec 30, 2025 at 6:51 AM Natanel <[email protected]> wrote:
> >
> > > That is a good idea; however, the larger and more problematic table
> > > is the job table. We use replication across our DBs for disaster
> > > recovery, and if we set the table to unlogged, we lose the
> > > replication feature. For the log table that is fine, as we do not
> > > mind losing audit logs when we switch the active db; for the job
> > > table, however, it will cause problems.
> > >
> > > The scheduler, dag processor, triggerer and webserver/apiserver
> > > data, along with all the running task data, would be lost, which
> > > might cause those jobs to crash when they try to update the
> > > latest_heartbeat field, or the state.
> > >
> > > I do not think we can solve this issue without some code changes to
> > > make the tables partition friendly.
> > >
> > > Any other suggestion that avoids a code change would help, as it
> > > would allow us to fix the issue faster than upgrading a major
> > > release - but I do think that the code change may benefit some of
> > > the community.
> > >
> > > On Tue, Dec 30, 2025, 07:02 Rahul Vats <[email protected]> wrote:
> > >
> > > > +1 to Jarek's suggestion. The UNLOGGED table approach for the log
> > > > and job tables makes sense for high-volume deployments. Pairing it
> > > > with external log storage provides a safety net for recovery
> > > > scenarios.
> > > >
> > > > Regards,
> > > > Rahul Vats
> > > >
> > > > On Tue, 30 Dec 2025 at 01:50, Jarek Potiuk <[email protected]> wrote:
> > > >
> > > > > Wait.. Actually ... Stupid me!
> > > > >
> > > > > I just realized that setting those two tables to "unlogged"
> > > > > means you do not have to do any partitioning at all .......
> > > > > because you will not have the WAL issues AT ALL.
> > > > >
> > > > > So it's not really about making things "partitioning friendly",
> > > > > but simply "mark those two tables as unlogged" and make sure you
> > > > > handle recovery when the plug is pulled. No partitioning needed
> > > > > at all. It might be combined with the "send audit logs
> > > > > somewhere" idea.
> > > > >
> > > > > So - we might not need **any** change to Airflow, just
> > > > > documentation (maybe even an unofficial blog post of some sort)
> > > > > describing how to mark those two tables as unlogged and how to
> > > > > handle recovery in case of replica failover.
> > > > >
> > > > > On Mon, Dec 29, 2025 at 9:15 PM Jarek Potiuk <[email protected]> wrote:
> > > > >
> > > > > > > But eg lookups to log table might use ti key attrs.
> > > > > >
> > > > > > Yeah - that would be the price to pay. Trade-offs, trade-offs
> > > > > > everywhere.
> > > > > >
> > > > > > Such queries over a multi-partition table would have to do a
> > > > > > full table scan - they would not be able to use indexes
> > > > > > (unless the queries use the partition key). So they would
> > > > > > **work**, but they would be **slow** if the table gets too
> > > > > > big. However, I would argue that anyone who partitions those
> > > > > > tables will prune the old partitions frequently and
> > > > > > aggressively (that is the whole reason to partition), which
> > > > > > will effectively keep those tables relatively small. And even
> > > > > > with a full table scan, an occasional audit-log question about
> > > > > > a task_id would be fine.
> > > > > >
> > > > > > Actually - I strongly believe, and we have had this discussion
> > > > > > in the past, that the log table is a bit of an abomination,
> > > > > > because it is not a "true" audit log if it is kept in a
> > > > > > modifiable database. Anyone who wants to make it a "true"
> > > > > > audit log will effectively have to send those log entries to
> > > > > > "write-only" storage and query the audit logs there. And that
> > > > > > fits nicely into the "unlogged" pattern -> you could add a
> > > > > > trigger in your DB to automatically send audit logs somewhere
> > > > > > "write-only" and aggressively prune the old data (i.e.
> > > > > > partitions).
> > > > > >
> > > > > > So - on the Airflow side that would mean making those tables
> > > > > > "partitioning friendly", but not implementing partitioning
> > > > > > itself - leaving "partitioning", "handling recovery" and
> > > > > > "handling audit logs" up to the Deployment Manager. Which IMHO
> > > > > > should happen anyway for the "log" table, and it would be a
> > > > > > nice pattern to describe as a way to achieve a "true" audit
> > > > > > log.
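> > > > > >
> > > > > > A minimal sketch of such a trigger - names here are
> > > > > > hypothetical, and the "write-only" target is assumed to be a
> > > > > > foreign table (e.g. set up via postgres_fdw), though it could
> > > > > > be anything reachable from the DB:
> > > > > >
> > > > > >     -- forward every new audit row to append-only remote
> > > > > >     -- storage; log_remote is a hypothetical foreign table
> > > > > >     -- with the same columns as log
> > > > > >     CREATE FUNCTION forward_audit_log() RETURNS trigger AS $$
> > > > > >     BEGIN
> > > > > >         INSERT INTO log_remote VALUES (NEW.*);
> > > > > >         RETURN NEW;
> > > > > >     END;
> > > > > >     $$ LANGUAGE plpgsql;
> > > > > >
> > > > > >     CREATE TRIGGER log_forward
> > > > > >         AFTER INSERT ON log
> > > > > >         FOR EACH ROW EXECUTE FUNCTION forward_audit_log();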
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Mon, Dec 29, 2025 at 9:02 PM Daniel Standish via dev <[email protected]> wrote:
> > > > > >
> > > > > > > The problem with partitioning them is that when querying,
> > > > > > > you would need to know the partition. But e.g. lookups to
> > > > > > > the log table might use ti key attrs.
> > > > > > >
> > > > > > > I would try just trimming the log and job tables in small
> > > > > > > batches, frequently, using an appropriate index, before
> > > > > > > taking on the complexity of partitioning.
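> > > > > > >
> > > > > > > A minimal sketch of that kind of batched trim for the log
> > > > > > > table - the cutoff, the batch size and the index on the
> > > > > > > timestamp column are assumptions; rerun until it deletes 0
> > > > > > > rows:
> > > > > > >
> > > > > > >     -- delete at most 10000 old rows per run, keeping each
> > > > > > >     -- transaction (and its burst of WAL) small
> > > > > > >     DELETE FROM log
> > > > > > >     WHERE id IN (
> > > > > > >         SELECT id FROM log
> > > > > > >         WHERE dttm < now() - interval '30 days'
> > > > > > >         ORDER BY dttm
> > > > > > >         LIMIT 10000
> > > > > > >     );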
> > > > > > >
> > > > > > > On Mon, Dec 29, 2025 at 11:24 AM Jarek Potiuk <[email protected]> wrote:
> > > > > > >
> > > > > > > > I have a feeling that making those **two** tables
> > > > > > > > partition-friendly should be easy - and maybe that's all
> > > > > > > > we need. That would make it possible for you to use
> > > > > > > > partitioning (where it would not be necessary for
> > > > > > > > everyone). There might be a few ways of doing it without
> > > > > > > > losing uniqueness - for example, the Job ID and audit log
> > > > > > > > ID could follow certain conventions and always start with
> > > > > > > > a partition key, thus providing the uniqueness we need.
> > > > > > > >
> > > > > > > > Those two tables are pretty specific, and neither the job
> > > > > > > > id nor the log id impacts anything else in our data model.
> > > > > > > >
> > > > > > > > I think such a change to make those two tables "partition
> > > > > > > > friendly" could be accepted, but we could only say that
> > > > > > > > after seeing a POC.
> > > > > > > >
> > > > > > > > J.
> > > > > > > >
> > > > > > > > On Mon, Dec 29, 2025 at 8:01 PM <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > I do think that the second option is best; it is also
> > > > > > > > > what we wanted to do. The only reason we did not do it
> > > > > > > > > is that, in our tests, SQLAlchemy sometimes breaks
> > > > > > > > > because it expects certain constraints which are not
> > > > > > > > > there - mainly for update queries; selects work well. If
> > > > > > > > > I am not mistaken, there are 3 or 4 large tables, job
> > > > > > > > > being the largest (7 times larger than the second
> > > > > > > > > largest). The question is: would such a pull request be
> > > > > > > > > approved?
> > > > > > > > >
> > > > > > > > > We do lose the unique constraint (it becomes
> > > > > > > > > per-partition). The id is a sequence that most likely
> > > > > > > > > will not repeat until the previous values have been
> > > > > > > > > deleted, but if it does, we might query unrelated job or
> > > > > > > > > log data, so changes in the api server are also needed.
> > > > > > > > > Creating the PR is not a problem; the question is how
> > > > > > > > > beneficial it will be, because doing it for those
> > > > > > > > > problematic tables means the preferred way to manage
> > > > > > > > > retention is from the db, and it can be an optional
> > > > > > > > > alembic script.
> > > > > > > > >
> > > > > > > > > I do not want to push a fix that adds more complexity
> > > > > > > > > than the benefit it brings.
> > > > > > > > >
> > > > > > > > > If we do go the PR way, a separate discussion is
> > > > > > > > > probably needed to decide how it should be done (most
> > > > > > > > > likely an additional airflow command to turn the
> > > > > > > > > partitions and retention on or off).
> > > > > > > > >
> > > > > > > > > Doing it as a custom solution has caused problems with
> > > > > > > > > SQLAlchemy, and we do not want to go that way: if a
> > > > > > > > > later alembic script relies on the primary key in some
> > > > > > > > > way, we will need to fix it manually and deal with the
> > > > > > > > > problems it may cause when we update.
> > > > > > > > >
> > > > > > > > > On 29 Dec 2025, at 21:36, Jarek Potiuk <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Another option is that you could make ONLY those two
> > > > > > > > > > tables partition-friendly, and not partition anything
> > > > > > > > > > else. I think that **could** be possible - both have
> > > > > > > > > > datetime fields that could be used as partition keys.
> > > > > > > > > > You would have to assess whether you can do it as your
> > > > > > > > > > "custom" solution or whether it would require some
> > > > > > > > > > changes to the Airflow models. But I can't see foreign
> > > > > > > > > > key problems if ONLY those two tables are partitioned,
> > > > > > > > > > so you could likely do it yourself in your DB.
> > > > > > > > > >
> > > > > > > > > > In this case - maybe "let's solve those tables that
> > > > > > > > > > are problematic" is easier to do than "let's apply
> > > > > > > > > > partitioning to everything".
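> > > > > > > > > >
> > > > > > > > > > For the log table, a sketch of what that could look
> > > > > > > > > > like - column names are from the current schema, the
> > > > > > > > > > rest is assumed; the composite primary key including
> > > > > > > > > > the partition key is exactly the model change being
> > > > > > > > > > discussed:
> > > > > > > > > >
> > > > > > > > > >     -- the partition key (dttm) must be part of the PK
> > > > > > > > > >     CREATE SEQUENCE log_id_seq;
> > > > > > > > > >     CREATE TABLE log (
> > > > > > > > > >         id BIGINT NOT NULL
> > > > > > > > > >             DEFAULT nextval('log_id_seq'),
> > > > > > > > > >         dttm TIMESTAMPTZ NOT NULL,
> > > > > > > > > >         event VARCHAR(60),
> > > > > > > > > >         -- (remaining columns unchanged)
> > > > > > > > > >         PRIMARY KEY (id, dttm)
> > > > > > > > > >     ) PARTITION BY RANGE (dttm);
> > > > > > > > > >
> > > > > > > > > >     CREATE TABLE log_2026_01 PARTITION OF log
> > > > > > > > > >         FOR VALUES FROM ('2026-01-01')
> > > > > > > > > >         TO ('2026-02-01');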
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 29, 2025 at 7:31 PM Jarek Potiuk <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Purely theoretically, you could change the log and
> > > > > > > > > > > job tables to be unlogged - and thus avoid WAL for
> > > > > > > > > > > them.
> > > > > > > > > > >
> > > > > > > > > > > The drawbacks of this:
> > > > > > > > > > >
> > > > > > > > > > > * the data in those tables will be lost if you pull
> > > > > > > > > > > the plug or kill -9 the primary postgres server
> > > > > > > > > > > * the tables are not available (at all) in replicas
> > > > > > > > > > > - so in case of a fallback, you would have to do a
> > > > > > > > > > > manual data import/export for those tables on
> > > > > > > > > > > fail-over, rather than relying on replicas being
> > > > > > > > > > > "ready to fall back immediately". Or accept the data
> > > > > > > > > > > loss.
> > > > > > > > > > > * the data from those tables will not be present in
> > > > > > > > > > > backups
> > > > > > > > > > >
> > > > > > > > > > > I am not 100% sure, but I believe the loss of data
> > > > > > > > > > > in both tables is not really catastrophic for
> > > > > > > > > > > Airflow, so maybe it's an acceptable risk (but you
> > > > > > > > > > > should likely do a disaster-recovery test to see
> > > > > > > > > > > what happens and how to recover if, indeed, someone
> > > > > > > > > > > pulls the plug on your postgres server).
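> > > > > > > > > > >
> > > > > > > > > > > The switch itself is one statement per table (a
> > > > > > > > > > > sketch; note that switching back to LOGGED later
> > > > > > > > > > > rewrites the whole table through the WAL):
> > > > > > > > > > >
> > > > > > > > > > >     -- stop WAL-logging these tables (their data is
> > > > > > > > > > >     -- truncated on crash recovery)
> > > > > > > > > > >     ALTER TABLE log SET UNLOGGED;
> > > > > > > > > > >     ALTER TABLE job SET UNLOGGED;
> > > > > > > > > > >
> > > > > > > > > > >     -- reverting is possible, but rewrites and
> > > > > > > > > > >     -- WAL-logs the entire table
> > > > > > > > > > >     ALTER TABLE log SET LOGGED;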
> > > > > > > > > > >
> > > > > > > > > > > J.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 29, 2025 at 7:16 PM Natanel <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Yes, the problem is the manual deletions - we have
> > > > > > > > > > > > tried them, and they resulted in exactly the same
> > > > > > > > > > > > issue once the scheduled db procedures that clean
> > > > > > > > > > > > up the rows marked as deleted actually delete
> > > > > > > > > > > > them: it takes up storage, and it does not solve
> > > > > > > > > > > > the WAL problem. The problematic table is actually
> > > > > > > > > > > > not task_instance - that one is relatively small.
> > > > > > > > > > > > The log and job tables are the biggest tables, by
> > > > > > > > > > > > a factor of 10 or more (cluster dependent), and
> > > > > > > > > > > > the problem with them is the primary key change
> > > > > > > > > > > > required.
> > > > > > > > > > > >
> > > > > > > > > > > > Smaller batches might help; however, they seem to
> > > > > > > > > > > > just delay the problem a little rather than solve
> > > > > > > > > > > > it, as deleting data with a DELETE query
> > > > > > > > > > > > (especially a lot of data) is not a very light
> > > > > > > > > > > > operation, and I think that this is the main
> > > > > > > > > > > > issue.
> > > > > > > > > > > >
> > > > > > > > > > > > It would be nice if we could use partitions
> > > > > > > > > > > > instead, as dropping a partition is a lighter
> > > > > > > > > > > > operation and does not require us to maintain a
> > > > > > > > > > > > query and manage our db. I have thought about
> > > > > > > > > > > > changing the models; most of the changes are
> > > > > > > > > > > > relatively simple. For some, it is just removing
> > > > > > > > > > > > the foreign key and relying on ORM-level
> > > > > > > > > > > > constraints; for others, it requires adding a
> > > > > > > > > > > > pre-query to keep the same constraints, though I
> > > > > > > > > > > > do not like that idea. Maybe there is another way
> > > > > > > > > > > > to make Airflow "partition-friendly"?
> > > > > > > > > > > >
> > > > > > > > > > > > I can't think of a nice way to do it - maybe one
> > > > > > > > > > > > does not exist, as db clean is as simple as a
> > > > > > > > > > > > DELETE query gets, yet when there is a lot of
> > > > > > > > > > > > data, it is all duplicated in the WALs.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 29, 2025, 19:40 Daniel Standish via dev <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Have you looked at doing manual deletions? I.e.
> > > > > > > > > > > > > writing your own sql?
> > > > > > > > > > > > >
> > > > > > > > > > > > > The db clean command is probably not "optimal"
> > > > > > > > > > > > > for all scenarios.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So for example, if the main problem table for
> > > > > > > > > > > > > you is task_instance, you could periodically
> > > > > > > > > > > > > delete TI records in smaller batches using some
> > > > > > > > > > > > > appropriate index (whether it exists now or you
> > > > > > > > > > > > > add it). Then maybe you would not stress the db
> > > > > > > > > > > > > as hard.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Airflow isn't designed to use partitions, so you
> > > > > > > > > > > > > may not get good results with that approach.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Dec 29, 2025 at 7:32 AM Natanel <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello everyone. We have been having issues
> > > > > > > > > > > > > > with the 'airflow db clean' command. Due to
> > > > > > > > > > > > > > the number of dags and tasks running every day
> > > > > > > > > > > > > > in our deployments, we get a lot of new data
> > > > > > > > > > > > > > every day, which is stored in the database.
> > > > > > > > > > > > > > When we delete that data, due to the way PGSQL
> > > > > > > > > > > > > > works, the WALs get replicated to both the
> > > > > > > > > > > > > > archive storage and the main data storage of
> > > > > > > > > > > > > > the db instance, which causes a significant
> > > > > > > > > > > > > > jump in cpu usage, ram usage and disk usage
> > > > > > > > > > > > > > whenever we run the command. This causes all
> > > > > > > > > > > > > > kinds of issues: we even had it once fill up
> > > > > > > > > > > > > > the db storage and make the database
> > > > > > > > > > > > > > unresponsive, forcing us to move to our backup
> > > > > > > > > > > > > > database, after we had not run the command for
> > > > > > > > > > > > > > a few months due to human error.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As of now, I know that this is the accepted
> > > > > > > > > > > > > > and widely used way of managing the airflow
> > > > > > > > > > > > > > database's size. However, we noticed that it
> > > > > > > > > > > > > > may cause issues in certain cases, just like
> > > > > > > > > > > > > > ours, where if the db has not been cleaned up
> > > > > > > > > > > > > > for a while, cleaning it can be problematic.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We decided to try to partition the tables and
> > > > > > > > > > > > > > use pgsql's built-in retention of partitions,
> > > > > > > > > > > > > > which does not issue a DELETE query, and is
> > > > > > > > > > > > > > lighter and faster while being simpler to use.
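> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (As a sketch of what we mean - partition names
> > > > > > > > > > > > > > assumed - retention becomes dropping a whole
> > > > > > > > > > > > > > partition instead of deleting rows:)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     -- detach first so the drop does not hold
> > > > > > > > > > > > > >     -- a long lock on the parent table
> > > > > > > > > > > > > >     ALTER TABLE log
> > > > > > > > > > > > > >         DETACH PARTITION log_2025_11;
> > > > > > > > > > > > > >     DROP TABLE log_2025_11;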
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, we have encountered issues due to
> > > > > > > > > > > > > > having Foreign Key constraints in some tables:
> > > > > > > > > > > > > > we have to duplicate such keys, and code
> > > > > > > > > > > > > > changes are forced on us (the foreign key must
> > > > > > > > > > > > > > include the partition key, since the partition
> > > > > > > > > > > > > > key must be part of the primary key), while we
> > > > > > > > > > > > > > also hit the issue of sqlalchemy breaking once
> > > > > > > > > > > > > > we change the primary key, along with the
> > > > > > > > > > > > > > constraints on the primary key breaking.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And in Mysql, due to the foreign keys, it is
> > > > > > > > > > > > > > not possible to partition tables which include
> > > > > > > > > > > > > > them, as it is not supported yet (according to
> > > > > > > > > > > > > > https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html
> > > > > > > > > > > > > > ).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Has anyone else tried to use the database's
> > > > > > > > > > > > > > built-in partition retention instead of the
> > > > > > > > > > > > > > 'airflow db clean' command?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks, Natanel.
> > > > > > > > >
> > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > > > > For additional commands, e-mail: [email protected]
