> This is a good solution. It goes along with the idea of a "generic"
> solution that does not need an "Amazon-specific" table and DB manager. If
> the serialized manifest field can be used for all other "bundles" (even if
> the manifest format itself is specific to the S3 bundle), I am very happy
> with that solution.

Glad to hear that. Let’s wait for input from others as well; there may be 
concerns I haven’t fully considered.
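
To make the idea concrete, here’s a rough sketch of how a generic parent
class in the common provider could own the (de)serialization while each
bundle defines its own payload. None of this is an existing API -
`RemoteDagBundle`, the `manifest` column and the method names are all
illustrative, and it assumes Airflow 3’s `BaseDagBundle`:

    import json
    from typing import Any

    from airflow.dag_processing.bundles.base import BaseDagBundle


    class RemoteDagBundle(BaseDagBundle):
        """Hypothetical base for bundles that snapshot external state.

        Subclasses decide what goes into the manifest (S3DagBundle would
        record the prefix and per-object versions); this class only defines
        how the payload is stored in the single serialized field.
        """

        def serialize_manifest(self, manifest: dict[str, Any]) -> str:
            # JSON keeps the column format bundle-agnostic.
            return json.dumps(manifest, sort_keys=True)

        def deserialize_manifest(self, raw: str) -> dict[str, Any]:
            return json.loads(raw)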

> One thing to consider (but this is entirely up to the S3 bundle
> implementation) is handling versioning of such a manifest during
> serialization/deserialization, to allow downgrading and upgrading the
> provider seamlessly.

Nice point and I agree. Regardless of which approach we take (storing in the DB 
or in object storage), we’ll need to handle serialization properly and ensure 
backward compatibility.
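
For instance - purely a sketch, with all field names being assumptions -
embedding a schema version in the serialized payload would let a newer
provider migrate old manifests, and an older provider fail loudly instead
of silently misreading a newer one:

    import json

    MANIFEST_SCHEMA_VERSION = 2


    def serialize_manifest(object_versions: dict[str, str]) -> str:
        # Tag the payload so future provider releases can evolve the format.
        return json.dumps(
            {"schema_version": MANIFEST_SCHEMA_VERSION, "objects": object_versions}
        )


    def deserialize_manifest(raw: str) -> dict[str, str]:
        payload = json.loads(raw)
        version = payload.get("schema_version", 1)
        if version > MANIFEST_SCHEMA_VERSION:
            # A downgraded provider meeting a newer manifest should fail
            # loudly rather than misread it.
            raise ValueError(f"Unsupported manifest schema v{version}")
        if version == 1:
            # Hypothetical v1 format: a flat {object_key: version_id} mapping.
            return {k: v for k, v in payload.items() if k != "schema_version"}
        return dict(payload["objects"])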

Best,
Jason Liu


On 2025/07/18 04:59:45 Jarek Potiuk wrote:
> > In my opinion, we can simply add an optional `manifest` field (or another
> > suitable name). I don’t think we need to introduce a new table via
> > DbManager; an additional field for storing metadata about the external
> > state (such as prefix and object versions for all dags in the bundle, in
> > the case of S3DagBundle) should suffice. We could introduce a new parent
> > subclass, such as `RemoteDagBundle` or `ObjectStoreDagBundle`, in the
> > common provider to define the structure for serializing and deserializing
> > the `manifest` field.
> 
> This is a good solution. It goes along with the idea of a "generic"
> solution that does not need an "Amazon-specific" table and DB manager. If
> the serialized manifest field can be used for all other "bundles" (even if
> the manifest format itself is specific to the S3 bundle), I am very happy
> with that solution. One thing to consider (but this is entirely up to the
> S3 bundle implementation) is handling versioning of such a manifest during
> serialization/deserialization, to allow downgrading and upgrading the
> provider seamlessly.
> 
> 
> 
> On Fri, Jul 18, 2025 at 5:56 AM Zhe You Liu <jason...@apache.org> wrote:
> 
> > Sorry for the late response.
> >
> > Both approaches work for me; I just wanted to share my opinion as we
> > settle on a final decision.
> >
> > From my perspective, the DagBundle acts as a client that pulls external
> > state and stores only the version identifier in the Airflow metadata DB.
> >
> > For example, with GitDagBundle, the Git repository serves as the external
> > storage. The GitDagBundle pulls DAG files locally and stores the commit
> > hash as the `version` field in `DagBundleModel.version`.
> >
> > 1. If we choose to store the manifest in the Airflow metadata DB:
> >
> > In my opinion, we can simply add an optional `manifest` field (or another
> > suitable name). I don’t think we need to introduce a new table via
> > DbManager; an additional field for storing metadata about the external
> > state (such as prefix and object versions for all dags in the bundle, in
> > the case of S3DagBundle) should suffice. We could introduce a new parent
> > subclass, such as `RemoteDagBundle` or `ObjectStoreDagBundle`, in the
> > common provider to define the structure for serializing and deserializing
> > the `manifest` field.
> >
> > 2. If we decide to store the manifest outside the Airflow metadata DB:
> >
> > We will need to clarify:
> >
> > a) The required parameters for all DagBundles that pull DAGs from object
> > storage. Based on the discussion above, we would need the `conn_id`,
> > `bucket`, and `prefix` for the manifest file.
> >
> > b) The interface for calculating the bundle version based on the external
> > state or DAG content hash.
> >
> > Here is a concrete example of how the manifest could be stored:
> > https://github.com/apache/airflow/pull/46621#issuecomment-3078208467
> >
> > Thank you all for the insightful discussion!
> >
> > Best,
> > Jason
> >
> > On 2025/07/10 21:56:31 "Oliveira, Niko" wrote:
> > > Thanks for the reply Jarek :)
> > >
> > > Indeed we have different philosophies about this, so we will certainly
> > > keep going in circles about where to draw the line on making things
> > > easy and enjoyable to use, whether to intentionally add friction or
> > > not, etc., etc.
> > >
> > > I think if we have optional paths to take and it's not immensely
> > > harder, we should err on the side of making OSS Airflow as good as it
> > > can be, despite whatever managed services we have in the community. I'm
> > > not sure where it has come from recently, but this new push to make
> > > Airflow intentionally hard to use so that managed services stay in
> > > business is a bit unsettling. We're certainly not asking for that, and
> > > those I've chatted with (since I'm now seeing this mentioned
> > > frequently) are also not asking for this. I'm curious where this new
> > > pressure is coming from and why you feel it recently.
> > >
> > > But regardless of the curiosity above, I'll return to the drawing
> > > board and see what else can be done for this particular problem. If
> > > there are other Bundle types that need to solve the same problem,
> > > perhaps we can find a more acceptable implementation in Airflow core to
> > > support this. And if not, I'll proceed with externalizing the storage
> > > of the S3 Bundle version metadata outside of Airflow.
> > >
> > > Cheers,
> > > Niko
> > >
> > > ________________________________
> > > From: Jarek Potiuk <ja...@potiuk.com>
> > > Sent: Wednesday, July 9, 2025 11:59:06 PM
> > > To: dev@airflow.apache.org
> > > Subject: RE: [EXT] S3 Dag Bundle Versions and DB Manager
> > >
> > > > To me, I'm always working from a user perspective. My goal is to
> > > > make their lives easier, their deployments easier, the product the
> > > > most enjoyable for them to use. To me, the best user experience is
> > > > that they should enable bundle versioning and it should just work,
> > > > with little or no extra steps, with as little infra as possible, and
> > > > with the fewest possible pitfalls for them to fall into. From a user
> > > > perspective, they've already provisioned a database for airflow
> > > > metadata, why is this portion of metadata leaking out to other forms
> > > > of external storage? Now this is another resource they need to be
> > > > aware of and manage the lifecycle of (or allow us write access into
> > > > their accounts to manage for them).
> > >
> > >
> > > *TL;DR: I think our goal in open-source is to have a frictionless and
> > > "out of the box" experience only for basic cases, but not for more
> > > complex deployments.*
> > >
> > > It's a long read if you want to read it... so beware :).
> > >
> > > I think that is an important "optimization goal" for sure - to provide
> > > a frictionless and enjoyable experience - but I think it's one of many
> > > goals that sometimes contradict long-term open-source project
> > > sustainability, and it's very important to clarify which "user" we are
> > > talking about.
> > >
> > > To be honest, I am not sure that our goal should be "airflow should
> > > work out of the box in case of integration with external services in
> > > production" if it complicates our code and makes it service-dependent -
> > > and as Jens noticed, if we can come up with a "generic" thing that is
> > > reusable across multiple services, we can invest more in making it work
> > > "out of the box". But if you anyhow need to integrate with an external
> > > service, it adds very little "deployment complexity" to use another
> > > piece of that service - and this is basically the job of the deployment
> > > manager anyway.
> > >
> > > The "just work" goal as I see it should only cover those individual users
> > > who want to try and use airflow in it's basic form and "standalone"
> > > configuration - not for "deployment managers".
> > >
> > > I think yes - our goal should be to make things extremely easy for
> > > users who want to use airflow in its basic form, where things should
> > > **just work**. Like "docker run -it apache/airflow standalone" - this
> > > is what currently **just works**: 0 configuration, 0 work for external
> > > integrations, and we even had a discussion that we could make it "low
> > > production ready" (which I think we could - just implement automated
> > > backup/recovery of the sqlite db, maybe document mounting a folder with
> > > DAGs and db, handle logs better rather than putting them as mixed
> > > output on stdout, and we are practically done). But when you add "S3"
> > > as the dag storage you already need to make a lot of decisions - mostly
> > > about service accounts, security, access, versioning, backup of the s3
> > > objects, etc. And that's not a "standalone user" case - that is
> > > "deployment manager" work (where "deployment manager" is a role - not
> > > necessarily the title of the job you have).
> > >
> > > I think - and this is a bit philosophical, but I was talking about it
> > > with Maciek Obuchowski yesterday - that there is a pretty clear
> > > boundary of what an open-source solution delivers, and it should match
> > > the expectations of the people using it. Maintainers and the community
> > > developing open-source should mostly deliver working, generic solutions
> > > that are extendable with various deployment options, and we should make
> > > it possible for those deployments to happen - and provide building
> > > blocks for them. But it's "deployment manager" work to put things
> > > together and make it work. And we should not do it "for them". It's
> > > their job to figure out how to configure and set up things, make
> > > backups, set security boundaries etc. - we should make it possible,
> > > document the options, document the security model and make it "easy" to
> > > configure things - but there should not be an expectation from the
> > > deployment manager that it "just works".
> > >
> > > And I think your approach is perfectly fine - but only for "managed
> > > services" - there, indeed, managed service users' expectations can be
> > > that things "just work", and they are willing to pay for it with real
> > > money rather than with their time and effort to make it so. And there,
> > > I think, those who deliver such a service should have "just work" as a
> > > primary goal - also because users will have such expectations - because
> > > they actually pay for it to "just work". Not so much for an open-source
> > > product - where "just work" often involves complexity, additional
> > > maintenance overhead and making opinionated decisions on "how it just
> > > works". For those "managed service" teams, "just work" is very much a
> > > primary goal. But for the "open source community", having such a goal
> > > is actually not good - it's dangerous, because it might result in wrong
> > > expectations from the users. If we start making airflow "just work" in
> > > all kinds of deployments with zero work from the users who want to
> > > deploy it in production and at scale, they will expect it to happen for
> > > everything - why don't we have automated log trimming, why don't we
> > > have automated backup of the database, why don't we auto-vacuum the db,
> > > why don't we provide a one-click deployment option on AWS, GCS, Azure,
> > > why don't we provide DDoS protection in our webserver, why don't we
> > > ..... you name it.
> > >
> > > That's a bit of philosophy - those are the same assumptions and goals
> > > that I had in mind when designing multi-team - and that is also why we
> > > had different views there - I just feel that some level of friction is
> > > a "property" of an open-source product.
> > >
> > > Also, a bit of the "business" side - this is also "good" for those who
> > > provide managed services and for airflow, to keep a sustainable
> > > open-source business model working - because what people are paying
> > > them for is precisely to "remove the friction". If we take the
> > > "frictionless user experience" goal to the extreme, Airflow would
> > > essentially be killed IMHO. Imagine if Airflow were frictionless for
> > > all kinds of deployments and had "everything" working out of the box.
> > > There would be no business for any of the managed services (because
> > > users would not need to pay for them). Then we would only have users
> > > who expect things to "just work", and most of them would not even think
> > > about contributing back. And there would be no managed services people
> > > (like you) whose job is paid for by those services - or people like me
> > > who work with and get money from several of those - which would
> > > basically slow active development and maintenance of Airflow to a halt
> > > - because even if we had a lot of people willing to contribute,
> > > maintainers would have very little of their own time to keep things
> > > running. There is a fine balance that we keep now between the
> > > open-source project and the stakeholders, and open-source product
> > > "friction" is an important property that the balance is built on.
> > >
> > > J.
> > >
> > >
> > > On Wed, Jul 9, 2025 at 9:21 PM Oliveira, Niko
> > > <oniko...@amazon.com.invalid> wrote:
> > >
> > > > To me, I'm always working from a user perspective. My goal is to
> > > > make their lives easier, their deployments easier, the product the
> > > > most enjoyable for them to use. To me, the best user experience is
> > > > that they should enable bundle versioning and it should just work,
> > > > with little or no extra steps, with as little infra as possible, and
> > > > with the fewest possible pitfalls for them to fall into. From a user
> > > > perspective, they've already provisioned a database for airflow
> > > > metadata, why is this portion of metadata leaking out to other forms
> > > > of external storage? Now this is another resource they need to be
> > > > aware of and manage the lifecycle of (or allow us write access into
> > > > their accounts to manage for them).
> > > >
> > > > Ultimately, we should not be afraid of sometimes doing difficult
> > > > work to make a good product for our users; it's for them in the end
> > > > :)
> > > >
> > > > However, I see your perspective as well; making our code and DB
> > > > management more complex is more work and complication for us. And
> > > > from the feedback so far I'm outvoted, so I'm happy as always to
> > > > disagree and commit, and do as you wish :)
> > > >
> > > > Thanks for the feedback everyone!
> > > >
> > > > Cheers,
> > > > Niko
> > > >
> > > > ________________________________
> > > > From: Jens Scheffler <j_scheff...@gmx.de.INVALID>
> > > > Sent: Wednesday, July 9, 2025 12:07:08 PM
> > > > To: dev@airflow.apache.org
> > > > Subject: RE: [EXT] S3 Dag Bundle Versions and DB Manager
> > > >
> > > > My 2ct on the discussion is similar to the opinions before.
> > > >
> > > > From my Edge3 experience, migrating the DB from a provider - even if
> > > > technically enabled - is a bit of a pain. It adds a lot of
> > > > boilerplate, you need to consider that your provider should also
> > > > still be compatible with AF2 (I assume), and once a user wants to
> > > > downgrade, it is a bit of manual effort to downgrade the DB as well.
> > > >
> > > > As long as we are not adding a generic Key/Value store to core
> > > > (similar to Variables but for general-purpose internal use, not
> > > > exposed to users - but then, in case of troubleshooting, how to
> > > > manage/admin it?), I would also see it like Terraform - a secondary
> > > > bucket for state is cheap and convenient. Yes, write access would be
> > > > needed, but only for Airflow. And as it is separated from the rest,
> > > > it should not be a general security harm... just a small deployment
> > > > complexity. And I assume versioning is optional. So no requirement to
> > > > have it on by default, and if a user wants to move to/enable
> > > > versioning, then just the state bucket would need to be added to the
> > > > Bundle config?
> > > >
> > > > TL;DR: I would favor a bucket; else, if the DB is the choice, then a
> > > > common solution in core might be easier than DB handling in a
> > > > provider. But I would also not block any other option; just from the
> > > > point of complexity, I'd not favor provider-specific DB tables.
> > > >
> > > > Jens
> > > >
> > > > On 09.07.25 19:57, Jarek Potiuk wrote:
> > > > > What about the DynamoDB idea? What you are trying to trade off is
> > > > > "writing to the airflow metadata DB" vs. "writing to another DB",
> > > > > really. So yes, it is another thing you will need write access to,
> > > > > other than the Airflow DB, but it's really the question of whether
> > > > > the boundaries should be "everything writable should be in Airflow"
> > > > > vs. "everything writable should be in the cloud that the
> > > > > integration is about".
> > > > >
> > > > > Yes - it makes the management using S3 versioning a bit more
> > > > > "write-y" - but on the other hand it does allow us to confine the
> > > > > complexity to a pure "amazon" provider - with practically 0 impact
> > > > > on Airflow core and the airflow DB. Which I really like, to be
> > > > > honest.
> > > > >
> > > > > And yes, "co-location" is also my goal. And I think this is also a
> > > > > perfect way to explain why it is better to keep "S3 versioning"
> > > > > close to "S3" and not to Airflow - especially since there will be a
> > > > > lot of "S3-specific" things in the state that are not easy to
> > > > > abstract and make "common" for other Airflow versioning
> > > > > implementations.
> > > > >
> > > > > You can think about it this way:
> > > > >
> > > > > Airflow has already done its job with abstractions - versioning
> > > > > changes and metadata are implemented in the Airflow DB. If there
> > > > > are any missing pieces in the abstraction that would be usable
> > > > > across multiple implementations of versioning, we should - of
> > > > > course - add them to the Airflow metadata DB, in a way that they
> > > > > can be used by those different implementations. But the code to
> > > > > manage and use them should be in airflow-core. If there is anything
> > > > > specific to the implementation of the S3 / Amazon integration, it
> > > > > should be implemented independently from the Airflow metadata DB.
> > > > > There are many complexities in managing and upgrading the core DB,
> > > > > and we should not use the db for provider-specific things. The
> > > > > discussion about shared code and isolation is interesting in this
> > > > > context, because I think, as we go deeper and deeper in this
> > > > > direction, we might get to the point where NO (regular) providers
> > > > > need whatever CLI or tooling there is to manage the Metadata DB
> > > > > (and we are already more or less there). FAB and Edge are currently
> > > > > exceptions - but they are by no means "regular" providers.
> > > > >
> > > > > So I'd say - if, while designing/implementing S3 versioning, you
> > > > > see that part of the implementation can be abstracted away, added
> > > > > to the core and used by other implementations - 100% - let's add it
> > > > > to the core. But only then. If it is something that only the Amazon
> > > > > provider and S3 need - let's make it use Amazon **whatever** as the
> > > > > backing storage.
> > > > >
> > > > > I would even say - talk to the Google team and try to come up with
> > > > > an abstraction that can be used for versioning in both S3 and GCS,
> > > > > agree on it, and let's see if this abstraction should find its way
> > > > > to the core. That would be my proposal.
> > > > >
> > > > > J.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 9, 2025 at 7:37 PM Oliveira, Niko
> > > > > <oniko...@amazon.com.invalid> wrote:
> > > > >
> > > > >> Thanks for engaging, folks!
> > > > >>
> > > > >> I don’t love the idea of using another bucket. For one, this
> > > > >> means Airflow needs write access to S3, which is not ideal; some
> > > > >> users/customers are very sensitive about ever allowing write
> > > > >> access to things. And two, you will commonly get issues with a
> > > > >> design that leaks state into customer-managed accounts/resources:
> > > > >> they may delete the bucket not knowing what it is, they may not
> > > > >> migrate it to a new account or region if they ever move. I think
> > > > >> it’s best for the data to be stored transparently to the user and
> > > > >> co-located with the data it strongly relates to (i.e. the dag runs
> > > > >> that are associated with those bundle versions).
> > > > >>
> > > > >> Is using the DB Manager completely unacceptable these days? What
> > > > >> are folks' thoughts on that?
> > > > >>
> > > > >> Cheers,
> > > > >> Niko
> > > > >>
> > > > >> ________________________________
> > > > >> From: Jarek Potiuk <ja...@potiuk.com>
> > > > >> Sent: Wednesday, July 9, 2025 6:23:54 AM
> > > > >> To: dev@airflow.apache.org
> > > > >> Subject: RE: [EXT] S3 Dag Bundle Versions and DB Manager
> > > > >>
> > > > >>> Another option would be using a DynamoDB table? That also
> > > > >>> supports snapshots, and I feel it works very well with state
> > > > >>> management.
> > > > >>
> > > > >> Yep that would also work.
> > > > >>
> > > > >> Anything "Amazon" to keep state would do. I think it should be
> > > > >> our "default" approach that if we have to keep state, and the
> > > > >> state is connected to a specific "provider's" implementation, it's
> > > > >> best not to keep the state in Airflow but in the "integration"
> > > > >> that the provider works with, if possible. We cannot do it in the
> > > > >> "generic" case because we do not know what "integrations" the user
> > > > >> has - but since this is "provider's" functionality, using whatever
> > > > >> else the given integration provides makes perfect sense.
> > > > >>
> > > > >> J.
> > > > >>
> > > > >>
> > > > >> On Wed, Jul 9, 2025 at 3:12 PM Pavankumar Gopidesu
> > > > >> <gopidesupa...@gmail.com> wrote:
> > > > >>
> > > > >>> Agreed, another S3 bucket also works here.
> > > > >>>
> > > > >>> Another option would be using a DynamoDB table? That also
> > > > >>> supports snapshots, and I feel it works very well with state
> > > > >>> management.
> > > > >>>
> > > > >>>
> > > > >>> Pavan
> > > > >>>
> > > > >>> On Wed, Jul 9, 2025 at 2:06 PM Jarek Potiuk <ja...@potiuk.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> One of the options would be to use a similar approach to the
> > > > >>>> one terraform uses - i.e. use dedicated "metadata" state storage
> > > > >>>> in a DIFFERENT s3 bucket than the DAG files. Since we know there
> > > > >>>> must be an S3 available (obviously), it seems not too excessive
> > > > >>>> to assume that there might be another bucket, independent of the
> > > > >>>> DAG bucket, where the state is stored - the same bucket (and a
> > > > >>>> dedicated connection id) could even be used to store state for
> > > > >>>> multiple S3 dag bundles - each Dag bundle could have a dedicated
> > > > >>>> object storing the state. The metadata is not huge, so
> > > > >>>> continuously reading and replacing it should not be an issue.
> > > > >>>>
> > > > >>>> What's nice about it - this single object could even
> > > > >>>> **actually** use S3 versioning to keep historical state - to
> > > > >>>> optimize things and potentially keep a log of changes.
> > > > >>>>
> > > > >>>> J.
> > > > >>>>
> > > > >>>> On Wed, Jul 9, 2025 at 3:01 AM Oliveira, Niko
> > > > >>>> <oniko...@amazon.com.invalid> wrote:
> > > > >>>>
> > > > >>>>> Hey folks,
> > > > >>>>>
> > > > >>>>> tl;dr I’d like to get some thoughts on a proposal to use the
> > > > >>>>> DB Manager for S3 Dag Bundle versioning.
> > > > >>>>>
> > > > >>>>> The initial commit for S3 Dag Bundles was recently merged [1],
> > > > >>>>> but it lacks Bundle versioning (since this isn’t trivial with
> > > > >>>>> something like S3, like it is with Git). The proposed solution
> > > > >>>>> involves building a snapshot of the S3 bucket at the time each
> > > > >>>>> Bundle version is created: noting the version of all the
> > > > >>>>> objects in the bucket (using S3’s native bucket versioning
> > > > >>>>> feature), creating a manifest to store those versions, and
> > > > >>>>> then giving that whole manifest itself some unique
> > > > >>>>> id/version/uuid. These manifests now need to be stored
> > > > >>>>> somewhere for future use/retrieval. The proposal is to use the
> > > > >>>>> Airflow database via the DB Manager feature. Other options
> > > > >>>>> include using the local filesystem to store them (but this
> > > > >>>>> obviously won’t work in Airflow’s distributed architecture) or
> > > > >>>>> the S3 bucket itself (but this requires write access to the
> > > > >>>>> bucket, and we will always be at the mercy of the user
> > > > >>>>> accidentally deleting/modifying the manifests as they try to
> > > > >>>>> manage the lifecycle of their bucket; they should not need to
> > > > >>>>> be aware of or account for this metadata). So the Airflow DB
> > > > >>>>> works nicely as a persistent and internally accessible
> > > > >>>>> location for this data.
> > > > >>>>>
> > > > >>>>> But I’m aware of the complexities of using the DB Manager and
> > > > >>>>> the discussion we had during the last dev call about providers
> > > > >>>>> vending DB tables (concerning migrations and ensuring smooth
> > > > >>>>> upgrades or downgrades of the schema). So I wanted to reach
> > > > >>>>> out to see what folks thought. I have talked to Jed, the
> > > > >>>>> Bundle Master (tm), and we haven’t come up with anything else
> > > > >>>>> that solves the problem as cleanly, so the DB Manager is still
> > > > >>>>> my top choice. I think what we go with will pave the way for
> > > > >>>>> other Bundle providers of a similar type as well, so it's
> > > > >>>>> worth thinking deeply about this decision.
> > > > >>>>>
> > > > >>>>> Let me know what you think and thanks for your time!
> > > > >>>>>
> > > > >>>>> Cheers,
> > > > >>>>> Niko
> > > > >>>>>
> > > > >>>>> [1] https://github.com/apache/airflow/pull/46621
> > > > >>>>>
> > > >
> > > >
> > >
> >
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
