Big +1 from my side, looking forward to make this happen. Two sides that aren't completely clear to me:
- Are we going to extend the existing data model, to allow the RDBMS to optimize queries on fields that we use a lot? - How are we going to do state evolution when we extend the JSON model I have good confidence that we'll solve this along the way. Cheers, Fokko Op di 15 okt. 2019 om 21:29 schreef Dan Davydov <ddavy...@twitter.com.invalid>: > I have been following it from the beginning as well. I understand there > would be short-term wins for some users (I don't think a huge amount of > users?), but I still feel like we are being a bit short-sighted here and > that we are creating more work for ourselves and potentially our users in > the future. I also feel like there will be side effects to users as well, > many of which don't care about the webserver scalability, such as bugs > caused by the addition of the new webserver representation. I think without > a design that is much larger in scope I wouldn't feel comfortable moving > forward with this AIP. > > On Tue, Oct 15, 2019 at 3:21 PM Jarek Potiuk <jarek.pot...@polidea.com> > wrote: > > > Hello Dan, Alex, > > > > I believe all the points you make are super-valid ones. But maybe you are > > missing the full context a bit. > > > > I followed the original discussion > > < > > > https://lists.apache.org/thread.html/a2d426f93c0f4e5f0347627308638b59ca4072fd022a42af1163e34a@%3Cdev.airflow.apache.org%3E > > > > > from the very beginning and took part in the initial discussions when > this > > topic was raised. From the discussion it is quite clear to me that this > > mostly a "tactical" approach to implement something that is backportable > to > > 1.10 and rather quick to implement. This is targeted to make users more > > happy with their 1.10 version without the timing uncertainty and effort > of > > migration to 2.0. It solves the major pain point of stability of the UI > in > > case there are complex DAGs for which parsing crashes the webserver. > > Like in "being nice to your users". > > > > There will be a separate effort to make pretty much all of the things you > > mentioned in 2.0 in a non-backportable way as it requires far too many > > changes in the way how Airflow works internally. > > > > Maybe it needs some more explanation + long term plan that follows in the > > AIP itself to explain it to those who have not followed the initial > > discussion, but I think it's fully justified change. > > > > J. > > > > On Tue, Oct 15, 2019 at 9:10 PM Alex Guziel <alex.guz...@airbnb.com > > .invalid> > > wrote: > > > > > -1 (binding) > > > Good points made by Dan. We don't need to have the future plan > > implemented > > > completely but it would be nice to see more detailed notes about how > this > > > will play out in the future. We shouldn't walk into a system that > causes > > > more pain in the future. (I can't say for sure that it does, but I > can't > > > say that it doesn't either). I don't think the proposal is necessarily > > > wrong or bad, but I think we need some more detailed planning around > > future > > > milestones. > > > > > > On Tue, Oct 15, 2019 at 12:04 PM Dan Davydov > > <ddavy...@twitter.com.invalid > > > > > > > wrote: > > > > > > > -1 (binding), this may sound a bit FUD-y but I don't feel this has > been > > > > thought through enough... > > > > > > > > Having both a SimpleDagBag representation and the JSON representation > > > > doesn't make sense to me at the moment: *"**Quoting from Airflow > code, > > it > > > > is “a simplified representation of a DAG that contains all attributes > > > > required for instantiating and scheduling its associated tasks.”. It > > does > > > > not contain enough information required by the webserver.". *Why not > > > create > > > > a representation that can be used by both? This is going to be a big > > > > headache to both understand and work with in the codebase since it > will > > > be > > > > another definition that we need to keep in sync. > > > > > > > > Not sure if fileloc/fileloc_hash is the right solution, the longterm > > > > solution I am imagining has clients responsible for uploading DAGs > > rather > > > > than retrieving them from the filesystem so fileloc/fileloc_hash > > wouldn't > > > > even exist (dag_id would be used for indexing here). > > > > > > > > Versioning isn't really addressed either (e.g. if a DAG topology > > changes > > > > with some update you want to be able to show both the old and new > ones, > > > or > > > > at least have a way to deal with them), there is an argument that > this > > is > > > > acceptable since it isn't currently addressed now, but I'm worried > that > > > > large schema changes should think through the long term plan a bit > > more. > > > > > > > > I feel like getting this wrong is going to make it very hard to > migrate > > > > things in the future, and make the codebase worse (another > > representation > > > > of DAGs that we need to support/understand/keep parity for). If I'm > > wrong > > > > about this then I would be more willing to +1 this change. This doc > is > > a > > > > 1-2 pager and I feel like it is not thorough or deep enough and > doesn't > > > > give me enough confidence that the work in the PR is going to make it > > > > easier to complete the future milestones instead of harder. > > > > > > > > On Tue, Oct 15, 2019 at 11:26 AM Kamil Breguła < > > > kamil.breg...@polidea.com> > > > > wrote: > > > > > > > > > +1 (binding) > > > > > > > > > > On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik <kaxiln...@gmail.com> > > > wrote: > > > > > > > > > > > Hello, Airflow community, > > > > > > > > > > > > This email calls for a vote to add the DAG Serialization feature > at > > > > > > https://github.com/apache/airflow/pull/5743. > > > > > > > > > > > > *AIP*: > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler > > > > > > > > > > > > *Previous Mailing List discussion*: > > > > > > > > > > > > > > > > > > > > > https://lists.apache.org/thread.html/65d282368e0a7c19815badb8b1c6c8d72b0975ce94f601e13af44f74@%3Cdev.airflow.apache.org%3E > > > > > > . > > > > > > > > > > > > *Authors*: Kaxil Naik, Zhou Fang, Ash-Berlin Taylor > > > > > > > > > > > > *Summary*: > > > > > > > > > > > > - DAGs are serialized using JSON format and stored in a > > > > SerializedDag > > > > > > table > > > > > > - The Webserver now instead of having to parse the DAG file > > again, > > > > > > reads the serialized DAGs in JSON, de-serializes them and > > creates > > > > the > > > > > > DagBag and uses it to show in the UI. > > > > > > - Instead of loading an entire DagBag when the WebServer > starts > > we > > > > > > only load each DAG on demand from the Serialized Dag table. > This > > > > helps > > > > > > reduce Webserver startup time and memory. The reduction is > > notable > > > > > when you > > > > > > have a large number of DAGs. > > > > > > - A JSON Schema has been defined and we validate the > serialized > > > dag > > > > > > before writing it to the database > > > > > > > > > > > > [image: image.png] > > > > > > > > > > > > A PR (https://github.com/apache/airflow/pull/5743) is ready for > > > review > > > > > > from the committers and community. > > > > > > > > > > > > We also have a WIP PR ( > https://github.com/apache/airflow/pull/5992 > > ) > > > to > > > > > > backport this feature to 1.10.* branch. > > > > > > > > > > > > A big thank you to Zhou and Ash for their continuous help in > > > improving > > > > > > this feature/PR. > > > > > > > > > > > > This email is formally calling for a vote to accept the AIP and > PR. > > > > > Please > > > > > > note that we will update the PR / feature to fix bugs if we find > > any. > > > > > > > > > > > > Cheers, > > > > > > Kaxil > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Jarek Potiuk > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > M: +48 660 796 129 <+48660796129> > > [image: Polidea] <https://www.polidea.com/> > > >