Big +1 from my side, looking forward to make this happen.

Two sides that aren't completely clear to me:

   - Are we going to extend the existing data model, to allow the RDBMS to
   optimize queries on fields that we use a lot?
   - How are we going to do state evolution when we extend the JSON model

I have good confidence that we'll solve this along the way.

Cheers, Fokko

Op di 15 okt. 2019 om 21:29 schreef Dan Davydov
<ddavy...@twitter.com.invalid>:

> I have been following it from the beginning as well. I understand there
> would be short-term wins for some users (I don't think a huge amount of
> users?), but I still feel like we are being a bit short-sighted here and
> that we are creating more work for ourselves and potentially our users in
> the future. I also feel like there will be side effects to users as well,
> many of which don't care about the webserver scalability, such as bugs
> caused by the addition of the new webserver representation. I think without
> a design that is much larger in scope I wouldn't feel comfortable moving
> forward with this AIP.
>
> On Tue, Oct 15, 2019 at 3:21 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> > Hello Dan, Alex,
> >
> > I believe all the points you make are super-valid ones. But maybe you are
> > missing the full context a bit.
> >
> > I followed the original discussion
> > <
> >
> https://lists.apache.org/thread.html/a2d426f93c0f4e5f0347627308638b59ca4072fd022a42af1163e34a@%3Cdev.airflow.apache.org%3E
> > >
> > from the very beginning and took part in the initial discussions when
> this
> > topic was raised. From the discussion it is quite clear to me that this
> > mostly a "tactical" approach to implement something that is backportable
> to
> > 1.10 and rather quick to implement. This is targeted to make users more
> > happy with their 1.10 version without the timing uncertainty and effort
> of
> > migration to 2.0. It solves the major pain point of stability of the UI
> in
> > case there are complex DAGs for which parsing crashes the webserver.
> > Like in "being nice to your users".
> >
> > There will be a separate effort to make pretty much all of the things you
> > mentioned in 2.0 in a non-backportable way as it requires far too many
> > changes in the way how Airflow works internally.
> >
> > Maybe it needs some more explanation + long term plan that follows in the
> > AIP itself to explain it to those who have not followed the initial
> > discussion, but I think it's fully justified change.
> >
> > J.
> >
> > On Tue, Oct 15, 2019 at 9:10 PM Alex Guziel <alex.guz...@airbnb.com
> > .invalid>
> > wrote:
> >
> > > -1 (binding)
> > > Good points made by Dan. We don't need to have the future plan
> > implemented
> > > completely but it would be nice to see more detailed notes about how
> this
> > > will play out in the future. We shouldn't walk into a system that
> causes
> > > more pain in the future. (I can't say for sure that it does, but I
> can't
> > > say that it doesn't either). I don't think the proposal is necessarily
> > > wrong or bad, but I think we need some more detailed planning around
> > future
> > > milestones.
> > >
> > > On Tue, Oct 15, 2019 at 12:04 PM Dan Davydov
> > <ddavy...@twitter.com.invalid
> > > >
> > > wrote:
> > >
> > > > -1 (binding), this may sound a bit FUD-y but I don't feel this has
> been
> > > > thought through enough...
> > > >
> > > > Having both a SimpleDagBag representation and the JSON representation
> > > > doesn't make sense to me at the moment: *"**Quoting from Airflow
> code,
> > it
> > > > is “a simplified representation of a DAG that contains all attributes
> > > > required for instantiating and scheduling its associated tasks.”. It
> > does
> > > > not contain enough information required by the webserver.". *Why not
> > > create
> > > > a representation that can be used by both? This is going to be a big
> > > > headache to both understand and work with in the codebase since it
> will
> > > be
> > > > another definition that we need to keep in sync.
> > > >
> > > > Not sure if fileloc/fileloc_hash is the right solution, the longterm
> > > > solution I am imagining has clients responsible for uploading DAGs
> > rather
> > > > than retrieving them from the filesystem so fileloc/fileloc_hash
> > wouldn't
> > > > even exist (dag_id would be used for indexing here).
> > > >
> > > > Versioning isn't really addressed either (e.g. if a DAG topology
> > changes
> > > > with some update you want to be able to show both the old and new
> ones,
> > > or
> > > > at least have a way to deal with them), there is an argument that
> this
> > is
> > > > acceptable since it isn't currently addressed now, but I'm worried
> that
> > > > large schema changes should think through the long term plan a bit
> > more.
> > > >
> > > > I feel like getting this wrong is going to make it very hard to
> migrate
> > > > things in the future, and make the codebase worse (another
> > representation
> > > > of DAGs that we need to support/understand/keep parity for). If I'm
> > wrong
> > > > about this then I would be more willing to +1 this change. This doc
> is
> > a
> > > > 1-2 pager and I feel like it is not thorough or deep enough and
> doesn't
> > > > give me enough confidence that the work in the PR is going to make it
> > > > easier to complete the future milestones instead of harder.
> > > >
> > > > On Tue, Oct 15, 2019 at 11:26 AM Kamil Breguła <
> > > kamil.breg...@polidea.com>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik <kaxiln...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hello, Airflow community,
> > > > > >
> > > > > > This email calls for a vote to add the DAG Serialization feature
> at
> > > > > > https://github.com/apache/airflow/pull/5743.
> > > > > >
> > > > > > *AIP*:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
> > > > > >
> > > > > > *Previous Mailing List discussion*:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/65d282368e0a7c19815badb8b1c6c8d72b0975ce94f601e13af44f74@%3Cdev.airflow.apache.org%3E
> > > > > >  .
> > > > > >
> > > > > > *Authors*: Kaxil Naik, Zhou Fang, Ash-Berlin Taylor
> > > > > >
> > > > > > *Summary*:
> > > > > >
> > > > > >    - DAGs are serialized using JSON format and stored in a
> > > > SerializedDag
> > > > > >    table
> > > > > >    - The Webserver now instead of having to parse the DAG file
> > again,
> > > > > >    reads the serialized DAGs in JSON, de-serializes them and
> > creates
> > > > the
> > > > > >    DagBag and uses it to show in the UI.
> > > > > >    - Instead of loading an entire DagBag when the WebServer
> starts
> > we
> > > > > >    only load each DAG on demand from the Serialized Dag table.
> This
> > > > helps
> > > > > >    reduce Webserver startup time and memory. The reduction is
> > notable
> > > > > when you
> > > > > >    have a large number of DAGs.
> > > > > >    - A JSON Schema has been defined and we validate the
> serialized
> > > dag
> > > > > >    before writing it to the database
> > > > > >
> > > > > > [image: image.png]
> > > > > >
> > > > > > A PR (https://github.com/apache/airflow/pull/5743) is ready for
> > > review
> > > > > > from the committers and community.
> > > > > >
> > > > > > We also have a WIP PR (
> https://github.com/apache/airflow/pull/5992
> > )
> > > to
> > > > > > backport this feature to 1.10.* branch.
> > > > > >
> > > > > > A big thank you to Zhou and Ash for their continuous help in
> > > improving
> > > > > > this feature/PR.
> > > > > >
> > > > > > This email is formally calling for a vote to accept the AIP and
> PR.
> > > > > Please
> > > > > > note that we will update the PR / feature to fix bugs if we find
> > any.
> > > > > >
> > > > > > Cheers,
> > > > > > Kaxil
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>

Reply via email to