Re: [DISCUSS] Make Sqlite3 "low-prod-ready" and get rid of the Sequential Executor

Daniel Standish Tue, 17 Dec 2024 17:20:22 -0800

>
> Maybe a stupid question but why not make SequentialExecutor extend
> LocalExecutor with parallelism set to one as you described the similarity?



I think the answer to this is that there is not much point in keeping a
SequentialExecutor class around if it's just LocalExecutor with
parallelism 1. Just use LocalExecutor with parallelism 1 if that's what
you want; you don't need a class for that.
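A minimal stdlib analogy of the point above, not Airflow code: a worker pool capped at one worker already behaves sequentially, so a dedicated class adds nothing. `run_with_parallelism` is a hypothetical helper, and `ThreadPoolExecutor` merely stands in for LocalExecutor's worker processes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_parallelism(n_tasks: int, parallelism: int) -> int:
    """Run n_tasks dummy tasks on a pool of `parallelism` workers.

    Returns the largest number of tasks that were ever running at once.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def task() -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # stand-in for real task work
        with lock:
            running -= 1

    # Leaving the `with` block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for _ in range(n_tasks):
            pool.submit(task)
    return peak


if __name__ == "__main__":
    print(run_with_parallelism(6, parallelism=1))  # 1: one task at a time
    print(run_with_parallelism(6, parallelism=3))  # at most 3 overlap
```

The only knob separating "sequential" from "parallel" here is `max_workers`, which is the same argument made above for LocalExecutor's parallelism.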




On Tue, Dec 17, 2024 at 11:18 AM Blain David <[email protected]>
wrote:

> Maybe a stupid question but why not make SequentialExecutor extend
> LocalExecutor with parallelism set to one as you described the similarity?
>
> Then you're still backward compatible (for those who would use it anyway),
> you get rid of the SequentialExecutor-specific code, but you still have the
> possibility to use it? Or am I missing something?
>
> Kind regards,
> David
>
> -----Original Message-----
> From: Ash Berlin-Taylor <[email protected]>
> Sent: Tuesday, 17 December 2024 18:33
> To: [email protected]
> Subject: Re: [DISCUSS] Make Sqlite3 "low-prod-ready" and get rid of the
> Sequential Executor
>
> What Jens said.
>
> I think sqlite will be a valid path forward for `airflow standalone`, and
> the SequentialExecutor could almost be silently upgraded/replaced with the
> LocalExecutor, but in terms of priorities for the 3.0 release it's certainly
> not one of mine.
>
> So, yes, but I don't have cycles to focus on it :)
>
> -ash
>
> > On 17 Dec 2024, at 15:48, Jens Scheffler <[email protected]>
> wrote:
> >
> > Hi All,
> >
> > I very much favor such a cleanup, mainly getting rid of the sequential
> > executor and some flags.
> >
> > The intent to make it "low production ready" sounds dangerous to me, as
> > this would imply production stability, for which I'd rather recommend
> > Postgres. Maybe it could develop in this direction, but the promise is
> > too big at the moment.
> >
> > But, positively speaking, it really could make "airflow standalone" more
> > of a first-class citizen, enable a much easier single docker/machine
> > development and debug environment, and greatly lower the footprint for
> > (DAG, but not limited to) developers.
> >
> > But seeing the work we have in front of us for 3.0, I'd propose to focus
> > on 3.0 first. If there is spare time, we can make it for 3.0, but since it
> > needs no breaking changes, I think we can also make it for 3.1 (if we
> > deprecate SequentialExecutor early enough that after 3.0 we are "OK" to
> > remove it).
> >
> > Jens
> >
> > P.S.: At the moment there are a couple of feature flags, but for me
> > SequentialExecutor == LocalExecutor(parallelism=1)
> >
> > On 17.12.24 13:29, Jarek Potiuk wrote:
> >> Hello here,
> >>
> >> TL;DR: Recently Ash created and merged this PR
> >> https://github.com/apache/airflow/pull/44839
> >> "Remove 'single process' restrictions on SQLite in favour of using WAL
> >> mode", and I think it opens up an interesting possibility: making SQLite
> >> a "low production ready" database.
> >>
> >> With this change, some of the limitations of SQLite integration for
> >> Airflow have been removed (multi-process access). With Airflow 3 moving
> >> DB access out of tasks, all DB access will be concentrated in the
> >> "central" components (webserver, scheduler, triggerer, DAG processor,
> >> task API), and with WAL it seems that all of those **could** access the
> >> SQLite database locally if they run on a single machine, while with
> >> things like the "edge executor" the tasks could run elsewhere (or also on
> >> the same machine, with the LocalExecutor).
> >>
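A small sketch of what WAL mode buys here, using only Python's `sqlite3` module: a reader holding an open transaction does not block a writer on another connection, and the reader keeps seeing its snapshot until it finishes. The `wal_demo` helper, the file path and the `demo` table are illustrative assumptions, not Airflow's schema or its actual connection setup.

```python
import os
import sqlite3
import tempfile


def wal_demo(db_path: str) -> tuple[str, int, int]:
    """Show that, in WAL mode, a writer can commit while a reader holds a snapshot.

    Returns (journal mode, rows the reader sees mid-transaction, rows after).
    """
    # isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT ourselves.
    writer = sqlite3.connect(db_path, isolation_level=None)
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER)")
    writer.execute("INSERT INTO demo VALUES (1)")

    # A second connection (think: another Airflow component) opens a read
    # transaction; its first SELECT fixes the snapshot it will see.
    reader = sqlite3.connect(db_path, isolation_level=None)
    reader.execute("BEGIN")
    reader.execute("SELECT COUNT(*) FROM demo").fetchone()

    # The writer commits without waiting for the reader to finish.
    writer.execute("INSERT INTO demo VALUES (2)")

    seen_during = reader.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    reader.execute("COMMIT")
    seen_after = reader.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

    reader.close()
    writer.close()
    return mode, seen_during, seen_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_demo(os.path.join(tmp, "airflow.db")))  # ('wal', 1, 2)
```

In the default rollback-journal mode the writer's commit would instead have to wait for the reader's shared lock to be released, and with the module's default 5-second timeout it would fail with `database is locked`.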
> >> One thing that it enables: we could simply remove SequentialExecutor.
> >> IMHO the only reason it continued to exist was SQLite (and even there,
> >> for quite some time SQLite could work with LocalExecutor with
> >> parallelism = 1). There is also possibly a "debuggability" argument, but
> >> with `airflow dags test` I think SequentialExecutor no longer has an
> >> advantage there. And we could make LocalExecutor with n = number of
> >> available processors (maybe minus 2 or 3) the default Airflow setting,
> >> which would improve the first-time experience of people who find Airflow
> >> "slow" (with SequentialExecutor it is). We could also get rid of the
> >> pesky "Do not use sequential executor in production" warning and
> >> simplify the Executor interface (currently the executor has a special
> >> `is_production` flag/mode).
> >>
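One way the proposed default could be computed, as a sketch only: `suggested_parallelism` and `local_executor_env` are hypothetical helpers, and the reserve of 2 CPUs for the other Airflow components follows the "minus 2 or 3" suggestion above. The overrides use Airflow's `AIRFLOW__{SECTION}__{KEY}` environment-variable convention for the `[core] executor` and `[core] parallelism` options.

```python
import os


def suggested_parallelism(reserve: int = 2) -> int:
    """Hypothetical default: number of CPUs minus `reserve`, but at least 1.

    The reserved CPUs are meant for the scheduler, webserver/API, triggerer
    and DAG processor running on the same machine.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cpus - reserve)


def local_executor_env(reserve: int = 2) -> dict[str, str]:
    """Environment overrides selecting LocalExecutor with the computed parallelism."""
    return {
        "AIRFLOW__CORE__EXECUTOR": "LocalExecutor",
        "AIRFLOW__CORE__PARALLELISM": str(suggested_parallelism(reserve)),
    }


if __name__ == "__main__":
    for key, value in local_executor_env().items():
        print(f"export {key}={value}")
```

Clamping at 1 matters: on a 2-CPU laptop the subtraction would otherwise give 0, which is not a sensible worker count.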
> >> But there is more.
> >>
> >> If we add "airflow standalone" to it, plus some way (even just
> >> instructions or guidelines) for users to back up, possibly compact, and
> >> maintain the SQLite database, I don't think we are far from announcing
> >> the SQLite DB as "low production ready". SQLite is a "real" database; it
> >> has been used in production in many, many products for years, and I
> >> would say we have far fewer problems with SQLite than with MySQL (in our
> >> CI, for example). And if we combine it with "airflow standalone", I
> >> think we **could** say: "If you want to run Airflow on one machine,
> >> without big expectations about scalability, Airflow 3 + SQLite is a
> >> **GOOD** production choice."
> >>
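A rough sketch of what "back up, possibly compact and maintain" might boil down to with Python's `sqlite3` module: an online copy via `Connection.backup`, a WAL checkpoint, `VACUUM`, and an integrity check. `maintain` is a hypothetical helper, not an Airflow command; since `VACUUM` rewrites the whole file, it is best run while Airflow is stopped or idle.

```python
import os
import sqlite3
import tempfile


def maintain(db_path: str, backup_path: str) -> str:
    """Hypothetical maintenance routine for a single-file SQLite metadata DB.

    Returns the result of PRAGMA integrity_check ("ok" when healthy).
    """
    # Autocommit mode, so VACUUM is never run inside an implicit transaction.
    src = sqlite3.connect(db_path, isolation_level=None)
    try:
        # 1. Online backup: copies the database page by page to a new file.
        dst = sqlite3.connect(backup_path)
        try:
            src.backup(dst)
        finally:
            dst.close()
        # 2. Copy WAL contents into the main file and truncate the -wal file.
        src.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        # 3. Compact: rebuild the file to reclaim free pages.
        src.execute("VACUUM")
        # 4. Sanity check.
        return src.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        src.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "airflow.db")
        con = sqlite3.connect(db, isolation_level=None)
        con.execute("PRAGMA journal_mode=WAL")
        con.execute("CREATE TABLE demo (x INTEGER)")
        con.executemany("INSERT INTO demo VALUES (?)", [(i,) for i in range(100)])
        con.close()
        print(maintain(db, os.path.join(tmp, "airflow-backup.db")))
```

The backup step comes first on purpose, so a usable copy exists before the file is rewritten by `VACUUM`.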
> >> We would likely have to test it a bit more and write some documentation
> >> around it, but I think that could alleviate a lot of concerns and
> >> address a bit of the "drawback" people see in Airflow: that it is
> >> "difficult to start with". Currently when you try Airflow you get all
> >> the warnings "don't use this setup - it's only suitable to play with
> >> airflow", but I think we are not too far from being able to say this:
> >>
> >>
> >> 1) run pip install "apache-airflow[google,amazon,cohere]==3.0.0"
> >> 2) run "airflow standalone" in whatever way you think is best to manage
> >> restarts
> >> 3) -> that's it. You have a very low-scale, production-ready Airflow up
> >> and running
> >>
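As one illustration of "manage restarts", a bare-bones supervisor loop. `supervise` is a hypothetical helper; in practice systemd, supervisord, or a container restart policy would more usually play this role.

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 5, backoff_s: float = 5.0) -> list[int]:
    """Run `cmd` in the foreground and restart it whenever it exits non-zero.

    Stops after a clean exit (code 0) or once `max_restarts` restarts have been
    used, and returns every exit code observed.
    """
    exit_codes: list[int] = []
    while True:
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        print(f"{cmd[0]} exited with {code}; restarting in {backoff_s}s", file=sys.stderr)
        time.sleep(backoff_s)
```

Usage would be something like `supervise(["airflow", "standalone"])`; the fixed back-off keeps a crash loop from hammering the SQLite file.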
> >> Especially if we document and figure out some of the limitations (when
> >> people should consider switching to more "production-grade" setups with
> >> MySQL or Postgres) and maybe give them tools to do so, that could also
> >> be a very nice come-back to the original success story of Airflow, where
> >> data engineers were installing Airflow on their own to make their lives
> >> easier, and after some time their companies had to adopt it and install
> >> Airflow or migrate to a managed version at scale, kind of driving
> >> Airflow adoption from the "bottom".
> >>
> >> I think the investment to make "standalone airflow with sqlite3"
> >> low-production-ready is relatively small, but being able to openly say -
> >> "it's actually SUPER EASY to run airflow for small setup" - is a very
> >> powerful selling point of Airflow 3 potentially.
> >>
> >> But, of course, maybe there are some limitations of SQLite that I am
> >> not aware of. Ash mentioned in his PR: "Will this be without problems?
> >> No, not entirely," and yeah, it likely has some limitations and
> >> constraints, but maybe they are not that big, and maybe we **could**
> >> commit as a community to supporting SQLite3 as "good" to use for really
> >> small installations.
> >>
> >> WDYT?
> >>
> >> J.
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
