Maybe a stupid question, but why not make SequentialExecutor extend
LocalExecutor with parallelism set to one, given the similarity you described?

Then you're still backward compatible (for those who would use it anyway), you
get rid of the SequentialExecutor-specific code, and you still have the
possibility to use it. Or am I missing something?
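A minimal sketch of that suggestion in plain Python. `LocalExecutorSketch` and `SequentialExecutorSketch` are hypothetical stand-ins, not Airflow's actual executor classes or constructor signatures; the sketch only shows a subclass keeping the old name usable while pinning parallelism to one, plus an optional deprecation warning in the spirit of the early deprecation Jens proposes further down the thread.

```python
import warnings


class LocalExecutorSketch:
    """Hypothetical stand-in for a local executor (not Airflow's real class)."""

    def __init__(self, parallelism: int = 32):
        # Maximum number of task processes allowed to run at the same time
        # (32 is an arbitrary default for this sketch).
        self.parallelism = parallelism

    def slots_available(self, running: int) -> int:
        # How many more tasks may start, given how many are already running.
        return max(0, self.parallelism - running)


class SequentialExecutorSketch(LocalExecutorSketch):
    """Old name kept for backward compatibility: just the local executor with one slot."""

    def __init__(self):
        # Optional: nudge users towards the replacement before the class is removed.
        warnings.warn(
            "SequentialExecutorSketch is deprecated; use LocalExecutorSketch(parallelism=1)",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(parallelism=1)
```

With this shape, code that still instantiates the old name keeps working, and once a deprecation period ends the subclass can simply be deleted.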

Kind regards,
David

-----Original Message-----
From: Ash Berlin-Taylor <a...@apache.org>
Sent: Tuesday, 17 December 2024 18:33
To: dev@airflow.apache.org
Subject: Re: [DISCUSS] Make Sqlite3 "low-prod-ready" and get rid of the 
Sequential Executor

What Jens said.

I think sqlite will be a valid path forward for `airflow standalone` and the 
SequentialExecutor could almost be silently upgraded/replaced with the 
LocalExecutor, but in terms of priorities for 3.0 release it's certainly not 
one of mine.

So, yes, but I don't have cycles to focus on it :)

-ash

> On 17 Dec 2024, at 15:48, Jens Scheffler <j_scheff...@gmx.de.INVALID> wrote:
>
> Hi All,
>
> I very much favor such a cleanup, mainly getting rid of the sequential
> executor and some flags.
>
> The intent to make it "low production ready" sounds dangerous to me, as
> this would imply production stability, for which I'd rather recommend
> going with Postgres. Maybe it could develop in this direction, but the
> promise is too big at the moment.
>
> But, positively speaking, it really could make "airflow standalone"
> more of a first-class citizen, make a single Docker container/machine
> development and debug environment much easier to set up, and greatly
> lower the footprint for developers (DAG developers, but not only them).
>
> But seeing the stuff we have in front of us for 3.0, I'd propose to
> focus on 3.0 first. If there is spare time, we can make it for 3.0,
> but as long as there are no breaking changes, I think we can also make it
> for 3.1 (if we deprecate SequentialExecutor early, so that after 3.0 we
> are "OK" to remove it).
>
> Jens
>
> P.S.: At the moment there are a couple of feature flags, but actually for
> me SequentialExecutor == LocalExecutor(parallelism=1)
>
> On 17.12.24 13:29, Jarek Potiuk wrote:
>> Hello here,
>>
>> TL;DR; Recently Ash created and merged this PR
>> https://github.com/apache/airflow/pull/44839
>> "Remove 'single process' restrictions on SQLite in favour of using WAL
>> mode" and I think it opens up an interesting possibility - to make SQLite a
>> "low production ready" database.
>>
>> With this change, some of the limitations of SQLite integration for Airflow
>> have been removed (multi-process access). With Airflow 3 moving DB
>> access out of Tasks, we are getting into a situation where all the DB
>> access will be concentrated in the "central" components (webserver, scheduler,
>> triggerer, DAG processor, task API), and with WAL, it seems that all of those
>> **could** access the SQLite database locally if they are run on a single
>> machine, while with things like the "edge executor" the tasks could run
>> elsewhere (or also on the same machine, with the Local Executor).
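To illustrate what WAL mode changes, here is a standard-library sketch (it does not use Airflow's own engine setup, which may configure SQLite differently): with `PRAGMA journal_mode=WAL`, a reader holding an open transaction keeps a consistent snapshot while another connection commits, instead of the writer having to wait for the reader to finish as it would with SQLite's default rollback journal. The two connections in one process stand in for separate Airflow components, and the `task_state` table is made up for the example.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def count_rows(conn: sqlite3.Connection) -> int:
    # fetchall() steps the query to completion, so no statement is left pending.
    return conn.execute("SELECT COUNT(*) FROM task_state").fetchall()[0][0]


def wal_snapshot_demo(db_path: str) -> tuple:
    with closing(sqlite3.connect(db_path)) as writer:
        # WAL is stored in the database file; the pragma returns the resulting mode.
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
        writer.execute("CREATE TABLE task_state (id INTEGER PRIMARY KEY, state TEXT)")
        writer.execute("INSERT INTO task_state (state) VALUES ('queued')")
        writer.commit()

        with closing(sqlite3.connect(db_path)) as reader:
            # An explicit read transaction: its first SELECT pins a snapshot.
            reader.execute("BEGIN")
            before = count_rows(reader)

            # In WAL mode the writer commits without waiting for the reader.
            writer.execute("INSERT INTO task_state (state) VALUES ('running')")
            writer.commit()

            during = count_rows(reader)  # still the snapshot taken above
            reader.commit()              # end the read transaction
            after = count_rows(reader)   # a fresh read sees the new row
    return mode, before, during, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_snapshot_demo(os.path.join(tmp, "airflow_demo.db")))
```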
>>
>> One thing that it enables: we could simply remove SequentialExecutor. IMHO
>> the only reason it continued to exist was the SQLite case (and
>> even there, for quite some time SQLite could work with LocalExecutor with
>> parallelism = 1). There is also possibly a "debuggability" aspect, but
>> with `airflow dags test` I think Sequential Executor no longer has an
>> advantage there. And we could make LocalExecutor with n = number of available
>> processors (maybe minus 2 or 3) the default Airflow setting, which would
>> improve the "first-time" experience of people who see that Airflow
>> is "slow" (with the sequential executor it is). And we could get rid of the
>> pesky "Do not use sequential executor in production" warning and simplify
>> the Executor interface (currently the executor has a special `is_production`
>> flag/mode).
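The "number of available processors minus 2 or 3" idea could be computed roughly as follows. `default_parallelism` is a hypothetical helper, not an existing Airflow option; the reserved count mirrors the 2 or 3 suggested above, presumably to leave room for the scheduler, webserver and other components sharing the machine.

```python
import os
from typing import Optional


def default_parallelism(reserved: int = 2, cpu_count: Optional[int] = None) -> int:
    """Hypothetical default: use all CPUs except a few kept free for other processes."""
    # os.cpu_count() can return None when the count cannot be determined.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Never go below one slot; on a small machine this degrades to sequential execution.
    return max(1, cpus - reserved)
```

On an 8-core machine `default_parallelism(cpu_count=8)` gives 6; on a 2-core machine it falls back to 1, which is effectively what SequentialExecutor provides today.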
>>
>> But there is more.
>>
>> If we add to it "airflow standalone" and some way (even just instructions
>> or guidelines) for users to back up, possibly compact, and maintain the
>> SQLite database, I don't think we are far from announcing SQLite
>> as a "low production ready" DB. SQLite is a "real" database; for many years
>> it has been used in production in many, many products, and I would say we have
>> far fewer problems with SQLite than we have with MySQL (in our CI, for
>> example). And if we combine it with "airflow standalone", I think we
>> **could** say "If you want to run Airflow on one machine, without big
>> expectations about scalability, Airflow 3 + SQLite is a **GOOD**
>> production choice".
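As one possible shape for such maintenance instructions, here is a stdlib-only sketch using Python's `sqlite3` module: an integrity check, a consistent copy via SQLite's online backup API (usable while other connections are open), a WAL checkpoint, and a `VACUUM` to reclaim space. The function name and paths are illustrative and this is not an Airflow command; `VACUUM` rewrites the whole file and needs extra free disk space while it runs, so it is best scheduled when Airflow is idle.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def backup_and_compact(db_path: str, backup_path: str) -> str:
    """Hypothetical maintenance routine for an Airflow SQLite file (illustrative only)."""
    with closing(sqlite3.connect(db_path)) as src:
        # Basic health check; a healthy database reports a single 'ok' row.
        status = src.execute("PRAGMA integrity_check").fetchall()[0][0]
        if status != "ok":
            raise RuntimeError(f"integrity check failed: {status}")
        # Copy a consistent snapshot with SQLite's online backup API.
        with closing(sqlite3.connect(backup_path)) as dst:
            src.backup(dst)
        # Fold the write-ahead log back into the main file and truncate it.
        src.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        # Rebuild the file to reclaim pages freed by deleted rows.
        src.execute("VACUUM")
    return status


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "airflow.db")
        with closing(sqlite3.connect(db)) as conn:
            conn.execute("CREATE TABLE t (x TEXT)")
            conn.commit()
        print(backup_and_compact(db, os.path.join(tmp, "airflow-backup.db")))
```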
>>
>> Likely we would have to test it a bit more and write some documentation
>> around it, but I think that could alleviate a lot of concerns and address a
>> bit of a "drawback" people see in Airflow: that it is "difficult to
>> start with". Currently when you try Airflow, you get all the warnings
>> "don't use this setup, it's only suitable to play with Airflow", but I
>> think we are not too far from being able to say this:
>>
>>
>> 1) run pip install apache-airflow[google,amazon,cohere]==3.0.0
>> 2) run "airflow standalone" in whatever way you think is best to manage
>> restarts
>> 3) -> that's it: you have a very low-scale, production-ready Airflow up and
>> running
>>
>> Especially if we figure out and document the limitations and when
>> people should consider switching to "higher production" setups with
>> MySQL or Postgres, and maybe give them tools to do so, that could also be a
>> very nice comeback to the original success story of Airflow, where data
>> engineers were really installing Airflow on their own to make their lives
>> easier, and after some time their companies had to adopt it and install
>> Airflow or migrate to a managed version at scale, kind of driving Airflow
>> adoption from the "bottom".
>>
>> I think the investment to make "standalone airflow with sqlite3"
>> low-production-ready is relatively small, while being able to openly say
>> "it's actually SUPER EASY to run Airflow for a small setup" is a very
>> powerful selling point of Airflow 3, potentially.
>>
>> But, of course, maybe there are some limitations of SQLite that I am not
>> aware of. Ash mentioned in his PR: "Will this be without problems? No, not
>> entirely," and yeah, it likely has some limitations and constraints, but
>> maybe they are not that big, and maybe we **could** commit as a community to
>> supporting SQLite3 as "good" to use for really small installations.
>>
>> WDYT?
>>
>> J.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>

