What Jens said.

I think sqlite will be a valid path forward for `airflow standalone` and the 
SequentialExecutor could almost be silently upgraded/replaced with the 
LocalExecutor, but in terms of priorities for the 3.0 release it’s certainly not 
one of mine.

So, yes, but I don’t have cycles to focus on it :)

-ash

> On 17 Dec 2024, at 15:48, Jens Scheffler <j_scheff...@gmx.de.INVALID> wrote:
> 
> Hi all,
> 
> I very much favor such a cleanup, mainly getting rid of the sequential
> executor and some flags.
> 
> The intent to make it "low production ready" seems dangerous to me, as
> this would imply production stability, for which I'd rather recommend
> going with Postgres. Maybe it could develop in this direction, but the
> promise is too big atm.
> 
> But on the positive side, it really could make "airflow standalone"
> more of a first-class citizen, would make it much easier to set up a
> single Docker container/machine development and debug environment, and
> would greatly lower the footprint for developers (DAG developers, but not
> limited to them).
> 
> But seeing the stuff we have in front of us for 3.0, I'd propose to
> focus on the existing 3.0 work first. If there is spare time, we can make
> it for 3.0, but since it has no breaking changes, I think we could also
> make it for 3.1 (maybe if we deprecate SequentialExecutor early, so that
> after 3.0 we are "OK" to remove it).
> 
> Jens
> 
> P.S.: At the moment there are a couple of feature flags, but for me,
> SequentialExecutor == LocalExecutor(parallelism=1)
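Jens's equivalence can be expressed with Airflow's regular configuration environment variables (`AIRFLOW__CORE__EXECUTOR`, `AIRFLOW__CORE__PARALLELISM`). A minimal sketch, assuming that `[core] parallelism` is what bounds the LocalExecutor; `local_executor_env` is a hypothetical helper, not part of Airflow:

```python
import os
from typing import Dict, Mapping, Optional


def local_executor_env(
    parallelism: int, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Hypothetical helper: environment for running Airflow with the LocalExecutor.

    parallelism=1 means at most one task runs at a time, which is roughly
    what the SequentialExecutor did.
    """
    if parallelism < 1:
        # This sketch only accepts an explicit, bounded number of task slots
        raise ValueError("parallelism must be >= 1")
    # Start from the given mapping (or the current environment) and copy it
    env = dict(base if base is not None else os.environ)
    # [core] executor / [core] parallelism, set via Airflow's env-var convention
    env["AIRFLOW__CORE__EXECUTOR"] = "LocalExecutor"
    env["AIRFLOW__CORE__PARALLELISM"] = str(parallelism)
    return env
```

The resulting dict could be passed as `env=` to `subprocess.run(["airflow", "standalone"], ...)`, so the same setup serves both the "sequential" case (`parallelism=1`) and a larger local one.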
> 
> On 17.12.24 13:29, Jarek Potiuk wrote:
>> Hello here,
>> 
>> TL;DR: Recently Ash created and merged this PR
>> https://github.com/apache/airflow/pull/44839
>> "Remove 'single process' restrictions on SQLite in favour of using WAL
>> mode" and I think it opens up an interesting possibility - to make SQLite a
>> "low production ready" database.
>> 
>> With this change, some of the limitations of SQLite integration for Airflow
>> have been removed (multi-process access). With Airflow 3 moving DB access
>> out of Tasks, we are getting into a situation where all the DB access will
>> be concentrated in the "central" places - webserver, scheduler, triggerer,
>> dag processor, task api - and with WAL, it seems that all of those **could**
>> access the SQLite database locally if they run on a single machine, while
>> with things like the "edge executor" the tasks could run elsewhere (or also
>> on the same machine, with the LocalExecutor).
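To illustrate what WAL mode buys when several components share one SQLite file, here is a minimal sketch using only Python's stdlib `sqlite3`: two connections stand in for two Airflow processes, and the reader still gets the last committed snapshot while the writer holds an open write transaction. The table name is made up and is not Airflow's schema; this makes no claim about how the PR above configures SQLite inside Airflow.

```python
import os
import sqlite3
import tempfile


def demo_wal_concurrency(path: str):
    """Return (journal_mode, rows seen by a reader during an uncommitted write)."""
    # isolation_level=None: autocommit mode, so transactions are explicit below
    writer = sqlite3.connect(path, isolation_level=None)
    # Switch the file to WAL; the setting is persistent in the database file
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute(
        "CREATE TABLE IF NOT EXISTS demo_task (id INTEGER PRIMARY KEY, state TEXT)"
    )
    writer.execute("INSERT INTO demo_task (state) VALUES ('queued')")

    # The writer starts a write transaction and leaves it open (uncommitted)
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("UPDATE demo_task SET state = 'running' WHERE id = 1")

    # A second connection (think: another process) is not blocked; it reads
    # the last committed snapshot, not the in-flight update
    reader = sqlite3.connect(path, isolation_level=None, timeout=1.0)
    rows = reader.execute("SELECT id, state FROM demo_task").fetchall()

    writer.execute("COMMIT")
    reader.close()
    writer.close()
    return mode, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_wal_concurrency(os.path.join(d, "airflow.db")))
        # ('wal', [(1, 'queued')])
```

WAL removes reader/writer blocking but does not allow concurrent writers: a second connection calling `BEGIN IMMEDIATE` while the first write transaction is open waits up to its `timeout` and then raises `sqlite3.OperationalError` ("database is locked").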
>> 
>> One thing that it enables - we could simply remove SequentialExecutor. IMHO
>> the only reason it continued to exist was SQLite (and even there, for quite
>> some time SQLite could work with LocalExecutor with parallelism = 1). There
>> is also possibly a "debuggability" aspect, but with `airflow dags test` I
>> think SequentialExecutor no longer has an advantage there. And we could make
>> LocalExecutor with n = number of available processors (maybe minus 2 or 3)
>> the default Airflow setting, which would improve the "first-time" experience
>> of people who find Airflow "slow" (with the sequential executor it is). And
>> we could get rid of the pesky "Do not use sequential executor in production"
>> warning and simplify the Executor interface (currently an executor has a
>> special `is_production` flag/mode).
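The "n = available processors minus 2 or 3" default floated above could be computed along these lines. A minimal sketch: the function name and the idea of keeping the reserved cores for the other Airflow components on the same machine are assumptions, not existing Airflow code.

```python
import os
from typing import Optional


def suggested_parallelism(reserved_cores: int = 2, cpu_count: Optional[int] = None) -> int:
    """Hypothetical default for LocalExecutor parallelism.

    Uses the number of CPUs minus a few cores left free for the other Airflow
    components (scheduler, webserver, triggerer, dag processor), never below 1.
    """
    # os.cpu_count() can return None, so fall back to a single CPU
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus - reserved_cores)


if __name__ == "__main__":
    print(suggested_parallelism())
```

Clamping to at least 1 means a small machine degrades to one task at a time (SequentialExecutor-like behaviour) instead of ending up with no task slots at all.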
>> 
>> But there is more.
>> 
>> If we add to it "airflow standalone" and some ways (even just instructions
>> or guidelines) for users on how to back up, possibly compact, and maintain
>> the SQLite database, I don't think we are far away from announcing SQLite
>> as "low production ready". SQLite is a "real" database: for many years it
>> has been used in production in many, many products, and I would say we have
>> far fewer problems with SQLite than we have with MySQL - in our CI, for
>> example. And if we combine it with "airflow standalone", I think we
>> **could** say "If you want to run Airflow on one machine, without big
>> expectations about scalability - Airflow 3 + SQLite is a **GOOD**
>> production choice"
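The back-up and compaction guidance mentioned above could start from something as small as this. A minimal sketch using only Python's stdlib `sqlite3` (`Connection.backup`, `PRAGMA wal_checkpoint`, `VACUUM`); the function name and file paths are hypothetical, and it assumes it runs during a quiet period, since `VACUUM` rewrites the whole file and holds the write lock while it runs.

```python
import os
import sqlite3
import tempfile


def backup_and_compact(db_path: str, backup_path: str) -> None:
    """Hypothetical maintenance helper: online backup, then checkpoint and VACUUM."""
    src = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
    dst = sqlite3.connect(backup_path)
    try:
        # Online backup API: copies a consistent snapshot of the live database
        src.backup(dst)
        # Move WAL content back into the main file and truncate the -wal file
        src.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        # Rebuild the main file to reclaim space left by deleted rows
        src.execute("VACUUM")
    finally:
        dst.close()
        src.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "airflow.db")
        con = sqlite3.connect(db, isolation_level=None)
        con.execute("PRAGMA journal_mode=WAL")
        con.execute("CREATE TABLE demo_log (msg TEXT)")
        con.executemany("INSERT INTO demo_log VALUES (?)", [("x",)] * 50)
        con.close()
        backup_and_compact(db, os.path.join(d, "airflow-backup.db"))
```

`VACUUM INTO 'path'` (SQLite 3.27+) is an alternative that writes an already-compacted copy in a single statement.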
>> 
>> Likely we would have to test it a bit more and write some documentation
>> around it, but I think that could alleviate a lot of concerns and address
>> a "drawback" people see in Airflow: that it is "difficult to start with".
>> Currently, when you try Airflow, you get all the warnings "don't use this
>> setup - it's only suitable to play with airflow", but I think we are not
>> too far from being able to say this:
>> 
>> 
>> 1) run pip install "apache-airflow[google,amazon,cohere]==3.0.0"
>> 2) run "airflow standalone" in whatever way you think is best to manage
>> restarts
>> 3) -> that's it. You have a very low-scale, production-ready Airflow up and
>> running
>> 
>> Especially if we figure out and document some of the limitations (when
>> people should consider switching to "higher production" setups with MySQL
>> or Postgres) and maybe give them tools to do so, that could also be a
>> very nice comeback to the original success story of Airflow, where data
>> engineers were really installing Airflow on their own to make their lives
>> easier, and after some time their companies had to adopt it and install
>> Airflow or migrate to a managed version at scale - kind of driving Airflow
>> adoption from the "bottom".
>> 
>> I think the investment to make "standalone airflow with sqlite3"
>> low-production-ready is relatively small, but being able to openly say
>> "it's actually SUPER EASY to run airflow for a small setup" is potentially
>> a very powerful selling point of Airflow 3.
>> 
>> But - of course - maybe there are some limitations of SQLite that I am not
>> aware of. Ash mentioned in his PR: "Will this be without problems? No, not
>> entirely," - and yeah, likely it has some limitations and constraints, but
>> maybe they are not that big, and maybe we **could** commit as a community to
>> support SQLite3 as "good" to use for really small installations.
>> 
>> WDYT?
>> 
>> J.
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
> 

