Hello here,

TL;DR: Ash recently created and merged this PR
https://github.com/apache/airflow/pull/44839
"Remove 'single process' restrictions on SQLite in favour of using WAL
mode", and I think it opens up an interesting possibility: making SQLite a
"low production ready" database.

With this change, some of the limitations of Airflow's SQLite integration
(multi-process access) have been removed. With Airflow 3 moving DB access
out of tasks, all DB access will be concentrated in the "central" places
(webserver, scheduler, triggerer, DAG processor, Task API), and with WAL it
seems that all of those **could** access the SQLite database locally if
they run on a single machine, while with things like the "Edge Executor"
the tasks could run elsewhere (or on the same machine, with LocalExecutor).
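To illustrate what WAL buys in that setup, here is a minimal sketch using only Python's standard-library `sqlite3` module. It is not Airflow code and not necessarily how the PR wires WAL up; `connect_wal`, `wal_snapshot_demo` and the `task` table are illustrative stand-ins. It shows two connections to one file where an open read transaction does not block a writer, and the reader keeps a consistent snapshot until its transaction ends.

```python
import os
import sqlite3
import tempfile


def connect_wal(path: str) -> sqlite3.Connection:
    """Open an autocommit connection and make sure the file uses WAL."""
    # timeout=0: fail at once with "database is locked" instead of waiting,
    # so any blocking between the two connections would surface as an error.
    # isolation_level=None: no implicit BEGIN; transactions are explicit.
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    # WAL mode is persistent: once set, it is recorded in the database file.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL mode, got {mode!r}")
    return conn


def wal_snapshot_demo(path: str) -> tuple:
    writer = connect_wal(path)
    writer.execute("CREATE TABLE IF NOT EXISTS task (id INTEGER PRIMARY KEY)")
    writer.execute("INSERT INTO task DEFAULT VALUES")  # committed immediately

    reader = connect_wal(path)
    reader.execute("BEGIN")  # explicit read transaction, kept open
    before = reader.execute("SELECT count(*) FROM task").fetchone()[0]

    # In the default rollback-journal mode this commit would have to wait for
    # the reader's shared lock (and fail at once with timeout=0). In WAL mode
    # readers do not block the writer.
    writer.execute("INSERT INTO task DEFAULT VALUES")

    # Inside its transaction the reader still sees its original snapshot.
    during = reader.execute("SELECT count(*) FROM task").fetchone()[0]
    reader.execute("COMMIT")
    # A new read transaction sees the writer's committed row.
    after = reader.execute("SELECT count(*) FROM task").fetchone()[0]

    reader.close()
    writer.close()
    return before, during, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_snapshot_demo(os.path.join(tmp, "airflow.db")))  # (1, 1, 2)
```

Note that WAL relies on shared memory between processes, so all of them must run on the same host and the file must not live on a network filesystem, which matches the single-machine setup described above.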

One thing it enables: we could simply remove SequentialExecutor. IMHO the
only reason it continued to exist was SQLite (and even there, for quite
some time SQLite could work with LocalExecutor with parallelism = 1). There
is possibly also a "debuggability" argument, but with `airflow dags test` I
think SequentialExecutor no longer has an advantage there. We could also
make LocalExecutor, with parallelism set to the number of available CPUs
(maybe minus 2 or 3), the default Airflow setting, which would improve the
first-time experience of people who find Airflow "slow" (with
SequentialExecutor it is). And we could get rid of the pesky "Do not use
sequential executor in production" warning and simplify the executor
interface (currently executors have a special `is_production` flag/mode).
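Computing such a default is simple. A minimal sketch, assuming the value would feed the existing `[core] parallelism` option that LocalExecutor reads; `default_parallelism` and its `reserve` parameter are hypothetical names, not Airflow code:

```python
import os
from typing import Optional


def default_parallelism(reserve: int = 2, cpu_count: Optional[int] = None) -> int:
    """Hypothetical LocalExecutor default: all CPUs minus a few kept free.

    The reserved CPUs leave room for the scheduler, webserver, triggerer and
    DAG processor that share the machine in an "airflow standalone" setup.
    """
    # os.cpu_count() can return None, so fall back to a single CPU.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Never go below one worker process, even on very small machines.
    return max(1, cpus - reserve)


if __name__ == "__main__":
    print(default_parallelism(cpu_count=8))  # 6
    print(default_parallelism(reserve=3, cpu_count=2))  # 1
```

On a small machine this falls back to parallelism = 1, which is effectively what SequentialExecutor offers today, without needing a separate executor class.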

But there is more.

If we add "airflow standalone" to it, plus some ways (even just instructions
or guidelines) for users to back up, possibly compact, and maintain the
SQLite database, I don't think we are far from announcing SQLite as "low
production ready". SQLite is a "real" database that has been used in
production in many, many products for years, and I would say we have far
fewer problems with SQLite than with MySQL (in our CI, for example). With
that combination in place, I think we **could** say: "If you want to run
Airflow on one machine, without big expectations about scalability,
Airflow 3 + SQLite is a **GOOD** production choice."
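The backup and maintenance guidelines could start from something like this sketch, which uses only Python's `sqlite3` module: `Connection.backup()` for an online copy, plus `PRAGMA integrity_check`, `PRAGMA wal_checkpoint(TRUNCATE)` and `VACUUM`. The function names are hypothetical and nothing here is an existing Airflow command.

```python
import os
import sqlite3
import tempfile


def backup_database(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path using SQLite's backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # pages=-1 (the default) copies the whole database in a single step,
        # so the copy is a consistent snapshot of the source.
        src.backup(dest)
    finally:
        dest.close()
        src.close()


def maintain_database(path: str) -> str:
    """Check integrity, fold the WAL into the main file and reclaim space."""
    # Autocommit mode: VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        if rows[0][0] != "ok":
            raise RuntimeError(f"integrity check failed: {rows}")
        # Write WAL content back into the database file, truncate the -wal file.
        # fetchall() finishes the statement; VACUUM fails if one is pending.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        # Rebuild the database file, dropping pages freed by deleted rows.
        conn.execute("VACUUM")
        return rows[0][0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "airflow.db")
        conn = sqlite3.connect(db)
        conn.execute("CREATE TABLE log (msg TEXT)")
        conn.executemany("INSERT INTO log VALUES (?)", [("a",), ("b",)])
        conn.commit()
        conn.close()
        backup_database(db, os.path.join(tmp, "airflow-backup.db"))
        print(maintain_database(db))  # ok
```

`VACUUM` rewrites the whole file and can need up to about twice the database size in free disk space, so it is best run in a quiet period, for example while Airflow is stopped.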

We would likely have to test it a bit more and write some documentation
around it, but I think that could alleviate a lot of concerns and address
one "drawback" people see in Airflow: that it is "difficult to start
with". Currently, when you try Airflow, you get all the warnings "don't use
this setup - it's only suitable to play with airflow", but I think we are
not too far from being able to say this:


1) run pip install "apache-airflow[google,amazon,cohere]==3.0.0"
2) run "airflow standalone" in whatever way you think is best to manage
restarts
3) -> that's it. You have a very low-scale, production-ready Airflow up
and running
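For step 2, a systemd unit or a container restart policy is probably the more usual choice. Purely to illustrate the idea, here is a minimal Python supervisor sketch; `run_with_restarts`, `max_restarts` and `backoff_s` are hypothetical names, and it simply re-runs the command whenever it exits with a non-zero code.

```python
import subprocess
import sys
import time
from typing import Sequence


def run_with_restarts(cmd: Sequence[str], max_restarts: int = 5,
                      backoff_s: float = 5.0) -> int:
    """Run cmd and restart it after non-zero exits, up to max_restarts times.

    Returns the exit code of the last run: 0 on a clean exit, otherwise the
    failing code once the restart budget is used up.
    """
    restarts = 0
    while True:
        returncode = subprocess.run(list(cmd)).returncode
        if returncode == 0 or restarts >= max_restarts:
            return returncode
        restarts += 1
        # Linear backoff so a crash loop does not spin the CPU.
        time.sleep(backoff_s * restarts)


if __name__ == "__main__":
    sys.exit(run_with_restarts(["airflow", "standalone"]))
```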

Especially if we figure out and document the limitations (when people
should consider switching to a more "high production" setup with MySQL or
Postgres) and maybe give them tools to do so, that could also be a very
nice comeback to Airflow's original success story, where data engineers
installed Airflow on their own to make their lives easier, and after some
time their companies had to adopt it and install Airflow at scale or
migrate to a managed version, kind of driving Airflow adoption from the
"bottom".

I think the investment needed to make "standalone airflow with sqlite3"
low-production-ready is relatively small, while being able to openly say
"it's actually SUPER EASY to run Airflow for a small setup" could be a very
powerful selling point for Airflow 3.

But, of course, there may be some limitations of SQLite that I am not
aware of. Ash mentioned in his PR: "Will this be without problems? No, not
entirely," and yeah, it likely has some limitations and constraints, but
maybe they are not that big, and maybe we **could** commit as a community
to supporting SQLite as "good" to use for really small installations.

WDYT?

J.
