potiuk commented on issue #44775: URL: https://github.com/apache/airflow/issues/44775#issuecomment-2563489445
While I cannot answer your question directly, I can try to guide you through the reasoning. You need to describe exactly what you are doing, how you are initializing the Airflow database, and which tests you are talking about. If your tests access the DB, your test setup has to make sure that the database is created. This is what fixtures usually do. The Airflow DB tests in Airflow CI do this with an auto-use fixture that creates and initializes the DB: https://github.com/apache/airflow/blob/main/tests_common/pytest_plugin.py#L317 when it has not been initialized yet (which is generally the first time it runs in a clean environment, say in a new docker container). When it does so, it creates a file ".airflow_db_initialised" in the Airflow HOME DIR, so that it does not attempt the initialization again. This file usually does not survive a container restart, so the initialization happens, for example, every time the breeze container is started. This behaviour can be overridden with the `--with-db-init` flag that our pytest plugin adds: when this flag is passed, database initialization happens at the beginning of the pytest session. But this is how the "airflow" test suite works. We have no idea what test suite you are talking about, how you run it, and what guarantees your containers give you (which files are preserved between runs, for example by being mounted, and which are not). This is all a question of how your CI and test suite are organized. Generally speaking, you have to make sure that your fixtures (if you use pytest) do the right thing and set up the database for you. One difficulty you might have is that this can also depend on the order in which things are imported.
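The "initialize once, then leave a sentinel file" pattern described above can be sketched in plain Python. This is a minimal sketch and not the code from Airflow's pytest plugin: `ensure_db_initialised` and its `init_db`, `home` and `force` parameters are hypothetical names. `force=True` stands in for passing `--with-db-init`, and only the sentinel file name is taken from the comment.

```python
from pathlib import Path
from typing import Callable

# Sentinel file name as described above; everything else here is hypothetical.
SENTINEL_NAME = ".airflow_db_initialised"


def ensure_db_initialised(
    init_db: Callable[[], None], home: Path, force: bool = False
) -> bool:
    """Run init_db() at most once per environment; return True if it ran.

    The sentinel lives in `home`, so a fresh container (no sentinel yet)
    triggers initialization, while later sessions in the same container skip
    it. `force=True` mimics passing --with-db-init and always re-initializes.
    """
    sentinel = home / SENTINEL_NAME
    if sentinel.exists() and not force:
        return False
    init_db()
    # Only mark the environment as initialized after init_db() succeeded.
    sentinel.touch()
    return True
```

In a hypothetical `conftest.py`, this could be called from a session-scoped `autouse=True` fixture, passing `force=request.config.getoption(...)` for an option you registered yourself in `pytest_addoption`. If `home` is a mounted volume that survives between CI runs, the sentinel survives too. In that case initialization is skipped even when the database itself is fresh, which is exactly the kind of guard worth checking in your own setup.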
Unfortunately, importing airflow does a lot of implicit things, including lazy loading of various components. We sort of try to initialize everything when airflow is imported, but we also try to avoid that initialization and use some lazy-loading magic to skip parts of it in some cases, to speed things up. This is a duality we have: we run "import airflow" with pretty much every possible command, but some commands, test cases or direct imports should be much faster than importing and initializing everything airflow needs (configuration, settings, database, plugins, providers and so on). I hope we will do it differently in Airflow 3, with much more explicit initialization of whatever we need, rather than this half-initialization and lazy-loading dance (which also causes occasional recursive import errors when modules import each other before they are fully imported, depending on the import sequence). But that is something we will likely discuss in the coming weeks, when we discuss Airflow 3 packaging and initialization. A consequence of this implicit loading is that we also have to make sure that all the models are imported before the database is reset.
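The lazy loading mentioned above is typically built on the module-level `__getattr__` hook from PEP 562. The real `airflow/models/__init__.py` may differ in its details. In a package you would write `def __getattr__(name)` at the top level of `__init__.py`. To keep this sketch self-contained, the hook is instead attached to a module object created in code. `make_lazy_module` and the name-to-module mapping are hypothetical, and stdlib modules stand in for model modules.

```python
import importlib
import types


def make_lazy_module(name: str, lazy_imports: dict[str, str]) -> types.ModuleType:
    """Build a module whose attributes are imported only on first access.

    `lazy_imports` maps an attribute name to the module that defines it.
    """
    mod = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called by Python only when `attr` is not already in the module dict.
        target = lazy_imports.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        # The real import is deferred until here, so importing the package
        # itself stays cheap and cannot trigger a circular import.
        value = getattr(importlib.import_module(target), attr)
        # Cache the value so later accesses skip __getattr__ entirely.
        setattr(mod, attr, value)
        return value

    mod.__getattr__ = __getattr__
    return mod
```

For example, `make_lazy_module("models", {"sqrt": "math"}).sqrt(9)` imports `math` only at that moment. The flip side, and the point of the comment, is that anything not yet accessed has not been imported. Code that needs *every* model loaded cannot rely on this mechanism alone.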
Most of this happens here: https://github.com/apache/airflow/blob/main/airflow/models/__init__.py
There are basically three ways in which models are imported:
* TYPE_CHECKING: loading models for mypy type-hint verification.
* Lazy loading of some old models when they are accessed via "from airflow.models import Model" rather than imported from their sub-packages (this is done lazily because otherwise it would cause recursive imports and unnecessary loading of the models).
* The `import_all_models` method, which is supposed to make sure all models are loaded, for example when we run the "init db" command. This matters because SQLAlchemy will only create tables whose ORM models have been imported and registered with it.
Generally, you should make sure you run this method before you run "initdb" with sqlite when you create a new database. Maybe your test fixture does not do it.
So in general it is not very explicit, and there are a few places/paths where creating the Airflow database might go wrong. If you have any path where you run db init but do not import/load the models into memory, you might simply create an empty database. Or maybe you have a similar "sentinel file" or some other guard that prevents the database from being initialized when needed, and this file is not removed in your CI? But I am only guessing. My guess is that your tests do not do proper initialization, or maybe delete the created tables at some point, or maybe some combination of these.
BTW, I am converting this into a discussion. This is not really an "issue" in Airflow; it is more troubleshooting of what your test suite does or does not do.
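The "db init without imported models gives an empty database" failure mode can be illustrated with a stdlib-only simulation. This is a sketch under loud assumptions: `REGISTERED_TABLES`, `register_model`, `create_all` and `import_all_models` below are hypothetical stand-ins for SQLAlchemy metadata, `metadata.create_all()` and Airflow's `import_all_models`. They are not Airflow's implementation. The table names mirror real Airflow tables, but the columns are placeholders. `missing_tables` is a diagnostic you could point at your own SQLite test database.

```python
import sqlite3

# Hypothetical stand-in for SQLAlchemy metadata: a table is "known" only
# after the module that defines its ORM model has been imported.
REGISTERED_TABLES: dict[str, str] = {}


def register_model(table: str, columns: str) -> None:
    REGISTERED_TABLES[table] = columns


def import_all_models() -> None:
    # Simulates importing every model module; columns are placeholders.
    register_model("dag", "dag_id TEXT PRIMARY KEY")
    register_model("dag_run", "id INTEGER PRIMARY KEY, dag_id TEXT")
    register_model("task_instance", "dag_id TEXT, task_id TEXT, run_id TEXT")


def create_all(conn: sqlite3.Connection) -> None:
    # Like metadata.create_all(): only tables registered so far are created.
    for table, columns in REGISTERED_TABLES.items():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")


def missing_tables(conn: sqlite3.Connection, expected: set[str]) -> set[str]:
    # Diagnostic: which expected tables are absent from this SQLite database?
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return expected - {name for (name,) in rows}
```

Calling `create_all` before `import_all_models` leaves the database with no tables. Calling it afterwards creates all three. Against a real engine, `sqlalchemy.inspect(engine).get_table_names()` gives a similar view of which tables actually exist after your fixture ran.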
