potiuk commented on issue #44775:
URL: https://github.com/apache/airflow/issues/44775#issuecomment-2563489445

   While I cannot answer your question directly, I can try to guide you through 
the reasoning.
   
   I think you need to describe exactly what you are doing, how you are 
initializing the Airflow database, and which tests you are talking about. If 
your tests access the DB, your test setup has to make sure that the database is 
created. This is usually what fixtures do.
   
   The Airflow DB tests in Airflow CI do this with an autouse fixture that 
creates and initializes the DB: 
https://github.com/apache/airflow/blob/main/tests_common/pytest_plugin.py#L317  
when it has not been initialized yet (which is generally the first time it runs 
in a clean environment, say in a new Docker container). When it does so, it 
creates a file ".airflow_db_initialised" in the Airflow home directory, so it 
does not attempt the initialization again. This file usually does not survive a 
container restart, so the initialization happens every time the Breeze container 
is started, for example. This behaviour can be overridden with the 
`--with-db-init` flag that our pytest plugin adds: when this flag is passed, 
database initialization happens at the beginning of the pytest session.
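   A minimal sketch of that sentinel-file guard, in case you want to reproduce 
the pattern in your own fixtures. The helper name `ensure_db_initialised` and 
the injected `init_db` callable are hypothetical; this is not the code in 
`pytest_plugin.py`, just the logic described above, with `force=True` standing 
in for `--with-db-init`.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Marker file name described above, created in the Airflow home directory.
SENTINEL_NAME = ".airflow_db_initialised"


def ensure_db_initialised(
    airflow_home: Path, init_db: Callable[[], None], force: bool = False
) -> bool:
    """Run init_db() at most once per environment, guarded by a sentinel file.

    Hypothetical helper. Returns True if initialization ran, False if skipped.
    force=True plays the role of the --with-db-init flag.
    """
    sentinel = airflow_home / SENTINEL_NAME
    if sentinel.exists() and not force:
        # Already initialized in this environment (e.g. same container): skip.
        return False
    init_db()
    # Only write the marker after init_db() returned without raising.
    sentinel.touch()
    return True


calls = []
with tempfile.TemporaryDirectory() as home:
    home_path = Path(home)
    first = ensure_db_initialised(home_path, lambda: calls.append("init"))
    second = ensure_db_initialised(home_path, lambda: calls.append("init"))
    forced = ensure_db_initialised(home_path, lambda: calls.append("init"), force=True)

print(first, second, forced, calls)  # True False True ['init', 'init']
```

   If the marker file survives between runs (for example because the home 
directory is mounted) while the database itself does not, a guard like this will 
happily skip initialization and leave you with a missing or empty database.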
   
   But this is how the "airflow" test suite works. We have no idea which test 
suite you are talking about, how you run it, and what assumptions your 
containers make (which files are preserved between runs, for example by being 
mounted, and which are not). This is all a question of how your CI and test 
suite are organized.
   
   Generally speaking, you have to make sure that your fixtures (if you use 
pytest) do the right thing and set up the database for you. One of the 
difficulties is that this might also depend on the import sequence. 
Unfortunately, importing airflow does a lot of implicit things, including lazy 
loading of various components. We sort of try to initialize everything when 
airflow is imported, but we also try to avoid that initialization, using some 
lazy-loading magic to skip completing it in some cases and speed things up. This 
is a bit of a duality: we "import airflow" with pretty much every possible 
command, but some commands, test cases, or direct imports should be much faster 
than fully importing and initializing everything airflow needs (configuration, 
settings, database, plugins, providers and so on).
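   The lazy-loading half of that duality is commonly implemented with a 
module-level `__getattr__` (PEP 562), which runs only when a name is not already 
defined in the module. A self-contained sketch of the mechanism follows; 
`make_lazy_module` and the stdlib names standing in for heavy modules are 
hypothetical, and in a real package you would simply define 
`def __getattr__(name)` at the top level of `__init__.py`.

```python
import importlib
import types


def make_lazy_module(name: str, lazy_imports: dict[str, str]) -> types.ModuleType:
    """Build a module whose listed attributes are imported only on first access.

    Hypothetical helper: lazy_imports maps attribute name -> module that defines it.
    """
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # PEP 562: called only when normal attribute lookup on the module fails.
        path = lazy_imports.get(attr)
        if path is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(path), attr)
        # Cache on the module so later lookups never reach __getattr__ again.
        setattr(module, attr, value)
        return value

    # Module-level __getattr__ is looked up in the module's __dict__.
    module.__getattr__ = __getattr__
    return module


# Stdlib names stand in for model classes here (hypothetical mapping).
models = make_lazy_module("models", {"OrderedDict": "collections", "Fraction": "fractions"})
print("Fraction" in vars(models))  # False: nothing resolved yet
print(models.Fraction(1, 3))       # first access triggers the import
print("Fraction" in vars(models))  # True: cached after first access
```

   Note that whatever the lazily imported module does at import time now 
happens at first attribute access rather than at `import` time, which is one 
reason the order in which tests touch things can change what gets initialized.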
   
   I hope we will do it differently in Airflow 3 and do much more explicit 
initialization of whatever we need, rather than this half-initialization and 
lazy-loading dance (which also causes occasional recursive import errors when 
modules import each other before they are fully imported, depending on the 
sequence of imports). But that's something we will likely discuss in the coming 
weeks, when we discuss Airflow 3 packaging and initialization.
   
   The consequence of that implicit loading is that we also attempt to make 
sure that all the models are imported before the database is reset. Most of 
this happens here:
   
   https://github.com/apache/airflow/blob/main/airflow/models/__init__.py
   
   There are basically three ways models are imported: 
   
   * TYPE_CHECKING - loading models for mypy type-hint verification
   * lazy-loading some old models when they are accessed using "from 
airflow.models import Model" rather than importing them from their submodules 
(this is done lazily because otherwise it would cause recursive imports and 
unnecessary loading of the models)
   * the `import_all_models` method - which is supposed to make sure all models 
are loaded, for example when we run the "db init" command, because SQLAlchemy 
only creates the tables whose corresponding ORM models have been imported and 
registered in its metadata. Generally you should make sure you run this method 
before you run "initdb" with SQLite when you create a new database. Maybe your 
test fixture does not do it.
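   To see why the order matters, here is a stdlib-only analogue (no SQLAlchemy, 
no Airflow). A hypothetical `Registry` plays the role of SQLAlchemy's 
`MetaData`, and merely defining a model class registers its table, much as 
declarative models register themselves with `Base.metadata`. The 
`import_all_models` below is a toy, not Airflow's function.

```python
import sqlite3


class Registry:
    """Stdlib stand-in for SQLAlchemy's MetaData: it only knows the tables
    whose model classes have actually been defined (i.e. imported)."""

    def __init__(self) -> None:
        self.tables: dict[str, str] = {}

    def create_all(self, conn: sqlite3.Connection) -> None:
        # Like metadata.create_all(): emit DDL only for registered tables.
        for ddl in self.tables.values():
            conn.execute(ddl)


registry = Registry()


class Base:
    __tablename__: str
    __ddl__: str

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Defining a subclass registers its table, similar to declarative models.
        registry.tables[cls.__tablename__] = cls.__ddl__


def import_all_models() -> None:
    """Toy analogue: defining every model class registers its table.
    In a real project this step would import the model modules."""

    class DagModel(Base):
        __tablename__ = "dag"
        __ddl__ = "CREATE TABLE IF NOT EXISTS dag (dag_id TEXT PRIMARY KEY)"

    class TaskInstance(Base):
        __tablename__ = "task_instance"
        __ddl__ = "CREATE TABLE IF NOT EXISTS task_instance (id INTEGER PRIMARY KEY)"


def table_names(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
registry.create_all(conn)  # models not imported yet
print(table_names(conn))   # [] -> an "initialized" but empty database
import_all_models()
registry.create_all(conn)  # now every model is registered
print(table_names(conn))   # ['dag', 'task_instance']
```

   With real SQLAlchemy, `Base.metadata.create_all(engine)` behaves the same 
way: on metadata that has no models registered it runs without error and simply 
creates no tables, so nothing fails until a test queries a missing table.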
   
   So in general it's not very explicit, and there are a few places/paths where 
creating the Airflow database might go wrong. If you have any path where you run 
db init but do not import/load the models into memory, you might simply create 
an empty database. Or maybe you have a similar "sentinel file" or some other 
guard that prevents the database from being initialized when needed, and that 
file is not removed in your CI? But I am only guessing.
   
   My guess is that your tests do not do proper initialization, or at some point 
delete the tables that were created, or some combination of these.
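   Rather than guessing, a cheap check is to assert, right after your fixture's 
initialization step and again just before a failing test, that the tables you 
rely on actually exist. A sketch for a SQLite-backed test database follows; the 
`missing_tables` helper is hypothetical, and the table names are only 
illustrative examples.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import Iterable


def missing_tables(db_path: str, expected: Iterable[str]) -> list[str]:
    """Hypothetical helper: expected table names absent from a SQLite DB file."""
    if not os.path.exists(db_path):
        # sqlite3.connect() would silently create an empty file, hiding the problem.
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        present = {row[0] for row in rows}
    return sorted(set(expected) - present)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "airflow.db")
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
        conn.commit()
    result = missing_tables(path, ["dag", "task_instance"])

print(result)  # ['task_instance']
```

   If the list is non-empty right after initialization, the models were probably 
not imported before init; if it is empty then but non-empty later, something in 
the suite is dropping tables.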
   
   BTW, I am converting this into a discussion. This is not really an "issue" in 
Airflow; it's more troubleshooting of what your test suite does or does not do.
   

