GitHub user Dev-iL added a comment to the discussion: Cannot schedule a DAG on 
a DatasetAlias when using a clean Airflow docker image (for CI)

Thank you for the detailed response! I'm sure this will be useful to many 
future readers.

This was indeed partly an issue in how our test suite was set up: in our 
top-level conftest we have code that detects the repo root, then sets 
`AIRFLOW_HOME` to `<repo_root>/airflow` (since our DAGs are located in a 
subfolder of that). We have no Airflow database checked in to the repository at 
that path (developers have a local DB there, which is git-ignored). As it turns 
out, a database was being created on each run of the pipeline (which is not a 
bad thing in itself). Our CI worked like that with no issue until one day a DAG 
with a `DatasetAlias` schedule was added - and all hell broke loose.
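
For readers who want a concrete picture, a conftest setup of the kind described above could look roughly like the following. This is a minimal sketch for illustration only: the helper names `find_repo_root` and `configure_airflow_home` are hypothetical, and detecting the root by looking for a `.git` entry is an assumption about how the repo root is found.

```python
import os
from pathlib import Path


def find_repo_root(start: Path) -> Path:
    """Walk upwards from `start` until a directory containing `.git` is found.

    Using `.git` as the marker is an assumption made for this sketch.
    """
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    raise RuntimeError(f"no repository root found at or above {start}")


def configure_airflow_home(start: Path) -> Path:
    """Point AIRFLOW_HOME at `<repo_root>/airflow` and return that path."""
    airflow_home = find_repo_root(start) / "airflow"
    os.environ["AIRFLOW_HOME"] = str(airflow_home)
    return airflow_home
```

In a top-level `conftest.py` this would be called once at module level, for example `configure_airflow_home(Path(__file__).resolve().parent)`, before anything imports `airflow`, so that the setting is already in place when Airflow loads its configuration.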

The workaround I ended up using was to add symlinks in `<repo_root>/airflow` 
that point to `/opt/airflow` before running the test suite that uses another 
`AIRFLOW_HOME`.
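
A symlink step like this could be scripted along the following lines. This is only a sketch under stated assumptions: the helper name `link_entries` is hypothetical, and linking every top-level entry of the target directory individually (rather than only selected files) is a guess at what such a workaround might do, not a description of the exact setup used here.

```python
from pathlib import Path


def link_entries(target_dir: Path, link_dir: Path) -> list[Path]:
    """Create one symlink in `link_dir` for each entry found in `target_dir`.

    Existing files or links in `link_dir` are left untouched.
    Returns the list of symlinks that were created.
    """
    link_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for entry in sorted(target_dir.iterdir()):
        link = link_dir / entry.name
        if link.exists() or link.is_symlink():
            continue  # never overwrite a real file or an existing link
        link.symlink_to(entry, target_is_directory=entry.is_dir())
        created.append(link)
    return created
```

In CI this would run before the test suite starts, with `target_dir` set to the Airflow home inside the image and `link_dir` set to `<repo_root>/airflow`.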

In hindsight, while my initial analysis of the problem was incorrect (because 
of not fully understanding how our test suite was affecting the CI), I still 
believe we encountered a problematic edge case of Airflow DB creation, which 
seems to be related to something you have noticed as well.

GitHub link: 
https://github.com/apache/airflow/discussions/45236#discussioncomment-11685446
