GitHub user Dev-iL added a comment to the discussion: Cannot schedule a DAG on a DatasetAlias when using a clean Airflow docker image (for CI)
Thank you for the detailed response! I'm sure it will be useful to many future readers.

This was indeed partly an issue with how our test suite was set up: our top-level conftest detects the repo root and sets `AIRFLOW_HOME` to `<repo_root>/airflow` (since our DAGs are located in a subfolder of that). We have no Airflow database checked in to the repository at that path (developers keep a local, git-ignored db there). As it turns out, a database was being created on each run of the pipeline, which is not a bad thing in itself. Our CI worked like that with no issues until one day a DAG with a `DatasetAlias` schedule was added, and all hell broke loose.

The workaround I ended up using was to add symlinks in `<repo_root>/airflow` that point to `opt/airflow` before running the test suite that uses another `AIRFLOW_HOME`.

In hindsight, while my initial analysis of the problem was incorrect (because I did not fully understand how our test suite was affecting the CI), I still believe we encountered a problematic edge case of Airflow DB creation, which seems to be related to something you have noticed too.

GitHub link: https://github.com/apache/airflow/discussions/45236#discussioncomment-11685446
