DagBag import timeouts happen when people do more than just "configuration as code" in their module scope (say, doing actual compute in module scope, which is a no-no). They can also happen if you read from flaky external systems that introduce delays: say you read pipeline configuration from Zookeeper, a database, or a network drive, and that operation is slow enough to time out.
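To make the module-scope point concrete, here is a rough sketch in plain Python (no Airflow import, so it runs anywhere). Everything in it is hypothetical: `fetch_pipeline_config` stands in for a slow read from Zookeeper or a database, and the two `define_dag_*` functions simulate what happens each time a DAG file is parsed. It is only meant to show where the latency lands, not how Airflow itself is implemented.

```python
import time


def fetch_pipeline_config():
    """Hypothetical stand-in for a slow external read (Zookeeper, database, network drive)."""
    time.sleep(0.2)  # simulated network latency
    return {"table": "events", "batch_size": 500}


def define_dag_eagerly():
    """Simulates a DAG file that performs the slow read at module scope."""
    config = fetch_pipeline_config()  # paid on every parse of the file
    return {"dag_id": "example", "task": lambda: config["table"]}


def define_dag_lazily():
    """Simulates a DAG file that defers the slow read into the task callable."""
    def task():
        # paid only when the task actually runs
        return fetch_pipeline_config()["table"]
    return {"dag_id": "example", "task": task}


def timed(func):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, eager_seconds = timed(define_dag_eagerly)
    _, lazy_seconds = timed(define_dag_lazily)
    print("eager parse: %.3fs, lazy parse: %.3fs" % (eager_seconds, lazy_seconds))
```

In a real DAG file the lazy version would correspond to doing the read inside the callable you hand to an operator, so that parsing the file stays cheap and only the running task waits on the external system.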
Also, with Airflow (at the moment) you are responsible for synchronizing the pipeline definitions (DAGS_FOLDER) on all machines across the cluster. If they are not in sync, you'll have problems with symptoms that may look like "dag_id not found". That happens when the scheduler is aware of DAGs that the workers are not aware of.

Max

On Mon, Jun 11, 2018 at 12:42 PM Stephane Bonneaud <steph...@fathomhealth.co> wrote:

> Hi there,
>
> We’re using Airflow in our startup and it’s been great in many ways,
> thanks for the work you guys are doing!
>
> Unfortunately, we’re hitting a bunch of issues with ops timing out, DAGs
> failing for unclear reasons, with no logs or the following error:
> "airflow.exceptions.AirflowException: dag_id could not be found". This
> seems to happen when enough DAGs are running at the same time, though it
> can also happen more rarely here and there. But, the best way to reproduce
> the error with our setup is to run enough DAGs at once. Most of the time,
> clearing the DAG run or ops that have failed and letting the DAG re-run is
> enough to fix the problem.
>
> I found resources pointing to the dagbag_import_timeout, e.g.,
> https://stackoverflow.com/questions/43235130/airflow-dag-id-could-not-be-found.
> I did play with that parameter, and other parameters as well. And it does
> seem that they help, i.e., I can run more DAGs at once, but
>     (1) if I run enough DAGs at once, I still see ops and DAGs
>         failing, so the problem is not fixed;
>     (2) more importantly, I don’t fully understand the problem. I have
>         some ideas on what is happening, but maybe I’m totally wrong?
>
> Any recommendations on how I should investigate that?
>
> Thank you very much!
> Have a nice rest of the day,
> Stéphane
> http://stephanebonneaud.com