In my opinion this isn't a good thing: as you've observed, it makes a database connection each time. Since the scheduler rescans DAG files every 5 (?) seconds, the database is queried frequently, and as the number of DAGs grows, the number of connections grows with it. We hit a similar situation with our Variable.get() calls: some were in default_args and some were top-level variables, so they kept being re-executed every time the .py file was scanned. With just 20-30 DAGs we ran over the maximum connection limit on RDS (t2.small), so we moved all Variable.get() calls to as late as possible (inside methods/jinja templates).
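To make the "as late as possible" point concrete, here is a minimal sketch in plain Python (no Airflow import; variable_get below is a hypothetical stand-in for airflow.models.Variable.get, which opens a DB session per call). Eager evaluation runs the lookup every time the file is parsed; wrapping it in a callable, like a jinja template does, defers it until the task actually executes:

```python
calls = []

def variable_get(key):
    """Hypothetical stand-in for Variable.get(): each call would
    open and close a database session in real Airflow."""
    calls.append(key)
    return 'someone@example.com'

# Eager: evaluated every time the scheduler parses the .py file.
eager_default_args = {'email': variable_get('de_infra_email')}

# Deferred: nothing touches the "database" until the callable is
# invoked, which in Airflow happens at task render/run time.
deferred_default_args = {'email': lambda: variable_get('de_infra_email')}

assert calls == ['de_infra_email']   # only the eager lookup has run so far

deferred_default_args['email']()     # simulate task execution at runtime
assert len(calls) == 2               # deferred lookup ran exactly once
```

The same effect is what you get from referencing `{{ var.value.test_owner_de }}` inside a templated field: the scheduler's parse loop never hits the variable table.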
In the example you've shown, it's worth converting it to environment variables if you need some flexibility; the change management then shifts to your deployment/configuration mechanism. Or just hardcode it and redeploy when needed.

On Mon, Oct 22, 2018 at 8:34 AM Pramiti Goel <pramitigoe...@gmail.com> wrote:

> Hi,
>
> We want to make the owner and email ID generic, so we don't want to put
> them in the Airflow DAG. Using Variables will help us change the
> email/owner later if there are a lot of DAGs with the same owner.
>
> For example:
>
> default_args = {
>     'owner': Variable.get('test_owner_de'),
>     'depends_on_past': False,
>     'start_date': datetime(2018, 10, 17),
>     'email': Variable.get('de_infra_email'),
>     'email_on_failure': True,
>     'email_on_retry': True,
>     'retries': 2,
>     'retry_delay': timedelta(minutes=1)}
>
> Looking into the Airflow code, it makes a connection session every time
> the variable is read, and then closes it. (Let me know if I've
> misunderstood.) If there are many DAGs with Variables in default_args
> running in parallel, all querying the variable table in MySQL, will
> there be any limit on the number of SQLAlchemy sessions? Will that make
> the DAGs slow, since there will be many MySQL queries per DAG? Is the
> above approach good?
>
> Using Airflow 1.9.
>
> Thanks,
> Pramiti.
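P.S. A quick sketch of the environment-variable approach, in case it helps. The variable names (AIRFLOW_DAG_OWNER, AIRFLOW_DAG_EMAIL) are made up for illustration; set them in whatever your deployment mechanism is (systemd unit, Docker env file, etc.):

```python
import os

# Hypothetical env var names; the setdefault() calls only provide demo
# fallbacks so this snippet runs standalone -- in a real deployment the
# values come from your configuration management.
os.environ.setdefault('AIRFLOW_DAG_OWNER', 'data-eng')
os.environ.setdefault('AIRFLOW_DAG_EMAIL', 'de@example.com')

# Read once at parse time; no database session is ever opened.
default_args = {
    'owner': os.environ['AIRFLOW_DAG_OWNER'],
    'email': os.environ['AIRFLOW_DAG_EMAIL'],
}
```

Changing the owner/email then means updating the environment and restarting the scheduler/workers, rather than touching the metadata DB on every parse.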