In my opinion this isn't a good thing - as you've observed, it's making a
database connection each time. As the scheduler runs every 5 (?) seconds,
that means the database is being queried frequently.  As the number of DAGs
grows, you will see the number of connections growing as well.  We had a
similar situation with our Variable.get() calls - we had some in the
default_args and some as top-level variables, so they kept getting
re-executed every time the .py file was scanned.  With just 20-30 DAGs we
exceeded the maximum connection limit in RDS (t2.small), so we moved all
Variable.get() calls to as late as possible (inside methods/jinja
templates).
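To make the parse-time vs. run-time difference concrete, here's a minimal
sketch in plain Python. `fake_variable_get` is a hypothetical stand-in for
Airflow's Variable.get(), which opens (and closes) a DB session on every
call - the counter shows how many lookups the eager style pays for before
any task has even run:

```python
# Contrast eager (parse-time) vs. deferred (run-time) lookups.
# fake_variable_get is a made-up stand-in for Airflow's Variable.get();
# in real Airflow each call is a round trip to the metadata database.

CALLS = {"count": 0}

def fake_variable_get(key):
    CALLS["count"] += 1  # one DB session per call in real Airflow
    return {"de_infra_email": "infra@example.com"}[key]

# Eager: evaluated every time the scheduler re-parses the .py file.
default_args = {"email": fake_variable_get("de_infra_email")}

# Deferred: the lookup happens only when the task actually executes.
def notify(**context):
    return fake_variable_get("de_infra_email")

# Simulate ten scheduler scans of the eager style:
for _ in range(10):
    _ = {"email": fake_variable_get("de_infra_email")}

print(CALLS["count"])  # 11 lookups, and not one task has run yet
```

In templated fields you can get the same deferral with jinja, e.g.
`{{ var.value.de_infra_email }}`, which is only rendered at task
execution time.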

In the example you've shown, it's worth converting to environment
variables if you need some flexibility; the change management then shifts
to your deployment/configuration mechanism.  Or just hardcode the values
and redeploy when needed.
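A sketch of the environment-variable approach - note the variable names
DAG_OWNER and DE_INFRA_EMAIL are just examples I've made up, not anything
Airflow defines:

```python
import os

def build_default_args(env=os.environ):
    # Owner/email come from the deployment environment rather than the
    # Airflow metadata DB, so no connection is made at parse time.
    return {
        "owner": env.get("DAG_OWNER", "data-eng"),
        "email": env.get("DE_INFRA_EMAIL", "infra@example.com"),
        "email_on_failure": True,
    }
```

Reading os.environ is in-process and essentially free, so it doesn't
matter that this runs on every scheduler scan.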

On Mon, Oct 22, 2018 at 8:34 AM Pramiti Goel <pramitigoe...@gmail.com>
wrote:

> Hi,
>
> We want to make the owner and email Id general, so we don't want to put
> them in the airflow dag. Using variables will help us change the
> email/owner later, if there are a lot of DAGs with the same owner.
>
> For example:
>
>
> default_args = {
>     'owner': Variable.get('test_owner_de'),
>     'depends_on_past': False,
>     'start_date': datetime(2018, 10, 17),
>     'email': Variable.get('de_infra_email'),
>     'email_on_failure': True,
>     'email_on_retry': True,
>     'retries': 2,
>     'retry_delay': timedelta(minutes=1)}
>
>
> Looking into the code of Airflow, it is making a connection session every
> time the variable is created, and then closing it. (Let me know if I
> understand wrong). If there are many DAGs with variables in default_args
> running in parallel, querying the variable table in MySQL, will it have any
> sort of limitation on the number of SQLAlchemy sessions? Will that make
> DAGs slow, as there will be many queries to MySQL for each DAG? Is the
> above approach good?
>
>  >using Airflow 1.9
>
> Thanks,
> Pramiti.
>
