[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elliott Shugerman updated AIRFLOW-3973:
---------------------------------------
    Description: 
h2. Notes:
 * The error does not occur if the database is already initialized. In that 
case, run `resetdb` instead of `initdb` to observe the bug.
 * The error does not occur with the default SQLite database.

h2. Example

{{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
Traceback (most recent call last):
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: relation "variable" does not exist
LINE 2: FROM variable}}
h2. Explanation

The first thing {{airflow initdb}} does is run the Alembic migrations, all of 
them in a single transaction. Most tables, including the {{variable}} table, 
are created in the initial migration. A [later 
migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
 imports and instantiates {{models.DagBag}}. On initialization, {{DagBag}} 
calls its {{collect_dags}} method, which scans the DAGs directory and attempts 
to load every DAG it finds. Loading a DAG that uses a {{Variable}} queries the 
{{variable}} table to look up that {{Variable}}. It's not clear to me exactly 
how the connection for that query is created, but I think it is a fair 
assumption that it does _not_ use the same transaction as the migrations. 
Because all migrations run in one transaction and have not yet finished, the 
migration that creates the {{variable}} table has not been committed, so the 
table is not visible to any other connection or transaction. The query 
therefore raises {{ProgrammingError}}, which {{collect_dags}} catches and logs.
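
As a rough illustration of the visibility rule described above, the sketch below creates a table inside an open transaction on one connection and probes for it from a second connection. It is not Airflow or Alembic code. It uses the standard-library {{sqlite3}} module with an explicit {{BEGIN}} only because that runs without a Postgres server; the function names and the temporary database file are assumptions made for the example. It says nothing about how Airflow actually drives SQLite, where the bug does not occur.

```python
import os
import sqlite3
import tempfile


def table_visible(conn, table_name):
    """Return True if `table_name` can be queried through `conn`."""
    try:
        conn.execute(f"SELECT 1 FROM {table_name}")
        return True
    except sqlite3.OperationalError:
        # SQLite reports "no such table"; Postgres would raise ProgrammingError.
        return False


def demo():
    """Probe for an uncommitted table from a second connection."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None disables implicit transaction handling, so the
    # explicit BEGIN/COMMIT below fully control the transaction boundaries.
    migrator = sqlite3.connect(path, isolation_level=None)
    other = sqlite3.connect(path, isolation_level=None)

    migrator.execute("BEGIN")
    migrator.execute("CREATE TABLE variable (key TEXT, val TEXT)")

    # The CREATE TABLE is still uncommitted, so `other` cannot see it.
    before_commit = table_visible(other, "variable")

    migrator.execute("COMMIT")
    after_commit = table_visible(other, "variable")

    migrator.close()
    other.close()
    return before_commit, after_commit


if __name__ == "__main__":
    print(demo())
```

Under these assumptions the second connection should fail to find the table before the commit and find it afterwards, which mirrors the sequence of events suspected in the migration run.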

 
h2. Proposed Solution

Run each Alembic migration in its own transaction. I will open a pull request 
which accomplishes this shortly.
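
To make the difference between the two strategies concrete, here is a small simulation using only the standard-library {{sqlite3}} module. It is a sketch, not the pull request. The runner, the two stand-in migrations and the {{per_migration}} flag are all hypothetical names. Alembic's {{context.configure}} accepts a {{transaction_per_migration}} option, but whether the eventual fix uses it is not stated here.

```python
import os
import sqlite3
import tempfile


def create_variable_table(conn, path):
    """Stand-in for the initial migration that creates the table."""
    conn.execute("CREATE TABLE variable (key TEXT, val TEXT)")
    return "created"


def later_migration(conn, path):
    """Stand-in for a later migration that reads through a separate connection."""
    other = sqlite3.connect(path, isolation_level=None)
    try:
        other.execute("SELECT key FROM variable")
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        other.close()


def run_migrations(per_migration):
    """Run both stand-in migrations in one transaction or one transaction each."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    results = []

    if not per_migration:
        conn.execute("BEGIN")
    for migrate in (create_variable_table, later_migration):
        if per_migration:
            conn.execute("BEGIN")
        results.append(migrate(conn, path))
        if per_migration:
            conn.execute("COMMIT")
    if not per_migration:
        conn.execute("COMMIT")

    conn.close()
    return results


if __name__ == "__main__":
    print("single transaction:       ", run_migrations(per_migration=False))
    print("transaction per migration:", run_migrations(per_migration=True))
```

With a single transaction, the second stand-in migration should report that the table is missing. With one transaction per migration, the table is committed before the later migration runs, so the lookup succeeds.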


> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is 
> used for the internal database
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3973
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Elliott Shugerman
>            Assignee: Elliott Shugerman
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
