[ https://issues.apache.org/jira/browse/AIRFLOW-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140717#comment-17140717 ]

ASF subversion and git services commented on AIRFLOW-3973:
----------------------------------------------------------

Commit 5bc50183e0934f7368d9cd991074b2b581114395 in airflow's branch 
refs/heads/v1-10-test from Elliott Shugerman
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=5bc5018 ]

[AIRFLOW-3973] Commit after each alembic migration (#4797)

If `Variable`s are used in DAGs, and Postgres is used for the internal
database, a fresh `$ airflow initdb` (or `$ airflow resetdb`) spams the
logs with error messages (but does not fail).

This commit corrects this by running each migration in a separate
transaction.

Co-authored-by: Elliott Shugerman <eeshuger...@medianewsgroup.com>

(cherry picked from commit ea95e9c7236969acc807c65de0f12633d04753a0)
(cherry picked from commit 5b48a5394ecf5aa1f2b50a00807e6149ade21968)
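
For readers unfamiliar with how Alembic scopes transactions, here is a minimal sketch of one way to run each migration in a separate transaction. It assumes the standard `env.py` layout generated by `alembic init`; the function name `run_migrations_online` and the `connectable` argument follow that template and are not taken from this commit. `transaction_per_migration` is an option of Alembic's `context.configure`; whether the commit uses exactly this option is not stated in the message above.

```python
def run_migrations_online(connectable, target_metadata=None):
    """Run pending migrations, committing after each one (sketch).

    `connectable` is assumed to be a SQLAlchemy Engine, as in the
    env.py template produced by `alembic init`.
    """
    # Imported here because `context` is only usable while Alembic is
    # executing env.py (for example during `alembic upgrade head`).
    from alembic import context

    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            # Wrap every migration script in its own BEGIN/COMMIT instead
            # of one transaction around the whole upgrade.
            transaction_per_migration=True,
        )
        with context.begin_transaction():
            context.run_migrations()
```

With this setting, tables created by an earlier migration are committed, and therefore visible to other connections, before a later migration starts.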


> `airflow initdb` logs errors when `Variable` is used in DAGs and Postgres is 
> used for the internal database
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3973
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3973
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Elliott Shugerman
>            Assignee: Elliott Shugerman
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> h2. Notes:
>  * This does not occur if the database is already initialized. If it is, run 
> `resetdb` instead to observe the bug.
>  * This does not occur with the default SQLite database.
> h2. Example
> {{ERROR [airflow.models.DagBag] Failed to import: /home/elliott/clean-airflow/dags/dag.py
> Traceback (most recent call last):
>   File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
>     cursor, statement, parameters, context
>   File "/home/elliott/.virtualenvs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: relation "variable" does not exist
> LINE 2: FROM variable}}
> h2. Explanation
> The first thing {{airflow initdb}} does is run the Alembic migrations. All 
> migrations are run in one transaction. Most tables, including the 
> {{variable}} table, are defined in the initial migration. A [later 
> migration|https://github.com/apache/airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py]
>  imports and initializes {{models.DagBag}}. Upon initialization, {{DagBag}} 
> calls its {{collect_dags}} method, which scans the DAGs directory and 
> attempts to load all DAGs it finds. When it loads a DAG that uses a 
> {{Variable}}, it will query the database to see if that {{Variable}} is 
> defined in the {{variable}} table. It's not clear to me how exactly the 
> connection for that query is created, but I think it is apparent that it does 
> _not_ use the same transaction that is used to run the migrations. Since the 
> migrations are not yet complete, and all migrations are run in one 
> transaction, the migration that creates the {{variable}} table has not yet 
> been committed, and therefore the table does not exist to any other 
> connection/transaction. This raises {{ProgrammingError}}, which is caught and 
> logged by {{collect_dags}}.
>  
> h2. Proposed Solution
> Run each Alembic migration in its own transaction. I will open a pull request 
> which accomplishes this shortly.
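
The visibility problem described in the Explanation section can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: it uses the `sqlite3` module from the Python standard library as a stand-in so that it runs without a Postgres server, and the names `table_visible`, `demo`, `migrator`, and `other` are hypothetical. The notes above say the Airflow bug itself does not appear with SQLite, so this shows only the general rule that a table created inside an uncommitted transaction does not exist for a second connection.

```python
import os
import sqlite3
import tempfile


def table_visible(conn, table):
    """Return True if `table` can be queried on `conn`, False if it is missing."""
    try:
        conn.execute(f"SELECT 1 FROM {table}")
        return True
    except sqlite3.OperationalError as exc:
        if "no such table" in str(exc):
            return False
        raise  # any other error (e.g. a lock timeout) is unexpected here


def demo():
    """Create a table in an open transaction and probe it from a second connection."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None disables the module's implicit transaction
    # handling, so BEGIN and COMMIT are issued explicitly below.
    migrator = sqlite3.connect(path, isolation_level=None)
    other = sqlite3.connect(path, isolation_level=None)
    try:
        # Plays the role of the migration run: the DDL is executed,
        # but the surrounding transaction is still open.
        migrator.execute("BEGIN")
        migrator.execute("CREATE TABLE variable (key TEXT, val TEXT)")

        # Plays the role of DagBag loading a DAG that reads a Variable
        # through a different connection.
        before_commit = table_visible(other, "variable")

        migrator.execute("COMMIT")
        after_commit = table_visible(other, "variable")
    finally:
        migrator.close()
        other.close()
    return before_commit, after_commit


if __name__ == "__main__":
    print(demo())
```

The first probe corresponds to the `relation "variable" does not exist` error in the example; the second corresponds to the situation once the migration that creates the table has been committed.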



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
