[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

George Leslie-Waksman (JIRA) Wed, 08 Aug 2018 00:37:07 -0700


    [ 
https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572816#comment-16572816
 ]


George Leslie-Waksman commented on AIRFLOW-2870:
------------------------------------------------

The process to reproduce is as follows:
 # Start with an Airflow deployment that predates 
{{cc1e65623dc7_add_max_tries_column_to_task_instance.py}} (e.g. 1.8.1)
 # Run Airflow long enough to populate the task_instance table in the metadata 
database (for example, run one of the sample DAGs)
 # Install an Airflow version that includes 
{{27c6a30d7c24_add_executor_config_to_task_instance.py}} (e.g. 1.10rc3)
 # Run {{airflow upgradedb}}

This will fail with a message about the column "task_instance.executor_config" 
not existing.

My current understanding of what is happening:
 * When constructing a SQLAlchemy ORM query using a declarative model (i.e. 
{{TaskInstance}}), the database table must be consistent with the structure of 
that model.
 ** SQLAlchemy's mapper will query all columns known to the ORM mapper (code 
side) and assume they exist in the database
 * When running a migration, the database table is in a transitional state
 * The code in {{airflow/models.py}} reflects the state of the database after 
running ALL migrations through the present
 * When we use the 1.10rc3 code to run migrations and we reach 
{{cc1e65623dc7_add_max_tries_column_to_task_instance.py}}, we [import 
TaskInstance|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L36]
 as if it has all future columns and then [query the old 
schema|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L64]
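
A minimal sketch of the mismatch described above. It uses the standard-library {{sqlite3}} module as a stand-in for SQLAlchemy, and the column list and table layout are abbreviated assumptions, not the real Airflow schema. The point is only that a SELECT built from every column the code knows about fails against a table that has not yet received the later column:

```python
import sqlite3

# Columns the *current* model code knows about (assumed, abbreviated list).
MODEL_COLUMNS = ["task_id", "dag_id", "execution_date", "try_number", "executor_config"]


def select_all_model_columns(conn):
    """Build a SELECT the way an ORM mapper would: every mapped column, by name."""
    cols = ", ".join(f"task_instance.{c}" for c in MODEL_COLUMNS)
    return conn.execute(f"SELECT {cols} FROM task_instance").fetchall()


conn = sqlite3.connect(":memory:")
# Table as it looks partway through the migration chain: no executor_config yet.
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, try_number INTEGER)"
)
conn.execute("INSERT INTO task_instance VALUES ('t1', 'example_dag', '2018-08-01', 1)")

try:
    select_all_model_columns(conn)
    error = None
except sqlite3.OperationalError as exc:
    # The database rejects the query because of the column it has never heard of.
    error = str(exc)

print(error)
```

The error names the missing column, which mirrors the psycopg2 error in the issue description.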

Under typical circumstances, one can avoid this issue by performing migrations 
using Alembic + SQLAlchemy Core (no ORM) and directly manipulating the tables. 
However, in this case, we need to populate information from a {{Task}} object 
that does not have a representation in the database.
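
For illustration, here is one possible shape of a migration step that names only the columns it needs. This is a sketch using the standard-library {{sqlite3}} module, not Alembic or the actual migration code. The {{retries_for}} lookup and the {{RETRIES}} table are hypothetical stand-ins for the information that lives in the {{Task}} object, and the row key is simplified to (dag_id, task_id):

```python
import sqlite3


def upgrade(conn, retries_for):
    """Add max_tries and fill it in, touching only columns that exist right now."""
    conn.execute("ALTER TABLE task_instance ADD COLUMN max_tries INTEGER DEFAULT -1")
    rows = conn.execute(
        "SELECT task_id, dag_id FROM task_instance WHERE max_tries = -1"
    ).fetchall()
    for task_id, dag_id in rows:
        # The value comes from code, not from the database.
        conn.execute(
            "UPDATE task_instance SET max_tries = ? WHERE task_id = ? AND dag_id = ?",
            (retries_for(dag_id, task_id), task_id, dag_id),
        )
    conn.commit()


# Hypothetical stand-in for what the DAG definitions would tell us.
RETRIES = {("example_dag", "t1"): 3}


def retries_for(dag_id, task_id):
    return RETRIES.get((dag_id, task_id), 0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, try_number INTEGER)")
conn.execute("INSERT INTO task_instance VALUES ('t1', 'example_dag', 2)")

upgrade(conn, retries_for)
result = conn.execute("SELECT task_id, try_number, max_tries FROM task_instance").fetchall()
print(result)
```

Because the statements list their columns explicitly, columns added by later migrations never enter the query.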

We may be able to work around the database issues by manipulating SQLAlchemy's 
[column 
loading|http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#load-only-cols]
 but that may be tricky given the intertwined nature of Airflow's model code.
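
The idea behind restricting column loading can be sketched as follows. This again uses the standard-library {{sqlite3}} module rather than SQLAlchemy's loader options (such as {{load_only}}), and {{MODEL_COLUMNS}}, {{existing_columns}} and {{select_loadable}} are hypothetical names introduced for the example. The query is limited to the model columns that the table actually has at this point in the migration chain:

```python
import sqlite3

# Columns the current model code knows about (assumed, abbreviated list).
MODEL_COLUMNS = ["task_id", "dag_id", "try_number", "max_tries", "executor_config"]


def existing_columns(conn, table):
    """Return the column names the table has right now."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def select_loadable(conn, table="task_instance"):
    """Select only the model columns that are present in the table."""
    present = existing_columns(conn, table)
    cols = [c for c in MODEL_COLUMNS if c in present]
    rows = conn.execute(f"SELECT {', '.join(cols)} FROM {table}").fetchall()
    return cols, rows


conn = sqlite3.connect(":memory:")
# max_tries has been added, executor_config has not.
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, try_number INTEGER, max_tries INTEGER)"
)
conn.execute("INSERT INTO task_instance VALUES ('t1', 'example_dag', 2, -1)")

cols, rows = select_loadable(conn)
print(cols)
print(rows)
```

Whether the same restriction can be applied cleanly to {{TaskInstance}} through the ORM is the open question raised above.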

> Migrations fail when upgrading from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance
> --------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2870
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: George Leslie-Waksman
>            Priority: Blocker
>
> Running migrations from below 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py fails with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> 
> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1182, in _execute_context
>     context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 470, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance 
> from the current code version, which has changes to the task_instance table 
> that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an 
> executor_config column that does not yet exist when 
> cc1e65623dc7_add_max_tries_column_to_task_instance.py runs.
> New installs do not observe this failure because the migration branches on 
> whether the table exists, and the branch taken by new installs hides the 
> issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
