Rolf Schroeder commented on AIRFLOW-1642:


The following workaround lets you apply Joy's patch on a one-time basis:

# Go to the Airflow install dir
cd /path/to/venv/lib/python*/site-packages/airflow/migrations/versions
# Make a backup of the "faulty" revision
rsync -a cc1e65623dc7_add_max_tries_column_to_task_instance.py cc1e65623dc7_add_max_tries_column_to_task_instance.py.bak
# Add patch
sed -i 's/session = sessionmaker(bind=connection)/session = settings.Session()/' cc1e65623dc7_add_max_tries_column_to_task_instance.py
# Init db
airflow initdb
# Restore the revision
rsync -av cc1e65623dc7_add_max_tries_column_to_task_instance.py.bak cc1e65623dc7_add_max_tries_column_to_task_instance.py

This is obviously not how things should get fixed, but it is a working 
solution until someone is bold enough to actually fix the migration ;)
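
For anyone who prefers to script this, the backup/patch/restore steps above can be sketched in Python so that the original revision file is restored even if the init step fails. This is only a sketch: the helper name run_with_patch and its parameters are made up for illustration; only the two strings are taken from the sed command.

```python
import shutil
from pathlib import Path

# The two strings below are taken from the sed command in the workaround.
OLD = "session = sessionmaker(bind=connection)"
NEW = "session = settings.Session()"


def run_with_patch(revision_file, action, old=OLD, new=NEW):
    """Hypothetical helper: back up revision_file, patch it, call action(),
    then restore the original file whether or not action() succeeded."""
    revision_file = Path(revision_file)
    backup = revision_file.with_name(revision_file.name + ".bak")
    shutil.copy2(revision_file, backup)  # make a backup of the revision
    try:
        text = revision_file.read_text()
        if old not in text:
            raise ValueError("pattern not found; is the file already patched?")
        revision_file.write_text(text.replace(old, new))  # apply the patch
        return action()  # e.g. run the db init
    finally:
        shutil.move(str(backup), str(revision_file))  # restore the revision
```

With `import subprocess`, the action could be `lambda: subprocess.check_call(["airflow", "initdb"])`, and revision_file would be the cc1e65623dc7 file in the migrations/versions directory.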

> An Alembic script not using scoped session causing deadlock
> -----------------------------------------------------------
>                 Key: AIRFLOW-1642
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1642
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Joy Gao
>            Priority: Minor
> The bug I'm about to describe is more of an obscure edge case; however, I 
> think it's still worth fixing.
> After upgrading to airflow 1.9, while running `airflow resetdb` on my local 
> machine (with mysql), I encountered a deadlock on the final alembic revision 
> _d2ae31099d61 Increase text size for MySQL (not relevant for other DBs' text 
> types)_.
> The deadlock turned out to be caused by another earlier session that was 
> created and left open in revision _cc1e65623dc7 add max tries column to task 
> instance_. Notably the code below:
> {code}
> sessionmaker = sa.orm.sessionmaker()
> session = sessionmaker(bind=connection)
> dagbag = DagBag(settings.DAGS_FOLDER)
> {code}
> The session created here was not a `scoped_session`. When the DAGs were 
> being parsed in line 3 above, one of the DAG files made a direct call to the 
> class method `Variable.get()` to acquire an env variable. That call queries 
> the `variable` table, but it raised a KeyError because the env variable did 
> not exist, and the lock on the `variable` table was left held as a result of 
> that exception.
> Later on, the later alembic script `d2ae31099d61` needs to alter the 
> `variable` table. Instead of creating its own Session object, it attempts to 
> reuse the same one as above. And because of the exception, it waits 
> indefinitely to acquire the lock on that table. 
> So the DAG file itself could have avoided the KeyError by providing a default 
> value when calling Variable.get(). However I think it would be a good idea to 
> avoid using unscoped sessions in general, as an exception could potentially 
> occur in the future elsewhere.  The easiest fix is replacing *session = 
> sessionmaker(bind=connection)* with *session = settings.Session()*, which is 
> scoped. However, making a change on a migration script is going to make folks 
> anxious.
> If anyone has any thoughts on this, let me know! Thanks :)
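
The general pattern described in the issue (a transaction left open on one connection blocking a later schema change on another) can be reproduced in miniature with the standard-library sqlite3 module. This is only an analogy under stated assumptions: SQLite's file locking is not the same as MySQL's metadata locks, the open transaction here is a write rather than the read in the report, and the function name and the `note` column are made up for the demo.

```python
import os
import sqlite3
import tempfile


def demo_open_transaction_blocks_writer():
    """Illustrative only: returns (blocked, succeeded_after_release)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None puts the connection in autocommit mode, so the
    # transaction boundaries below are exactly the ones we write ourselves.
    first = sqlite3.connect(path, isolation_level=None)
    first.execute("CREATE TABLE variable (key TEXT PRIMARY KEY, val TEXT)")

    # Open a transaction and leave it open, like the session that was never
    # closed after the exception.
    first.execute("BEGIN IMMEDIATE")
    first.execute("INSERT INTO variable VALUES ('a', '1')")

    # A second connection now tries to alter the same table. The short
    # timeout makes it give up quickly instead of waiting.
    second = sqlite3.connect(path, isolation_level=None, timeout=0.1)
    try:
        second.execute("ALTER TABLE variable ADD COLUMN note TEXT")
        blocked = False
    except sqlite3.OperationalError:
        blocked = True

    # Once the first transaction ends, the same statement goes through.
    first.execute("ROLLBACK")
    second.execute("ALTER TABLE variable ADD COLUMN note TEXT")
    columns = [row[1] for row in second.execute("PRAGMA table_info(variable)")]

    first.close()
    second.close()
    return blocked, "note" in columns
```

In the real report there was no short timeout, so the later revision simply waited indefinitely instead of failing fast.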
