[ https://issues.apache.org/jira/browse/AIRFLOW-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278062#comment-15278062 ]
ASF subversion and git services commented on AIRFLOW-92:
--------------------------------------------------------

Commit dddfd3b5bf2cabaac6eec123dfa3cb59e73a56f5 in incubator-airflow's branch refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=dddfd3b ]

AIRFLOW-92 Avoid unneeded upstream_failed session closes

apache/incubator-airflow#1485

> Tasks not being retried at all due to a 'obj not bound to a Session' exception
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-92
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-92
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: EC2 t2.medium instance,
> Docker `version 1.11.1, build 5604cbe`,
> Host is `Linux ip-172-31-44-140 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 20:50:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux`,
> Docker containers are built upon the `python:3.5` image,
> LocalExecutor is used with two scheduler containers running
>            Reporter: Bence Nagy
>            Priority: Critical
>
> I have some tasks that are stuck in the {{up_for_retry}} state; below is an extract from the database.
> (here it is in a [Google Drive spreadsheet|https://docs.google.com/spreadsheets/d/14dtb3zYa583V1SaLcpOq6hDM4ThCeN7JhHjftRwKxbI/edit?usp=sharing] with better formatting)
> (each record below spans three lines, matching the three header lines: the task_instance, job, and dag_run columns of the joined tables)
> {code}
> task_id  dag_id  execution_date  start_date  end_date  duration  state  try_number  hostname  unixname  job_id  pool  queue  priority_weight  operator  queued_dttm
> id  dag_id  state  job_type  start_date  end_date  latest_heartbeat  executor_class  hostname  unixname
> id  dag_id  execution_date  state  run_id  external_trigger  conf  end_date  start_date
>
> task_a  dag_a1  2016-05-09 08:00:00.000000  2016-05-09 12:00:12.382775  2016-05-09 12:01:12.473914  60.091139  up_for_retry  1  d5593c115c22  root  46266  default  4  ExternalTaskSensor
> 46266  success  LocalTaskJob  2016-05-09 12:00:08.195711  2016-05-09 12:01:13.261937  2016-05-09 12:00:08.195732  LocalExecutor  d5593c115c22  root
> 17799  dag_a1  2016-05-09 08:00:00.000000  failed  scheduled__2016-05-09T08:00:00  false  2016-05-09 12:00:04.406875
>
> task_a  dag_a2  2016-05-09 10:00:00.000000  2016-05-09 12:00:13.102094  2016-05-09 12:01:13.185960  60.083866  up_for_retry  1  d5593c115c22  root  46270  default  4  ExternalTaskSensor
> 46270  success  LocalTaskJob  2016-05-09 12:00:08.896527  2016-05-09 12:01:13.960936  2016-05-09 12:00:08.896550  LocalExecutor  d5593c115c22  root
> 17800  dag_a2  2016-05-09 10:00:00.000000  failed  scheduled__2016-05-09T10:00:00  false  2016-05-09 12:00:04.531888
>
> task_b  dag_b  2016-04-07 18:00:00.000000  2016-05-09 12:53:59.990395  2016-05-09 12:54:00.393259  0.402864  up_for_retry  1  0a8613c2b5d2  root  46366  default  1  PostgresOperator
> 46366  success  LocalTaskJob  2016-05-09 12:53:58.881987  2016-05-09 12:54:03.891450  2016-05-09 12:53:58.882006  LocalExecutor  0a8613c2b5d2  root
> 17836  dag_b  2016-04-07 18:00:00.000000  running  scheduled__2016-04-07T18:00:00  false  2016-05-09 12:51:59.713718
>
> task_c  dag_b  2016-04-07 16:00:00.000000  2016-05-09 12:53:49.822634  2016-05-09 12:54:49.924291  60.101657  up_for_retry  1  0a8613c2b5d2  root  46359  default  2  ExternalTaskSensor
> 46359  success  LocalTaskJob  2016-05-09 12:53:44.739355  2016-05-09 12:54:54.810579  2016-05-09 12:53:44.739575  LocalExecutor  0a8613c2b5d2  root
> 17831  dag_b  2016-04-07 16:00:00.000000  running  scheduled__2016-04-07T16:00:00  false  2016-05-09 12:51:55.078050
> {code}
> I'm getting the following exception, which seems to be halting the scheduler just before it could queue the tasks for retrying:
> {code}
> [2016-05-10 09:42:33,562] {jobs.py:706} ERROR - Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 703, in _do_dags
>     self.process_dag(dag, tis_out)
>   File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 507, in process_dag
>     active_runs = dag.get_active_runs()
>   File "/usr/local/lib/python3.5/site-packages/airflow/models.py", line 2731, in get_active_runs
>     active_dates.append(run.execution_date)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
>     return self.impl.get(instance_state(instance), dict_)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
>     value = state._load_expired(state, passive)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 474, in _load_expired
>     self.manager.deferred_scalar_loader(self, toload)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 610, in load_scalar_attributes
>     (state_str(state)))
> sqlalchemy.orm.exc.DetachedInstanceError: Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
> {code}
> I've managed to fix this by removing all {{ti.are_dependencies_met()}} calls which have a commit at the end; after doing this there are no exceptions and the tasks are getting retried correctly.
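The failure in {{get_active_runs()}} is SQLAlchemy's expire-on-commit pattern: a commit expires every object bound to the session, and once the session is closed the expired {{DagRun}} can no longer refresh its attributes on access. A minimal pure-Python sketch of that mechanism ({{Session}} and {{DagRun}} here are simplified stand-ins, not SQLAlchemy's or Airflow's real classes):

```python
# Toy model of expire-on-commit followed by a session close -- the
# sequence behind the DetachedInstanceError in the traceback above.
# These classes are illustrative stand-ins, not SQLAlchemy's API.

class DetachedInstanceError(Exception):
    pass


class Session:
    """Toy ORM session: commit() expires attributes, close() detaches."""

    def __init__(self):
        self._bound = []

    def add(self, obj):
        obj._session = self
        self._bound.append(obj)

    def commit(self):
        # Like SQLAlchemy's default expire_on_commit=True: every bound
        # object's attributes must be re-loaded on next access.
        for obj in self._bound:
            obj._expired = True

    def close(self):
        # Detach all objects; expired attributes can no longer refresh.
        for obj in self._bound:
            obj._session = None
        self._bound.clear()


class DagRun:
    def __init__(self, execution_date):
        self._session = None
        self._expired = False
        self._execution_date = execution_date

    @property
    def execution_date(self):
        if self._expired:
            if self._session is None:
                raise DetachedInstanceError(
                    "Instance <DagRun> is not bound to a Session; "
                    "attribute refresh operation cannot proceed")
            self._expired = False  # pretend we re-loaded from the DB
        return self._execution_date


session = Session()
run = DagRun("2016-05-09 08:00:00")
session.add(run)
session.commit()  # e.g. a commit at the end of a dependency check
session.close()   # the "unneeded session close" the patch removes
try:
    run.execution_date
except DetachedInstanceError as exc:
    print("DetachedInstanceError:", exc)
```

This matches the reporter's workaround: removing the calls that commit (and thereby expire) just before the scheduler reads {{run.execution_date}} avoids the access-after-detach, which is also what the committed fix addresses by avoiding the unneeded session closes.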
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)