[ https://issues.apache.org/jira/browse/AIRFLOW-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bence Nagy updated AIRFLOW-92:
------------------------------
Description:
I have some tasks that are stuck in the {{up_for_retry}} state. Below is an extract
from the database (here it is in a
[Google Drive spreadsheet|https://docs.google.com/spreadsheets/d/14dtb3zYa583V1SaLcpOq6hDM4ThCeN7JhHjftRwKxbI/edit?usp=sharing]
with better formatting):
{code}
task_id dag_id execution_date start_date end_date duration
state try_number hostname unixname job_id pool queue
priority_weight operator queued_dttm id dag_id state
job_type start_date end_date latest_heartbeat
executor_class hostname unixname id dag_id execution_date
state run_id external_trigger conf end_date start_date
task_a dag_a1 2016-05-09 08:00:00.000000 2016-05-09 12:00:12.382775
2016-05-09 12:01:12.473914 60.091139 up_for_retry 1
d5593c115c22 root 46266 default 4 ExternalTaskSensor
46266 success LocalTaskJob 2016-05-09 12:00:08.195711
2016-05-09 12:01:13.261937 2016-05-09 12:00:08.195732 LocalExecutor
d5593c115c22 root 17799 dag_a1 2016-05-09 08:00:00.000000 failed
scheduled__2016-05-09T08:00:00 false 2016-05-09
12:00:04.406875
task_a dag_a2 2016-05-09 10:00:00.000000 2016-05-09 12:00:13.102094
2016-05-09 12:01:13.185960 60.083866 up_for_retry 1
d5593c115c22 root 46270 default 4 ExternalTaskSensor
46270 success LocalTaskJob 2016-05-09 12:00:08.896527
2016-05-09 12:01:13.960936 2016-05-09 12:00:08.896550 LocalExecutor
d5593c115c22 root 17800 dag_a2 2016-05-09 10:00:00.000000 failed
scheduled__2016-05-09T10:00:00 false 2016-05-09
12:00:04.531888
task_b dag_b 2016-04-07 18:00:00.000000 2016-05-09 12:53:59.990395
2016-05-09 12:54:00.393259 0.402864 up_for_retry 1
0a8613c2b5d2 root 46366 default 1 PostgresOperator
46366 success LocalTaskJob 2016-05-09 12:53:58.881987
2016-05-09 12:54:03.891450 2016-05-09 12:53:58.882006 LocalExecutor
0a8613c2b5d2 root 17836 dag_b 2016-04-07 18:00:00.000000 running
scheduled__2016-04-07T18:00:00 false 2016-05-09
12:51:59.713718
task_c dag_b 2016-04-07 16:00:00.000000 2016-05-09 12:53:49.822634
2016-05-09 12:54:49.924291 60.101657 up_for_retry 1
0a8613c2b5d2 root 46359 default 2 ExternalTaskSensor
46359 success LocalTaskJob 2016-05-09 12:53:44.739355
2016-05-09 12:54:54.810579 2016-05-09 12:53:44.739575 LocalExecutor
0a8613c2b5d2 root 17831 dag_b 2016-04-07 16:00:00.000000 running
scheduled__2016-04-07T16:00:00 false 2016-05-09
12:51:55.078050
{code}
I'm getting the following exception, which seems to halt the scheduler
just before it can queue the tasks for retrying:
{code}
[2016-05-10 09:42:33,562] {jobs.py:706} ERROR - Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 703, in _do_dags
    self.process_dag(dag, tis_out)
  File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 507, in process_dag
    active_runs = dag.get_active_runs()
  File "/usr/local/lib/python3.5/site-packages/airflow/models.py", line 2731, in get_active_runs
    active_dates.append(run.execution_date)
  File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
    value = state._load_expired(state, passive)
  File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 474, in _load_expired
    self.manager.deferred_scalar_loader(self, toload)
  File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 610, in load_scalar_attributes
    (state_str(state)))
sqlalchemy.orm.exc.DetachedInstanceError: Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
{code}
I've managed to fix this by removing all {{ti.are_dependencies_met()}} calls
which have a commit at the end; after doing this there are no more exceptions and
the tasks are getting retried correctly.
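
The traceback goes through {{_load_expired}}, which suggests the {{DagRun}} attributes had been expired (SQLAlchemy does this on commit by default) and the object was no longer attached to a session when {{execution_date}} was read. The following is a toy model of that mechanism in plain Python, not SQLAlchemy or Airflow code; {{ToySession}}, {{ToyRecord}} and {{DetachedError}} are hypothetical names invented for illustration, and whether this is exactly what happens inside the scheduler is an assumption.

```python
class DetachedError(Exception):
    """Hypothetical stand-in for sqlalchemy.orm.exc.DetachedInstanceError."""


class ToySession:
    """Heavily simplified session; a plain dict plays the database."""

    def __init__(self, database):
        self.database = database
        self.tracked = []

    def load(self, key):
        record = ToyRecord(key, self)
        record._values = dict(self.database[key])  # cache the row
        self.tracked.append(record)
        return record

    def commit(self):
        # Mimics expire-on-commit: cached values are dropped, so the
        # next attribute read has to go back to the database.
        for record in self.tracked:
            record._values = None

    def close(self):
        # Detach every tracked record from this session.
        for record in self.tracked:
            record._session = None
        self.tracked = []


class ToyRecord:
    def __init__(self, key, session):
        self._key = key
        self._session = session
        self._values = None

    def get(self, name):
        if self._values is None:  # expired, a refresh is needed
            if self._session is None:
                raise DetachedError(
                    "record %r is not bound to a session" % self._key
                )
            self._values = dict(self._session.database[self._key])
        return self._values[name]


database = {"run_1": {"execution_date": "2016-05-09 08:00:00"}}

# Case 1: detached but never expired, so the cached value is still readable.
session = ToySession(database)
run = session.load("run_1")
session.close()
cached_value = run.get("execution_date")

# Case 2: a commit expires the record, then it is detached; reading fails.
session = ToySession(database)
run = session.load("run_1")
session.commit()
session.close()
try:
    run.get("execution_date")
    read_failed = False
except DetachedError:
    read_failed = True
```

In this model the read only fails when a commit happens between loading the object and detaching it, which would be consistent with the exception disappearing once the committing calls were removed.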
> Tasks not being retried at all due to a 'obj not bound to a Session' exception
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-92
> URL: https://issues.apache.org/jira/browse/AIRFLOW-92
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: Airflow 1.7.0
> Environment: EC2 t2.medium instance,
> Docker `version 1.11.1, build 5604cbe`,
> Host is `Linux ip-172-31-44-140 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 20:50:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux`,
> Docker containers are built upon the `python:3.5` image,
> LocalExecutor is used with two scheduler containers running
> Reporter: Bence Nagy
> Priority: Critical
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)