[ 
https://issues.apache.org/jira/browse/AIRFLOW-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278028#comment-15278028
 ] 

Bence Nagy commented on AIRFLOW-92:
-----------------------------------

While I agree that a design change is needed here, I don't think this tells us 
the full story:

{quote}
in TI.are_dependencies_met a session.commit is called to make sure that in 
evaluate_trigger_rule a state change will be saved
{quote}

{{evaluate_trigger_rule}} has a kwarg called {{flag_upstream_failed}}, which is 
set to False in {{get_active_runs}}. So in the context where committing can 
cause this error, the commit is unnecessary: state changes are disabled there, 
so a database update can never happen.
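
To make the point concrete, here is a minimal sketch of the idea. It is a simplified, hypothetical model and not Airflow's actual code: {{FakeSession}}, the argument list, and the return value are all assumptions made for illustration. It only shows that when {{flag_upstream_failed}} is False nothing is modified, so the commit could be skipped.

```python
# Simplified, hypothetical model of the behaviour discussed above.
# FakeSession and this function signature are illustrative assumptions,
# not Airflow's real API.


class FakeSession:
    """Stand-in for a database session that only counts commits."""

    def __init__(self):
        self.commits = 0

    def commit(self):
        self.commits += 1


def evaluate_trigger_rule(state, upstream_failed, flag_upstream_failed, session):
    """Return (state, deps_met) and commit only if the state was changed."""
    changed = False
    if flag_upstream_failed and upstream_failed:
        # Only in this branch is there anything to persist.
        state = "upstream_failed"
        changed = True
    if changed:
        session.commit()
    # Dependencies are met only when no upstream task has failed.
    return state, not upstream_failed
```

With {{flag_upstream_failed=False}} the session is never touched in this sketch, which is the situation I believe {{get_active_runs}} is in.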

> Tasks not being retried at all due to a 'obj not bound to a Session' exception
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-92
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-92
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: EC2 t2.medium instance, 
> Docker `version 1.11.1, build 5604cbe`,
> Host is `Linux ip-172-31-44-140 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 
> 20:50:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux`,
> Docker containers are built upon the `python:3.5` image, 
> LocalExecutor is used with two scheduler containers running
>            Reporter: Bence Nagy
>            Priority: Critical
>
> I have some tasks that are stuck in the {{up_for_retry}} state; below is an 
> extract from the database. (It is also available, with better formatting, in a [Google Drive 
> spreadsheet|https://docs.google.com/spreadsheets/d/14dtb3zYa583V1SaLcpOq6hDM4ThCeN7JhHjftRwKxbI/edit?usp=sharing].)
> {code}
> task_id       dag_id  execution_date  start_date      end_date        duration        state   try_number      hostname        unixname        job_id  pool    queue   priority_weight operator        queued_dttm     id      dag_id  state   job_type        start_date      end_date        latest_heartbeat        executor_class  hostname        unixname        id      dag_id  execution_date  state   run_id  external_trigger        conf    end_date        start_date
> task_a        dag_a1  2016-05-09 08:00:00.000000      2016-05-09 12:00:12.382775      2016-05-09 12:01:12.473914      60.091139       up_for_retry    1       d5593c115c22    root    46266           default 4       ExternalTaskSensor              46266           success LocalTaskJob    2016-05-09 12:00:08.195711      2016-05-09 12:01:13.261937      2016-05-09 12:00:08.195732      LocalExecutor   d5593c115c22    root    17799   dag_a1  2016-05-09 08:00:00.000000      failed  scheduled__2016-05-09T08:00:00  false                   2016-05-09 12:00:04.406875
> task_a        dag_a2  2016-05-09 10:00:00.000000      2016-05-09 12:00:13.102094      2016-05-09 12:01:13.185960      60.083866       up_for_retry    1       d5593c115c22    root    46270           default 4       ExternalTaskSensor              46270           success LocalTaskJob    2016-05-09 12:00:08.896527      2016-05-09 12:01:13.960936      2016-05-09 12:00:08.896550      LocalExecutor   d5593c115c22    root    17800   dag_a2  2016-05-09 10:00:00.000000      failed  scheduled__2016-05-09T10:00:00  false                   2016-05-09 12:00:04.531888
> task_b        dag_b   2016-04-07 18:00:00.000000      2016-05-09 12:53:59.990395      2016-05-09 12:54:00.393259      0.402864        up_for_retry    1       0a8613c2b5d2    root    46366           default 1       PostgresOperator                46366           success LocalTaskJob    2016-05-09 12:53:58.881987      2016-05-09 12:54:03.891450      2016-05-09 12:53:58.882006      LocalExecutor   0a8613c2b5d2    root    17836   dag_b   2016-04-07 18:00:00.000000      running scheduled__2016-04-07T18:00:00  false                   2016-05-09 12:51:59.713718
> task_c        dag_b   2016-04-07 16:00:00.000000      2016-05-09 12:53:49.822634      2016-05-09 12:54:49.924291      60.101657       up_for_retry    1       0a8613c2b5d2    root    46359           default 2       ExternalTaskSensor              46359           success LocalTaskJob    2016-05-09 12:53:44.739355      2016-05-09 12:54:54.810579      2016-05-09 12:53:44.739575      LocalExecutor   0a8613c2b5d2    root    17831   dag_b   2016-04-07 16:00:00.000000      running scheduled__2016-04-07T16:00:00  false                   2016-05-09 12:51:55.078050
> {code}
> I'm getting the following exception, which seems to halt the scheduler 
> just before it can queue the tasks for retrying:
> {code}
> [2016-05-10 09:42:33,562] {jobs.py:706} ERROR - Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 703, in _do_dags
>     self.process_dag(dag, tis_out)
>   File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 507, in process_dag
>     active_runs = dag.get_active_runs()
>   File "/usr/local/lib/python3.5/site-packages/airflow/models.py", line 2731, in get_active_runs
>     active_dates.append(run.execution_date)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 237, in __get__
>     return self.impl.get(instance_state(instance), dict_)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/attributes.py", line 578, in get
>     value = state._load_expired(state, passive)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/state.py", line 474, in _load_expired
>     self.manager.deferred_scalar_loader(self, toload)
>   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 610, in load_scalar_attributes
>     (state_str(state)))
> sqlalchemy.orm.exc.DetachedInstanceError: Instance <DagRun at 0x7f48a6b87550> is not bound to a Session; attribute refresh operation cannot proceed
> {code}
> I've managed to fix this by removing all {{ti.are_dependencies_met()}} calls 
> that have a commit at the end; after doing this there are no exceptions and 
> the tasks are retried correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
