James Meickle created AIRFLOW-3219:
--------------------------------------

             Summary: Graph displays are non-deterministic
                 Key: AIRFLOW-3219
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3219
             Project: Apache Airflow
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: James Meickle


In Airflow, tasks are stored in a dictionary (self.task_dict), which has no
guaranteed ordering. The task list (self.tasks,
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3568)
is built from the dictionary's values, so it is unordered as well. This has a
variety of downstream impacts: for example, Airflow's topological sort iterates
over this unordered list, so the topo-sorted order it produces can vary too.
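
To illustrate how input order leaks into a topological sort, here is a minimal
sketch. It is not Airflow's implementation: topo_sort, the task ids, and the
upstream mapping are all hypothetical names chosen for the example. It only
shows that two valid topological orders can come out of the same graph
depending on the order in which the tasks are iterated.

```python
def topo_sort(task_ids, upstream):
    """Return task_ids in a topological order.

    Hypothetical helper, not Airflow code. Ties between tasks that are
    ready at the same time are broken by their position in task_ids.
    """
    remaining = list(task_ids)
    ordered = []
    while remaining:
        progressed = False
        # Iterate over a copy so we can remove from `remaining` safely.
        for task_id in list(remaining):
            if all(dep in ordered for dep in upstream.get(task_id, ())):
                ordered.append(task_id)
                remaining.remove(task_id)
                progressed = True
        if not progressed:
            raise ValueError("cycle detected in task dependencies")
    return ordered


# "load" depends on both extract tasks; the extract tasks are independent.
upstream = {"load": ["extract_a", "extract_b"]}

# Same graph, two different iteration orders of the task list.
order_1 = topo_sort(["extract_a", "extract_b", "load"], upstream)
order_2 = topo_sort(["extract_b", "load", "extract_a"], upstream)

print(order_1)
print(order_2)
```

Both results respect the dependencies, but the two independent extract tasks
swap places, which is the kind of difference that can change a graph layout.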

Because the task list order depends on Python's randomized hashing (a
different seed value for each process), the returned order is reshuffled
whenever the server restarts. Consequently, Airflow's sort orders are not
stable across restarts. This is particularly irritating for graph layouts: a
server restart can make graphs render differently even though no code has
been shipped.

We should consider storing tasks in an OrderedDict or some other structure
with a deterministic iteration order.
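
A rough sketch of what that could look like, assuming a simplified stand-in
class (MiniDAG, add_task, and task_ids are hypothetical names, not Airflow's
actual API). The only point is that an OrderedDict iterates in insertion
order, so the derived task list is the same on every run as long as the tasks
are added in the same order.

```python
from collections import OrderedDict


class MiniDAG:
    """Hypothetical stand-in for a DAG; not Airflow's actual class."""

    def __init__(self):
        # OrderedDict iterates in insertion order, independent of hashing.
        self.task_dict = OrderedDict()

    def add_task(self, task_id, task):
        if task_id in self.task_dict:
            raise ValueError("duplicate task id: {}".format(task_id))
        self.task_dict[task_id] = task

    @property
    def tasks(self):
        # Derived from the ordered mapping, so the order is deterministic.
        return list(self.task_dict.values())

    @property
    def task_ids(self):
        return list(self.task_dict.keys())


dag = MiniDAG()
for task_id in ["extract_b", "extract_a", "load"]:
    dag.add_task(task_id, object())

# Option 1: keep the order in which the tasks were added.
insertion_order = dag.task_ids

# Option 2: sort by task id, independent of the order of definition.
sorted_order = sorted(dag.task_dict)

print(insertion_order)
print(sorted_order)
```

Insertion order follows the DAG definition file, so it only changes when the
code changes. Sorting by task id is an alternative if the order should not
depend on how the file is laid out.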



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
