James Meickle created AIRFLOW-3219:
--------------------------------------
Summary: Graph displays are non-deterministic
Key: AIRFLOW-3219
URL: https://issues.apache.org/jira/browse/AIRFLOW-3219
Project: Apache Airflow
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: James Meickle
In Airflow, a DAG's tasks are stored in a dictionary (self.task_dict). This
dictionary has no guaranteed iteration order. Its values, which are therefore
also unordered, are used to build the task list (self.tasks, see
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3568).
Therefore, the list of tasks is unordered. This has a variety of downstream
impacts; for example, Airflow's topological sort takes this unordered list as
its input, so the topo-sorted order it produces can vary as well.
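To illustrate why input order matters for a topological sort, here is a minimal
sketch. It is not Airflow's actual implementation: topo_sort, the upstream
mapping, and the task IDs are hypothetical names made up for this example. It
shows that a DAG can have several valid topological orders, and that which one
comes out depends on the order in which the tasks are fed in.

```python
def topo_sort(task_ids, upstream):
    """Return one valid topological order of task_ids.

    Hypothetical helper, not Airflow code. Ties between tasks that are
    ready at the same time are broken by their position in task_ids,
    so the result depends on the input order.
    """
    remaining = list(task_ids)
    done = []
    while remaining:
        progressed = False
        # Iterate over a copy because we remove from `remaining` as we go.
        for task_id in list(remaining):
            if all(dep in done for dep in upstream.get(task_id, ())):
                done.append(task_id)
                remaining.remove(task_id)
                progressed = True
        if not progressed:
            raise ValueError("cycle detected")
    return done


# "c" depends on both "a" and "b"; "a" and "b" are independent of each other.
upstream = {"c": ["a", "b"]}

order_1 = topo_sort(["a", "b", "c"], upstream)  # ['a', 'b', 'c']
order_2 = topo_sort(["b", "a", "c"], upstream)  # ['b', 'a', 'c']

# Normalizing the input (here: sorting by task ID) makes the output
# independent of whatever order the caller happened to supply.
stable = topo_sort(sorted(["c", "b", "a"]), upstream)  # ['a', 'b', 'c']
```

Both order_1 and order_2 are correct topological orders, which is why the
instability shows up as a reshuffled layout rather than as an error.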
Because the task list order depends on Python's hash randomization (a
per-process seed) rather than on anything in the DAG definition, the returned
order can be reshuffled whenever the server restarts (different seed value).
Consequently, Airflow sorts are not stable across restarts. This is
particularly irritating for graph layouts, because a server restart can make
graphs render differently even though no code has shipped.
We should consider storing tasks in an OrderedDict or some other structure
with a deterministic iteration order.
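A minimal sketch of the OrderedDict idea follows. TaskRegistry and add_task are
hypothetical stand-ins, not Airflow's DAG class; only the attribute names
task_dict and tasks are taken from the description above. The point is simply
that the task list then follows insertion order on every run.

```python
from collections import OrderedDict


class TaskRegistry:
    """Hypothetical stand-in for the part of a DAG that stores its tasks."""

    def __init__(self):
        # OrderedDict iterates in insertion order, regardless of the
        # hash values of its keys.
        self.task_dict = OrderedDict()

    def add_task(self, task_id, task):
        if task_id in self.task_dict:
            raise ValueError("duplicate task id: %s" % task_id)
        self.task_dict[task_id] = task

    @property
    def tasks(self):
        # Same shape as the existing task list: the dictionary's values.
        return list(self.task_dict.values())


registry = TaskRegistry()
for task_id in ("extract", "transform", "load"):
    # Plain strings stand in for real task objects in this sketch.
    registry.add_task(task_id, task_id.upper())

task_list = registry.tasks
```

Here task_list comes back in the order the tasks were added, so anything
derived from it (such as a topological sort) starts from the same input after
every restart, as long as the DAG file itself adds tasks in a fixed order.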