[
https://issues.apache.org/jira/browse/AIRFLOW-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jarek Potiuk updated AIRFLOW-3797:
----------------------------------
Labels: gsoc gsoc2020 mentor (was: )
> Improve performance of cc1e65623dc7_add_max_tries_column_to_task_instance
> migration
> -----------------------------------------------------------------------------------
>
> Key: AIRFLOW-3797
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3797
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Bas Harenslak
> Priority: Major
> Labels: gsoc, gsoc2020, mentor
>
> The cc1e65623dc7_add_max_tries_column_to_task_instance migration creates a
> DagBag for the corresponding DAG for every single task instance, so the same
> DAG files are parsed over and over. This is redundant and unnecessary.
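> Concretely, the hot loop amounts to something like this (an illustrative
> sketch, not the migration's exact code): a fresh DagBag is constructed for
> each task instance, so the DAGs folder is re-parsed on every iteration.
> {code:python}
> import sqlalchemy as sa
> from alembic import op
> from airflow import settings
> from airflow.models import DagBag, TaskInstance
>
> session = sa.orm.sessionmaker()(bind=op.get_bind())
>
> for ti in session.query(TaskInstance).filter(TaskInstance.max_tries == -1):
>     # A new DagBag per task instance: with 600k task instances this means
>     # 600k parses of the same DAG files.
>     dag = DagBag(settings.DAGS_FOLDER).get_dag(ti.dag_id)
> {code}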
> This slowness has prompted discussions on Slack like the following:
> {noformat}
> murquizo [Jan 17th at 1:33 AM]
> Why does the airflow upgradedb command loop through all of the dags?
> ....
> murquizo [14 days ago]
> NICE, @BasPH! that is exactly the migration that I was referring to. We have
> about 600k task instances and several Python files that generate multiple
> DAGs, so looping through all of the task_instances to update max_tries was
> too slow. It took 3 hours and didn't even complete! I pulled the plug and
> manually executed the migration. Thanks for your response.
> {noformat}
> An easy improvement is to parse each DAG only once and then set the
> try_number on its task instances. I created a branch for this
> (https://github.com/BasPH/incubator-airflow/tree/bash-optimise-db-upgrade),
> am currently running tests, and will open a PR when done.
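> For reference, a minimal sketch of that approach (illustrative; the exact
> max_tries bookkeeping in the branch may differ): build one DagBag up front
> so each DAG file is parsed at most once, then reuse it for every task
> instance.
> {code:python}
> import sqlalchemy as sa
> from alembic import op
> from airflow import settings
> from airflow.models import DagBag, TaskInstance
>
> session = sa.orm.sessionmaker()(bind=op.get_bind())
> dagbag = DagBag(settings.DAGS_FOLDER)  # parse the DAGs folder once
>
> for ti in session.query(TaskInstance).filter(TaskInstance.max_tries == -1):
>     dag = dagbag.get_dag(ti.dag_id)  # cached lookup, no re-parsing
>     if dag is not None and dag.has_task(ti.task_id):
>         # Carry the task-level retry limit over to max_tries.
>         ti.max_tries = dag.get_task(ti.task_id).retries
>     else:
>         # The DAG or task no longer exists; do not allow further retries.
>         ti.max_tries = ti.try_number
> session.commit()
> {code}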