The GitHub Actions job "CodeQL" on airflow.git has failed.
Run started by GitHub user ashb (triggered by ashb).

Head commit for run:
451a6f4d9ff8b744075e2f25099046c77f28179e / Ash Berlin-Taylor <[email protected]>
Speed up grid_data endpoint by 10x (#24284)

* Speed up grid_data endpoint by 10x

These changes make the endpoint go from almost 20s down to 1.5s and the
changes are two fold:

1. Keep datetimes as objects for as long as possible

   Previously we were converting start/end dates for a task group to a
   string, and then in the parent parsing it back to a datetime to find
   the min and max of all the child nodes.

   The fix for that was to leave it as a datetime (or a
   pendulum.DateTime technically) and use the existing
   `AirflowJsonEncoder` class to "correctly" encode these objects on
   output.

2. Reduce the number of DB queries from 1 per task to 1.

   The removed `get_task_summaries` function was called for each task,
   and was making a query to the database to find info for the given
   DagRuns.

   The helper function now makes just a single DB query for all
   tasks/runs and constructs a dict to efficiently look up the ti by
   run_id.

* Add support for mapped tasks in the grid data

* Don't fail when not all tasks have a finish date.

Note that this possibly has incorrect behaviour, in that the end_date of
a TaskGroup is set to the max of all the children's end dates, even if
some are still running. (This is the existing behaviour and is not
changed or altered by this change - limiting it to just performance
fixes)

Report URL: https://github.com/apache/airflow/actions/runs/2502046657

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to