Taragolis commented on issue #33647: URL: https://github.com/apache/airflow/issues/33647#issuecomment-1716320225
First of all, I would recommend considering an upgrade to a newer version of MySQL: [5.7 is almost EOL](https://endoflife.date/mysql). Even if Amazon continues to support MySQL 5.7 on Aurora, there is a good chance that Airflow will drop support for MySQL 5.7 in versions released after **31 Oct 2023**. That means further improvements to the triggerer would not be available to you. In addition, 8.0 should provide a better query analyser/planner. Just make sure that you test the migration on a snapshot of the DB before doing it on the production database.

---

Anyway, I inspected the data transfers between the Triggerer and TriggerJob. This might help someone (maybe even me) who wants to optimise this:

1. [Load Triggers](https://github.com/apache/airflow/blob/8918b435be8c683bbd6bb2ffa871dbd31d476f48/airflow/jobs/triggerer_job_runner.py#L374-L378)
2. [All IDs associated with the current Triggerer](https://github.com/apache/airflow/blob/87b08ad0840a11d8cd5c0b5043d3a341b1a8f258/airflow/models/trigger.py#L200)
3. [Update Triggers](https://github.com/apache/airflow/blob/8918b435be8c683bbd6bb2ffa871dbd31d476f48/airflow/jobs/triggerer_job_runner.py#L641)
4. [Bulk Load Triggers](https://github.com/apache/airflow/blob/87b08ad0840a11d8cd5c0b5043d3a341b1a8f258/airflow/models/trigger.py#L99-L110): this query might cause problems with a huge input dataset
5. Put the data into the different deques

It seems that steps 1-4 could be executed as a single query. That would add overhead in the amount of data captured, but it might reduce execution time on the DB side. However, it would require additional filtering on the client (Airflow) side.
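To make the trade-off in the last paragraph a bit more concrete, here is a minimal sketch against an in-memory SQLite database. Everything in it is an assumption for illustration: the `demo_trigger` table, its columns, and the `multi_query_flow` / `single_query_flow` helpers are hypothetical and are not Airflow's schema, models, or code. The first helper loosely mirrors steps 2-5 (fetch the assigned IDs, diff them against what is already running, bulk-fetch the new rows, fill two deques) and does not model the trigger assignment done in step 1; the second does one round trip and moves the filtering to the client.

```python
import sqlite3
from collections import deque

# Hypothetical, simplified table; NOT Airflow's real `trigger` schema.
SCHEMA = """
CREATE TABLE demo_trigger (
    id INTEGER PRIMARY KEY,
    classpath TEXT NOT NULL,
    kwargs TEXT NOT NULL,
    triggerer_id INTEGER
)
"""


def multi_query_flow(conn, triggerer_id, running_ids):
    """Loose analogue of steps 2-5: fetch IDs, diff, then bulk-fetch new rows."""
    # Round trip 1: only the IDs assigned to this triggerer.
    assigned_ids = {
        row[0]
        for row in conn.execute(
            "SELECT id FROM demo_trigger WHERE triggerer_id = ?", (triggerer_id,)
        )
    }
    new_ids = sorted(assigned_ids - running_ids)
    rows = []
    if new_ids:
        # Round trip 2: bulk fetch; the IN (...) list grows with the number of new IDs.
        placeholders = ", ".join("?" for _ in new_ids)
        rows = conn.execute(
            "SELECT id, classpath, kwargs FROM demo_trigger "
            f"WHERE id IN ({placeholders}) ORDER BY id",
            new_ids,
        ).fetchall()
    to_create = deque((r[0], (r[1], r[2])) for r in rows)
    to_cancel = deque(sorted(running_ids - assigned_ids))
    return to_create, to_cancel


def single_query_flow(conn, triggerer_id, running_ids):
    """One round trip returning full rows; the diff is done on the client."""
    rows = conn.execute(
        "SELECT id, classpath, kwargs FROM demo_trigger "
        "WHERE triggerer_id = ? ORDER BY id",
        (triggerer_id,),
    ).fetchall()
    assigned_ids = {r[0] for r in rows}
    # Client-side filtering: discard rows for triggers that are already running.
    to_create = deque((r[0], (r[1], r[2])) for r in rows if r[0] not in running_ids)
    to_cancel = deque(sorted(running_ids - assigned_ids))
    return to_create, to_cancel


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO demo_trigger (id, classpath, kwargs, triggerer_id) VALUES (?, ?, ?, ?)",
    [
        (1, "example.TimeTrigger", "{}", 7),
        (2, "example.TimeTrigger", "{}", 7),
        (3, "example.FileTrigger", "{}", 8),
    ],
)
running = {1, 4}  # trigger IDs this (hypothetical) triggerer 7 is already running

multi = multi_query_flow(conn, 7, running)
single = single_query_flow(conn, 7, running)
print(list(single[0]))  # [(2, ('example.TimeTrigger', '{}'))]
print(list(single[1]))  # [4]
```

Both helpers produce the same deques here. The single-query version transfers the row for trigger 1 even though it is already running (the extra captured data), in exchange for one fewer round trip and no `IN (...)` list that grows with the input. Whether that is actually faster on MySQL 5.7 or 8.0 would need to be measured against a real workload.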
