gmsantos opened a new pull request, #23625:
URL: https://github.com/apache/airflow/pull/23625

   Airflow 2.3.0 was introduced by default the [warn deployment exposure 
feature](https://github.com/apache/airflow/pull/18557) in the webserver by 
inspecting the log table for robots events.
   
   This query could be costly in deployments with large log tables.
   Adding an index in the event column improves this query performance.
   
   Here is an example of the generated query in a log table with around ~33 
million rows:
   
   ```sql
   SELECT count(*) AS count_1 
   FROM (SELECT log.id AS log_id, log.dttm AS log_dttm, log.dag_id AS 
log_dag_id, log.task_id AS log_task_id, log.map_index AS log_map_index, 
log.event AS log_event, log.execution_date AS log_execution_date, log.owner AS 
log_owner, log.extra AS log_extra 
   FROM log 
   WHERE log.event = 'robots' AND log.dttm > 
'2022-05-03T15:11:03.381511+00:00'::timestamptz) AS anon_1;
   ```
   
   Before applying the index: **3.386s**
   
   <img width="818" alt="image" 
src="https://user-images.githubusercontent.com/1991286/167680317-3d7f6f3b-3e77-4b40-945f-f4ecae2d9798.png";>
   
   After applying the index: **0.109s**
   
   <img width="816" alt="image" 
src="https://user-images.githubusercontent.com/1991286/167680386-92457c5e-e007-4703-8e0f-9df1a1dc56bb.png";>
   
   Also, the execution plan in a PostgreSQL database:
   
   **Before**
   
   <img width="684" alt="image" 
src="https://user-images.githubusercontent.com/1991286/167680799-fe9fc1bb-b712-4b7b-b382-0916a0420f75.png";>
   
   **After**
   
   <img width="569" alt="image" 
src="https://user-images.githubusercontent.com/1991286/167680862-6ce485db-d8b5-4c94-9a8f-81df3e7e9b3e.png";>
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to