Taragolis commented on issue #33647:
URL: https://github.com/apache/airflow/issues/33647#issuecomment-1718111130

   > I am not an expert in MySQL either, but from what I found on the internet, 
I don't think MySQL automatically runs the ANALYZE command, and that could be the 
reason why we have to run it manually to make this work.
   
   I think MySQL must gather statistics in some way; without them it is hardly 
possible to estimate costs for the queries. A manual ANALYZE in this case means 
something like: "Forget everything you know about the table and collect new 
statistics". 
   If you don't know whether it runs automatically, you probably also don't 
know how to turn it off. I asked because years ago it was quite popular to turn 
off the autovacuum daemon in Postgres and then get all the side effects 🤣 
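
   For anyone who wants to check this instead of guessing: as far as I know, 
InnoDB controls automatic statistics recalculation with the 
`innodb_stats_auto_recalc` system variable, which is on by default. Below is a 
minimal sketch, assuming a plain DB-API cursor (tuple rows) connected to the 
metadata database. Both helper names are hypothetical, and `task_instance` in 
the usage note is only an illustration.

```python
def stats_auto_recalc_enabled(cursor):
    """Return True if InnoDB automatic statistics recalculation is on.

    `cursor` is assumed to be a DB-API 2.0 cursor that returns tuple rows
    (for example from mysqlclient or PyMySQL); this helper is hypothetical.
    """
    cursor.execute("SHOW VARIABLES LIKE 'innodb_stats_auto_recalc'")
    row = cursor.fetchone()
    # A row looks like ('innodb_stats_auto_recalc', 'ON'); None means the
    # server did not return the variable at all.
    return row is not None and str(row[1]).upper() == "ON"


def analyze_statement(table_name):
    """Build the manual statistics refresh statement for one table."""
    # Backticks quote the identifier; names containing a backtick are
    # rejected so the name cannot break out of the quoting.
    if "`" in table_name:
        raise ValueError("invalid table name")
    return f"ANALYZE TABLE `{table_name}`"
```

   If the first helper returns `False`, running the statement from 
`analyze_statement("task_instance")` by hand would be the manual equivalent.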
   
   > this issue happens when we have around 150 sensors starting around the same time
   
   That could also be a reason. Again, I'm not a MySQL expert and have no idea 
how it handles many simultaneous connections. In Postgres each connection is 
quite expensive in both time and memory, so the basic recommendation there is 
to put a connection pooler between the DB and Airflow. Internally Airflow uses 
the SQLAlchemy pool, but it is limited to a single process, so it is better to 
have something in between. Since you use managed MySQL on AWS, you might try 
[RDS Proxy](https://aws.amazon.com/rds/proxy/).
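
   To see why a per-process pool does not cap the total, here is a rough 
back-of-the-envelope sketch. The defaults mirror SQLAlchemy's `QueuePool` 
defaults (`pool_size=5`, `max_overflow=10`); the function name and the process 
count are made up for illustration and are not taken from this issue.

```python
def worst_case_connections(processes, pool_size=5, max_overflow=10):
    """Upper bound on DB connections when every process has its own pool.

    Each process can open up to pool_size + max_overflow connections, and
    the pools know nothing about each other, so the bounds simply add up.
    """
    if processes < 0 or pool_size < 0 or max_overflow < 0:
        raise ValueError("arguments must be non-negative")
    return processes * (pool_size + max_overflow)


# Hypothetical deployment: 40 Airflow processes (schedulers, workers,
# triggerers, webserver workers) talking to one database.
print(worst_case_connections(40))
```

   An external pooler sits in front of the database and multiplexes these 
client connections over a smaller number of server connections.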
   
   ---
   
   And last but not least, another possible reason is that most of the 
deferrable operators are not truly async, especially something like 
[TaskStateTrigger](https://github.com/apache/airflow/blob/8918b435be8c683bbd6bb2ffa871dbd31d476f48/airflow/triggers/external_task.py#L39),
 which might keep a session open for a very long time and prevent the database 
from gathering statistics. It should not be a problem on Postgres, but who 
knows, maybe it is a problem for MySQL. This is only my assumption.
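
   To illustrate what "not truly async" means in general, here is a toy sketch 
that has nothing Airflow-specific in it: `time.sleep` stands in for a 
synchronous database call made inside a coroutine, and `asyncio.sleep` stands 
in for a real asynchronous wait. All names are hypothetical, and this is not 
how any particular trigger is implemented.

```python
import asyncio
import time


async def blocking_poll(delay):
    # Synchronous call inside a coroutine: the event loop cannot switch to
    # any other coroutine until this returns.
    time.sleep(delay)


async def cooperative_poll(delay):
    # Asynchronous wait: control goes back to the event loop, so other
    # coroutines can make progress in the meantime.
    await asyncio.sleep(delay)


async def run_all(poll, count, delay):
    """Run `count` polls concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(poll(delay) for _ in range(count)))
    return time.perf_counter() - start


def compare(count=5, delay=0.05):
    """Return (blocking_elapsed, cooperative_elapsed) for the same workload."""
    blocking = asyncio.run(run_all(blocking_poll, count, delay))
    cooperative = asyncio.run(run_all(cooperative_poll, count, delay))
    return blocking, cooperative
```

   With the blocking version the waits run one after another, so the elapsed 
time grows with the number of coroutines; with the cooperative version they 
overlap.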


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
