Taragolis commented on issue #33647:
URL: https://github.com/apache/airflow/issues/33647#issuecomment-1717288303

   > Those are a bit guesses - maybe @Taragolis who have done a bit more 
analysis can also confirm if my thinking is right.
   
   To be honest, I had a look when I first found this issue, but I was lying 
in bed reading the code through a browser on an iPad and simply forgot to write 
a message. That means all findings need to be verified first. I assume that we 
use this approach because:
   - It works in most cases
   - We do not have triggerer states in the DB, maybe for some optimisation 
reason.
   
   Another problem is that we work with a `set` of ids on the client side 
(Airflow) before sending them to the DB backend, so even similar queries might 
not look so similar to the DB. But this is only my assumption.
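   
   A minimal sketch of one possible reading of the point above: a Python `set` 
has no guaranteed iteration order, so the same ids can reach the DB in a 
different order each time. Whether this really affects the query plan is 
unverified. The helper name `build_id_filter`, the `id` column and the `%s` 
placeholder style are assumptions for illustration, not Airflow code.

```python
from typing import Iterable, List, Tuple


def build_id_filter(ids: Iterable[int]) -> Tuple[str, List[int]]:
    """Return a parameterised IN clause and its parameters in a stable order.

    Hypothetical helper: the column name and placeholder style are assumed.
    """
    # De-duplicate, then sort so the same ids always produce the same list.
    ordered = sorted(set(ids))
    if not ordered:
        # "IN ()" is not valid SQL, so fall back to a filter that matches nothing.
        return "1 = 0", []
    placeholders = ", ".join(["%s"] * len(ordered))
    return f"id IN ({placeholders})", ordered


# The same ids given in different containers and orders yield identical output.
assert build_id_filter({8, 0, 3}) == build_id_filter([3, 8, 0, 3])
```

   Sorting only stabilises the order of the values. The number of 
placeholders still changes with the number of ids, so the statement text can 
still differ between calls.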
   
   > we could also add hinting to the query
   
   I like the position of one Postgres-vendor developer on hints, something 
like "Maybe we want to have hints in vanilla Postgres, but not in the same way 
they are implemented in Oracle; in our product, though, we need to implement 
something closely related for people who migrate from OracleDB to our product". 
In general it comes from the fact that statistics work better in most cases, 
especially when it comes to CBO (Cost-Based Optimisation) or the next 
generation of CBO.
   
   The problem with a hint is that it fixes "here and now": it might work in 
this particular case, with this particular amount of data, these particular 
indexes, this particular amount of memory, and for this particular user. As 
soon as some of these parameters change, things could become worse than, or no 
better than, they would be without the hint.
   
   This is just my personal position: "A query hint is a solution of last 
resort, after you have tried all other last-resort solutions".
   
   > Sorry for confusion, we use mysql version 8.0.28.
   
   That is nice. 
   
   > For now we run analyze command if we see there is some issue.
   
   @shubhransh-eb I'm not an expert on MySQL, but is there any configuration 
that might turn automatic gathering of table statistics (aka ANALYZE) on or 
off? Or maybe, by design, you are expected to run ANALYZE manually from time to 
time.
    
   Compared to Postgres: I know for certain that the autoanalyze daemon runs in 
the background, and if the user turns it off, queries under high-intensity 
workloads become slower over time. But even with the Postgres autoanalyze 
daemon, in some cases it is better to run ANALYZE on the table manually, 
especially after a huge delete + insert.
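   
   A small sketch of the statements involved, for whoever wants to check this 
on their own backend. On MySQL, InnoDB has the `innodb_stats_auto_recalc` 
variable, which controls automatic recalculation of persistent statistics. On 
Postgres, autoanalyze is done by the autovacuum daemon, which the `autovacuum` 
setting controls. The helper names and the `trigger` table name are assumptions 
for illustration only. The code just builds SQL strings and does not connect to 
anything.

```python
def quote_identifier(dialect: str, name: str) -> str:
    """Quote a table name; needed here because TRIGGER is a reserved word in MySQL."""
    if dialect == "mysql":
        return "`" + name.replace("`", "``") + "`"
    if dialect == "postgresql":
        return '"' + name.replace('"', '""') + '"'
    raise ValueError(f"unsupported dialect: {dialect}")


def stats_statements(dialect: str, table: str) -> dict:
    """Return SQL to inspect automatic statistics and to refresh them manually."""
    quoted = quote_identifier(dialect, table)
    if dialect == "mysql":
        return {
            # Shows whether InnoDB recalculates persistent statistics automatically.
            "check_auto": "SHOW VARIABLES LIKE 'innodb_stats_auto_recalc'",
            "analyze": f"ANALYZE TABLE {quoted}",
        }
    return {
        # Shows whether the autovacuum (and therefore autoanalyze) daemon is enabled.
        "check_auto": "SHOW autovacuum",
        "analyze": f"ANALYZE {quoted}",
    }


# Hypothetical table name, used only to show the generated statements.
for dialect in ("mysql", "postgresql"):
    for purpose, sql in stats_statements(dialect, "trigger").items():
        print(f"{dialect:<10} {purpose:<10} {sql}")
```

   Note the syntax difference: MySQL uses `ANALYZE TABLE <name>`, while 
Postgres uses plain `ANALYZE <name>`.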


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
