Taragolis commented on issue #33647:
URL: https://github.com/apache/airflow/issues/33647#issuecomment-1717288303
> Those are a bit guesses - maybe @Taragolis who have done a bit more
analysis can also confirm if my thinking is right.
To be honest, I had a look when I first found this issue, but I was lying in
bed checking the code through a browser on an iPad and simply forgot to write a
message. That means all findings need to be verified first. I assume we use
this approach because:
- It works in most cases
- We do not have triggerer states in DB, maybe for some optimisation reason.
Another problem is that we operate on a `set` of ids on the client side
(Airflow) before sending them to the DB backend, so even similar queries might
not look so similar to the DB. But this is just my assumption.
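To illustrate the `set` point, here is a minimal sketch, not Airflow's actual
code: `build_in_clause` is a hypothetical helper, and the `id` column and `%s`
placeholder style are assumptions. It sorts the ids before rendering them, so
the same ids always produce the same query text and parameter order. Whether
that makes any difference to a given backend's optimizer is unverified.

```python
def build_in_clause(ids):
    """Render an `id IN (...)` fragment with a deterministic parameter order.

    Hypothetical helper (not part of Airflow): sorting removes any dependence
    on set iteration order, so equal id sets yield identical text and params.
    """
    params = sorted(ids)
    if not params:
        # An empty IN () list is not valid SQL, so refuse it explicitly.
        raise ValueError("ids must not be empty")
    placeholders = ", ".join(["%s"] * len(params))
    return f"id IN ({placeholders})", params


# The same ids, collected in a different order, give the same fragment.
first = build_in_clause({3, 1, 2})
second = build_in_clause({2, 3, 1})
```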
> we could also add hinting to the query
I like the position of one postgres-vendor developer on hints, something
like: "Maybe we want to have hints in vanilla postgres, though not implemented
the same way as in Oracle, but in our product we need to implement something
closely related for people who migrate from OracleDB to our product". In
general it comes from the fact that statistics work better in most cases,
especially when it comes to the CBO (Cost-Based Optimisation) or the next
generation of CBO.
The problem with a hint is that it fixes things "Here and Now": it might work
in this particular case, with this particular amount of data, these particular
indexes, this particular amount of memory, for this particular user, and as
soon as some of these parameters change, things could become worse, or no
better than if the hint did not exist.
This is just my personal position: "A query hint is a solution of last
resort, after you have tried all other last-resort solutions".
> Sorry for confusion, we use mysql version 8.0.28.
That is nice.
> For now we run analyze command if we see there is some issue.
@shubhransh-eb I'm not an expert on MySQL, but is there any configuration
that might turn automatic gathering of table statistics (aka ANALYZE) on or
off? Or maybe by design you should run ANALYZE manually from time to time.
For comparison, in Postgres I know for sure that the autoanalyze daemon runs in
the background, and if a user turns it off, queries under high-intensity
workloads become slower over time. But even with the postgres autoanalyze
daemon, in some cases it is better to run ANALYZE manually, especially after a
huge delete + insert.
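As a rough sketch of why a manual ANALYZE can still help on Postgres: the
documented autoanalyze trigger is "changed tuples since the last ANALYZE >
autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number of
tuples", with defaults of 50 and 0.1. The function name `autoanalyze_due` and
the sample numbers below are made up for illustration, and per-table overrides
are ignored.

```python
def autoanalyze_due(reltuples, changed_since_analyze,
                    base_threshold=50, scale_factor=0.1):
    """Return True if the Postgres autoanalyze trigger condition is met.

    Illustrative only: defaults mirror autovacuum_analyze_threshold (50) and
    autovacuum_analyze_scale_factor (0.1); per-table settings are not modeled.
    """
    threshold = base_threshold + scale_factor * reltuples
    return changed_since_analyze > threshold


# On a table with 1,000,000 rows, 50,000 changed rows do not trigger
# autoanalyze, although they may already skew the statistics.
small_change = autoanalyze_due(1_000_000, 50_000)
# A bulk delete + insert touching 200,000 rows does cross the threshold,
# but the daemon only acts on its next wake-up, not immediately.
bulk_change = autoanalyze_due(1_000_000, 200_000)
```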