shubhransh-eb commented on issue #33647:
URL: https://github.com/apache/airflow/issues/33647#issuecomment-1700354297

   > > Looks like an index hint is needed, or something like that. A very
   > > interesting one. I will mark it for 2.7.1, hoping someone will have
   > > time to fix it before then.
   > 
   > To be honest, it is better to have a rule not to use `IN` with any
   > potentially big dataset. It really makes most RDBMSs unhappy.
   > 
   > For example, in Postgres everything in `IN` becomes part of the
   > execution plan. If the list is quite big, the DB spends most of its time
   > parsing, trying to build multiple different plans and calculating costs
   > across many alternatives, only to choose 'let's take something' in the
   > end. The time spent on this analysis might be greater than even doing a
   > FULL SEQ SCAN over a couple of tables.
   > 
   > In general, it is better to get rid of non-constant-sized `IN` filters
   > (constant-sized ones, such as a couple of statuses for tasks and dags,
   > are fine) and replace them with other methods:
   > 
   > * [NOT] EXISTS, for semi/anti joins over subqueries
   > * JOIN over VALUES; in this case execution plans shouldn't be crazy. It
   >   should be supported in PG, MySQL 8 and MsSQL (RIP); maybe something
   >   similar exists for SQLite
   > * Regular joins :D
   > 
   > @shubhransh-eb I guess you use the MySQL backend? If so, I wonder which
   > version?
   
   Hello,
   We are using `Aurora MySQL`.
   Engine Version: `5.7.mysql_aurora.2.11.2`
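
For reference, the alternatives listed in the quoted comment could be sketched roughly as below. This is only an illustration under stated assumptions, not Airflow's actual query: the table `task_instance_demo`, its columns, and the helper function names are all hypothetical, and SQLite is used only because it ships with Python (it is not the backend discussed in this issue). The sketch builds the same filter three ways: a plain `IN` list, a join against a `VALUES` list exposed through a CTE, and an `EXISTS` semi-join against that same CTE.

```python
import sqlite3

# Hypothetical demo table; this is NOT Airflow's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance_demo (dag_id TEXT, task_id TEXT)")
conn.executemany(
    "INSERT INTO task_instance_demo VALUES (?, ?)",
    [("dag_a", "t1"), ("dag_a", "t2"), ("dag_b", "t1"), ("dag_c", "t1")],
)

SELECT = "SELECT ti.dag_id, ti.task_id FROM task_instance_demo AS ti"
ORDER = "ORDER BY ti.dag_id, ti.task_id"


def _values_cte(dag_ids):
    """Build 'WITH wanted(dag_id) AS (VALUES (?), (?), ...)' for a non-empty list."""
    rows = ", ".join("(?)" for _ in dag_ids)
    return f"WITH wanted(dag_id) AS (VALUES {rows})"


def fetch_with_in(conn, dag_ids):
    """Baseline: variable-length IN list, one placeholder per id."""
    marks = ", ".join("?" for _ in dag_ids)
    sql = f"{SELECT} WHERE ti.dag_id IN ({marks}) {ORDER}"
    return conn.execute(sql, list(dag_ids)).fetchall()


def fetch_with_values_join(conn, dag_ids):
    """JOIN over VALUES. Ids are de-duplicated first, because a join
    would otherwise return each matching row once per duplicate id."""
    unique_ids = list(dict.fromkeys(dag_ids))
    sql = (
        f"{_values_cte(unique_ids)} {SELECT} "
        f"JOIN wanted ON wanted.dag_id = ti.dag_id {ORDER}"
    )
    return conn.execute(sql, unique_ids).fetchall()


def fetch_with_exists(conn, dag_ids):
    """EXISTS semi-join: each row is returned at most once,
    even if the id list contains duplicates."""
    sql = (
        f"{_values_cte(dag_ids)} {SELECT} "
        f"WHERE EXISTS (SELECT 1 FROM wanted WHERE wanted.dag_id = ti.dag_id) "
        f"{ORDER}"
    )
    return conn.execute(sql, list(dag_ids)).fetchall()


wanted_ids = ["dag_a", "dag_c"]
print(fetch_with_in(conn, wanted_ids))
print(fetch_with_values_join(conn, wanted_ids))
print(fetch_with_exists(conn, wanted_ids))
```

All three functions return the same rows for the same ids; whether the planner actually handles one form better than another depends on the database and version, so this would need to be checked with `EXPLAIN` on the real backend.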


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
