It depends on the data shape.

In the first case, using a correlated subquery, the outer table query is 
processed applying all applicable joins and where conditions, and if and only 
if the row is still a candidate is the correlated subquery performed.  If it 
succeeds the result row is output.  For maximum efficiency you need indexes 
covering (at least) the correlated columns.

In the second case, the subselect to generate the list for in is executed first 
and the results put in a temporary b-tree.  Then either (a) this list is 
traversed to find all the initial candidates in the outer query if the 
appropriate indexes exist, else the outer query is performed checking whether 
the outer candidate row is in the result set from the inner subquery.

Assuming that proper indexes exist the first form (correlated subquery) will 
almost always be quicker since it will probabilistically requires less 
operations in all cases.

In the case where tableb's size closely matches the set of candidates from the 
outer query, the two will execute at the same speed.  As the size of tableb 
increases (becomes less selective) the speed of the second form will decrease.

Take the case where tablea has a billion rows, and tableb has a billion rows, 
and the intersection is five rows.  Which WHERE condition would you expect to 
be quicker?  Now you may think that a given application will only ever have 5 
rows in tableb (or only has five rows during testing) and therefore the choice 
doesn't matter.  

Sometimes this is an incorrect assumption and this is what leads to the 
"billion dollars spent on a new system that does not work and must be thrown in 
the dust bin" stories in the news.

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Dave Blake
> Sent: Sunday, 16 October, 2016 00:58
> To: SQLite mailing list
> Subject: [sqlite] IN verses EXISTS Query Speed
> 
> Some simple testing is showing using an EXISTS statement is generally
> quicker then using an IN
> 
> e.g.
> 
> SELECT * FROM tablea
> WHERE EXISTS (SELECT 1 FROM tableb WHERE tablea.id  = tableb.id AND ...)
> 
> is quicker than
> SELECT * FROM tablea
> WHERE tablea.id IN  (SELECT tableb.id FROM tableb WHERE ...)
> 
> Is there any reason for this to be always true in SQLite, or is it query
> dependant?
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to