Well even with the best optimizer in the world this is likely to result in
a complete table scan of table1, with each record then being checked
against the results generated by the inner NOT IN select.
When I do a simple test of having 2 tables in a clean database with just ID
fields populated 1-1M and 1-100 this query takes 2.3s to run if I just
request a count(*) back rather than the 999900 records that the original
query will return. For me that 2.3s is basically disk I/O time, if you have
records with more information a lot more information will be requested from
the disk and so this will take longer.
Try running the following at a h2 console after having restarted H2 to
clear the cache and you will see the retrival time - if your cache is large
enough the second time you run it should result in a near zero time taken.
select count(*) from
(
select * from table1
where
ID
NOT IN
(
select ID from table2
)
)
The rest of the time will be the time spent dealing with the result set -
can you move addtional processing to the database so ending up with a
smaller result set?
Roger
On Tuesday, April 15, 2014 3:39:34 PM UTC+1, Devesh Mishra wrote:
>
> Hi,
>
> I am having performance problems while executing a query which has NOT IN
> clause. Following is the query
>
> select * from table1
> where
> ID
> NOT IN
> (
> select ID from table2
> )
>
> Table1 contains 1 million records
> table2 contains 100 records.
>
> expecting 1 million - 100 records as output. Both the columns are indexed.
>
> Please help, Thanks a lot in advance
>
--
You received this message because you are subscribed to the Google Groups "H2
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.