[h2] Re: NOT IN taking lot of time

Roger Thomas Wed, 16 Apr 2014 03:38:26 -0700

This will take a long time as the delete will be done under transaction 
control and so there will be a lot of I/O work, even on my little test 
system this took 8s and that is for the smallest possible table size, with 
only the primary key index. For your environment this may take far longer 
if the data is held in a large database and/or the cache is smaller than 
the file space used to hold the table and its indexes.


Do you need to do this real time, or can it be done as a background task 
where you delete say 1000 records at a time? This will not speed things up, 
but will stop such a large single task being passed to the database engine.

Roger


On Wednesday, April 16, 2014 5:39:23 AM UTC+1, Devesh Mishra wrote:
>
> Thanks Roger for the quick reply. The query indeed is an inner query which 
> is supplied to a DELETE. Essentially, I am trying to DELETE ~1 million 
> records. This causes a DB hang or may be takes a lot of time. Following is 
> the query.
>
> DELETE
> FROM table1
> WHERE ID NOT IN
>     (SELECT id
>      FROM ref_table
>      WHERE id2 IN
>          (SELECT distinct(id2)
>           FROM another_ref_table))
>   AND CREATEDDATE < current_timestamp()-1
>
> I think deletion is taking lot of time, can we optimize this.
>
>
> On Tuesday, April 15, 2014 11:24:47 PM UTC+5:30, Roger Thomas wrote:
>>
>> Well even with the best optimizer in the world this is likely to result 
>> in a complete table scan of table1, with each record then being checked 
>> against the results generated by the inner NOT IN select. 
>>
>> When I do a simple test of having 2 tables in a clean database with just 
>> ID fields populated 1-1M and 1-100 this query takes 2.3s to run if I just 
>> request a count(*) back rather than the 999900 records that the original 
>> query will return. For me that 2.3s is basically disk I/O time, if you have 
>> records with more information a lot more information will be requested from 
>> the disk and so this will take longer. 
>>
>> Try running the following at a h2 console after having restarted H2 to 
>> clear the cache and you will see the retrival time - if your cache is large 
>> enough the second time you run it should result in a near zero time taken.
>>
>> select count(*) from 
>> (
>>    select  * from table1 
>>    where 
>>    ID 
>>    NOT IN 
>>    (
>>       select ID from table2 
>>    )
>> ) 
>>
>> The rest of the time will be the time spent dealing with the result set - 
>> can you move addtional processing to the database so ending up with a 
>> smaller result set?
>>
>> Roger
>>
>> On Tuesday, April 15, 2014 3:39:34 PM UTC+1, Devesh Mishra wrote:
>>>
>>> Hi,
>>>
>>> I am having performance problems while executing a query which has NOT 
>>> IN clause. Following is the query
>>>
>>> select  * from table1
>>> where 
>>> ID 
>>> NOT IN 
>>> (
>>> select ID from table2 
>>> ) 
>>>
>>> Table1 contains 1 million records
>>> table2 contains 100 records.
>>>
>>> expecting 1 million - 100 records as output. Both the columns are 
>>> indexed.
>>>
>>> Please help, Thanks a lot in advance
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

[h2] Re: NOT IN taking lot of time

Reply via email to