Re: [GENERAL] How to speed up delete where not in

Melvin Davidson Sat, 26 Sep 2015 06:45:55 -0700

In addition to the previous recommendation, make sure you have an index on
dokumnr in table omrid.


EG:
CREATE INDEX omrid_dokumnr_fk ON omrid
  USING BTREE (dokumnr);



On Sat, Sep 26, 2015 at 7:33 AM, David Rowley <[email protected]>
wrote:

> On 26 September 2015 at 19:53, Andrus <[email protected]> wrote:
>
>> Hi!
>>
>>
>> I'm looking for a way to delete records which do not have child rows on
>> big tables where lot of rows needs to be deleted. Both tables have lot of
>> other foreign key references.
>>
>>
>> Document headers are in omdok table:
>>
>>    create table omdok ( dokumnr serial primary key, ... );
>>
>> Document rows are in omrid table
>>
>>    CREATE TABLE omrid
>>    (
>>      id serial NOT NULL,
>>      reanr serial NOT NULL,
>>      dokumnr integer NOT NULL,
>>      CONSTRAINT omrid_pkey PRIMARY KEY (id),
>>      CONSTRAINT omrid_dokumnr_fkey FOREIGN KEY (dokumnr)
>>          REFERENCES omdok (dokumnr) MATCH SIMPLE
>>          ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY
>> IMMEDIATE,
>>      ....
>>    );
>>
>>
>> I tried
>>
>>    delete from omdok where dokumnr not in      (select dokumnr from omrid)
>>
>> Query it is running currently 15 hours and is still running.
>> postgres.exe is using 50% CPU  all the time (this is 2 core CPU).
>>
>>    explain delete from omdok where dokumnr not in      (select dokumnr
>> from omrid)
>>
>> returns:
>>
>>    "Delete  (cost=0.00..21971079433.34 rows=220815 width=6)"
>>    "  ->  Seq Scan on omdok  (cost=0.00..21971079433.34 rows=220815
>> width=6)"
>>    "        Filter: (NOT (SubPlan 1))"
>>    "        SubPlan 1"
>>    "          ->  Materialize  (cost=0.00..94756.92 rows=1897261 width=4)"
>>    "                ->  Seq Scan on omrid  (cost=0.00..77858.61
>> rows=1897261 width=4)"
>>
>> - How to delete parents without child rows fast?
>>
>
> if you write the query as with a NOT EXISTS, rather than a NOT IN() it
> should complete much more quickly.
>
> It's important to know that the semantics of NOT IN are likely not at all
> what you think:
>
> For example, in the following query, would you expect it to return the row
> with a.a = 3 ?
>
> select * from (values(1),(2),(3)) a(a) WHERE a NOT IN(SELECT b FROM
> (VALUES(NULL),(1),(2)) b(b));
>
> The presence of NULL causes this to not behave the way you might think,
> yet it works exactly the way the SQL standard dictates.
>
> You could think of this as equivalent to writing:
>
> select * from (values(1),(2),(3)) a(a) WHERE a.a <> NULL AND a.a <> 1 AND
> a.a <> 2;
>
> And since a.a <> NULL is 'unknown', this causes the entire WHERE clause to
> be false, therefore nothing matches.
>
> For this reason PostgreSQL does not optimise NOT IN() the same way as it
> optimises NOT EXISTS().
>
> If you write the query as:
>
> delete from omdok where not exists(select 1 from omrid where omdok.dokumnr
> = omrid.dokumnr);
>
> then you might see it execute in a few seconds. Perhaps you should
> consider cancelling the current query, perhaps perform a VACUUM on omdoc
> after cancelling, and then run the NOT EXISTS version instead.
>
> I mocked up your tables locally, and populated them with the same number
> of records as your row estimates in the EXPLAIN you pasted and I got:
>
> test=# delete from omdok where not exists (select 1 from omrid where
> omdok.dokumn = omrid.dokumnr);
> DELETE 0
> Time: 1698.233 ms
>
> Whereas with the NOT IN() I cancelled it after 10 minutes.
>
> Regards
>
> David Rowley
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
> <http://www.2ndquadrant.com/>
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>


-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] How to speed up delete where not in

Reply via email to