In addition to the previous recommendation, make sure you have an index on dokumnr in table omrid.
EG: CREATE INDEX omrid_dokumnr_fk ON omrid USING BTREE (dokumnr); On Sat, Sep 26, 2015 at 7:33 AM, David Rowley <david.row...@2ndquadrant.com> wrote: > On 26 September 2015 at 19:53, Andrus <kobrule...@hot.ee> wrote: > >> Hi! >> >> >> I'm looking for a way to delete records which do not have child rows on >> big tables where lot of rows needs to be deleted. Both tables have lot of >> other foreign key references. >> >> >> Document headers are in omdok table: >> >> create table omdok ( dokumnr serial primary key, ... ); >> >> Document rows are in omrid table >> >> CREATE TABLE omrid >> ( >> id serial NOT NULL, >> reanr serial NOT NULL, >> dokumnr integer NOT NULL, >> CONSTRAINT omrid_pkey PRIMARY KEY (id), >> CONSTRAINT omrid_dokumnr_fkey FOREIGN KEY (dokumnr) >> REFERENCES omdok (dokumnr) MATCH SIMPLE >> ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY >> IMMEDIATE, >> .... >> ); >> >> >> I tried >> >> delete from omdok where dokumnr not in (select dokumnr from omrid) >> >> Query it is running currently 15 hours and is still running. >> postgres.exe is using 50% CPU all the time (this is 2 core CPU). >> >> explain delete from omdok where dokumnr not in (select dokumnr >> from omrid) >> >> returns: >> >> "Delete (cost=0.00..21971079433.34 rows=220815 width=6)" >> " -> Seq Scan on omdok (cost=0.00..21971079433.34 rows=220815 >> width=6)" >> " Filter: (NOT (SubPlan 1))" >> " SubPlan 1" >> " -> Materialize (cost=0.00..94756.92 rows=1897261 width=4)" >> " -> Seq Scan on omrid (cost=0.00..77858.61 >> rows=1897261 width=4)" >> >> - How to delete parents without child rows fast? >> > > if you write the query as with a NOT EXISTS, rather than a NOT IN() it > should complete much more quickly. > > It's important to know that the semantics of NOT IN are likely not at all > what you think: > > For example, in the following query, would you expect it to return the row > with a.a = 3 ? > > select * from (values(1),(2),(3)) a(a) WHERE a NOT IN(SELECT b FROM > (VALUES(NULL),(1),(2)) b(b)); > > The presence of NULL causes this to not behave the way you might think, > yet it works exactly the way the SQL standard dictates. > > You could think of this as equivalent to writing: > > select * from (values(1),(2),(3)) a(a) WHERE a.a <> NULL AND a.a <> 1 AND > a.a <> 2; > > And since a.a <> NULL is 'unknown', this causes the entire WHERE clause to > be false, therefore nothing matches. > > For this reason PostgreSQL does not optimise NOT IN() the same way as it > optimises NOT EXISTS(). > > If you write the query as: > > delete from omdok where not exists(select 1 from omrid where omdok.dokumnr > = omrid.dokumnr); > > then you might see it execute in a few seconds. Perhaps you should > consider cancelling the current query, perhaps perform a VACUUM on omdoc > after cancelling, and then run the NOT EXISTS version instead. > > I mocked up your tables locally, and populated them with the same number > of records as your row estimates in the EXPLAIN you pasted and I got: > > test=# delete from omdok where not exists (select 1 from omrid where > omdok.dokumn = omrid.dokumnr); > DELETE 0 > Time: 1698.233 ms > > Whereas with the NOT IN() I cancelled it after 10 minutes. > > Regards > > David Rowley > > -- > David Rowley http://www.2ndQuadrant.com/ > <http://www.2ndquadrant.com/> > PostgreSQL Development, 24x7 Support, Training & Services > > -- *Melvin Davidson* I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.