On Montag, 27. August 2007 Paul J Stevens wrote:
> Then why do I keep hearing noises on this list that JOINs are faster
> than sub-SELECTS ??

1) It still has to be done in the correct search order to be fast
2) I've given lots of temporary mem space to PostgreSQL, that should 
help with temp tables
3) My numbers are for PostgreSQL 8.1.9 only, on my hardware, with my 
database as it is. YMMV, that's why it would be nice to see other 
stats. I currently have 
count(*) from dbmail_headervalue: 4177750
count(*) from dbmail_physmessage: 116015

4) It seems that only the "sort" that's done with the LEFT JOIN that 
kills performance. There would be an index in the correct order 
(dbmail_headervalue_physmessage_id_fkey), but I do not know why it's 
not used.

5) I've experimented now, and changed one parameter in postgresql.conf:
default_statistics_target = 1000    # was 10 before

Afterwards I made VACUUM ANALYZE, now look at the difference:

EXPLAIN ANALYZE SELECT p.id from dbmail_physmessage p where p.id NOT IN 
(select physmessage_id from dbmail_headervalue);
                                                               QUERY 
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on dbmail_physmessage p  (cost=128546.48..130988.81 rows=58014 
width=8) (actual time=29485.098..29703.340 rows=15 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on dbmail_headervalue  (cost=0.00..118101.38 
rows=4178038 width=8) (actual time=0.051..15904.147 rows=4178057 
loops=1)
 Total runtime: 29717.718 ms
(5 Zeilen)

explain ANALYZE SELECT p.id FROM dbmail_physmessage p LEFT JOIN 
dbmail_headervalue h ON p.id = h.physmessage_id WHERE h.physmessage_id 
IS NULL;
                                                                              
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..262999.04 rows=116027 width=8) (actual 
time=4305.679..37126.298 rows=15 loops=1)
   Merge Cond: ("outer".id = "inner".physmessage_id)
   Filter: ("inner".physmessage_id IS NULL)
   ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p  
(cost=0.00..4053.94 rows=116027 width=8) (actual time=0.013..467.835 
rows=116029 loops=1)
   ->  Index Scan using dbmail_headervalue_1 on dbmail_headervalue h  
(cost=0.00..206429.55 rows=4178038 width=8) (actual 
time=8.833..24189.424 rows=4178082 loops=1)
 Total runtime: 37126.564 ms

Now I'd say there's nothing to be changed with the query, but a higher 
default_statistics_target with vacuum analyze help ;-)

mfg zmi
-- 
// Michael Monnerie, Ing.BSc.
----------------------------------
Sorcerers have their magic wands:
  powerful, potentially dangerous tools with a life of their own.
Witches have their familiars:
  creatures disguised as household beasts that could,
  if they choose, wreak the witches' havoc.
Mystics have their golems:
  beings built of wood and tin brought to life to do their
  masters' bidding.
I have Linux.
----------------------------------
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: EA39 8918 EDFF 0A68 ACFB  11B7 BA2D 060F 1C6F E6B0
// Keyserver: www.keyserver.net                   Key-ID: 1C6FE6B0
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail

Reply via email to