Re: [GENERAL] 7.4.7: strange planner decision

2005-07-13 Thread Richard Huxton

Roman Neuhauser wrote:

Why does the planner want to crawl the table that has 5M rows instead of the one
with 176k rows? Both tables are freshly vacuum-full-analyzed.


Because you don't have an index on base for the files table.


callrec32=# \d fix.files
  Table fix.files
 Column |  Type  | Modifiers
++---
 dir| character varying(255) |
 base   | character varying(255) |
Indexes:
base_storename_idx btree (base, dir)::text || '/'::text) || 
(base)::text)))
ff_storename_idx btree (dir)::text || '/'::text) || (base)::text)))


A couple of indexes, but none simple on base, so it can't be used for 
the join.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] 7.4.7: strange planner decision

2005-07-13 Thread Roman Neuhauser
# dev@archonet.com / 2005-07-13 12:57:31 +0100:
 Roman Neuhauser wrote:
 Why does the planner want to crawl the table that has 5M rows instead of 
 the one
 with 176k rows? Both tables are freshly vacuum-full-analyzed.
 
 Because you don't have an index on base for the files table.
 
I added one, ran vacuum full analyze fix.files, and:

callrec32=# \d fix.files
  Table fix.files
 Column |  Type  | Modifiers
++---
 dir| character varying(255) |
 base   | character varying(255) |
Indexes:
base_storename_idx btree (base, dir)::text || '/'::text) || 
(base)::text)))
ff_baseonly_idx btree (base)
ff_storename_idx btree (dir)::text || '/'::text) || 
(base)::text)))

callrec32=# explain select fd.base from fix.dups fd join fix.files ff using 
(base);
 QUERY PLAN

 Hash Join  (cost=5340.00..292675.06 rows=176161 width=44)
   Hash Cond: ((outer.base)::text = (inner.base)::text)
   -  Seq Scan on files ff  (cost=0.00..117301.58 rows=5278458 width=41)
   -  Hash  (cost=3436.60..3436.60 rows=176160 width=44)
 -  Seq Scan on dups fd  (cost=0.00..3436.60 rows=176160 width=44)
(5 rows)

Which is exactly what I expected. Using left prefix of a multicolumn
index normally works just fine, thank you.

http://www.postgresql.org/docs/7.4/static/indexes-multicolumn.html:

The query planner can use a multicolumn index for queries that involve
^^
the leftmost column in the index definition plus any number of columns
^^^
listed to the right of it, without a gap. For example, an index on (a,
b, c) can be used in queries involving all of a, b, and c, or in queries
involving both a and b, or in queries involving only a

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] 7.4.7: strange planner decision

2005-07-13 Thread Richard Huxton

Roman Neuhauser wrote:

Because you don't have an index on base for the files table.


 
I added one, ran vacuum full analyze fix.files, and:


callrec32=# \d fix.files
  Table fix.files
 Column |  Type  | Modifiers
++---
 dir| character varying(255) |
 base   | character varying(255) |
Indexes:
base_storename_idx btree (base, dir)::text || '/'::text) || 
(base)::text)))
ff_baseonly_idx btree (base)
ff_storename_idx btree (dir)::text || '/'::text) || 
(base)::text)))

callrec32=# explain select fd.base from fix.dups fd join fix.files ff using 
(base);
 QUERY PLAN

 Hash Join  (cost=5340.00..292675.06 rows=176161 width=44)
   Hash Cond: ((outer.base)::text = (inner.base)::text)
   -  Seq Scan on files ff  (cost=0.00..117301.58 rows=5278458 width=41)
   -  Hash  (cost=3436.60..3436.60 rows=176160 width=44)
 -  Seq Scan on dups fd  (cost=0.00..3436.60 rows=176160 width=44)
(5 rows)

Which is exactly what I expected. Using left prefix of a multicolumn
index normally works just fine, thank you.


Couldn't figure out what you meant here - had to go back and re-read 
your index definitions. Sorry - missed the (base, ...) on the front of 
base_storename_idx.


What happens to the plan if you SET enable_seqscan=false; first? It's 
presumably getting the row-estimate right, so unless there's terrible 
correlation on base in the files table I can only assume it's getting 
the cost estimates horribly wrong.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] 7.4.7: strange planner decision

2005-07-13 Thread Roman Neuhauser
# dev@archonet.com / 2005-07-13 14:09:34 +0100:
 Roman Neuhauser wrote:
 callrec32=# \d fix.files
   Table fix.files
  Column |  Type  | Modifiers
 ++---
  dir| character varying(255) |
  base   | character varying(255) |
 Indexes:
 base_storename_idx btree (base, dir)::text || '/'::text) || 
 (base)::text)))
 ff_baseonly_idx btree (base)
 ff_storename_idx btree (dir)::text || '/'::text) || 
 (base)::text)))
 
 callrec32=# explain select fd.base from fix.dups fd join fix.files ff 
 using (base);
  QUERY PLAN
 
  
  Hash Join  (cost=5340.00..292675.06 rows=176161 width=44)
Hash Cond: ((outer.base)::text = (inner.base)::text)
-  Seq Scan on files ff  (cost=0.00..117301.58 rows=5278458 
width=41)
-  Hash  (cost=3436.60..3436.60 rows=176160 width=44)
  -  Seq Scan on dups fd  (cost=0.00..3436.60 rows=176160 
  width=44)
 (5 rows)

 What happens to the plan if you SET enable_seqscan=false; first? It's 
 presumably getting the row-estimate right, so unless there's terrible 
 correlation on base in the files table I can only assume it's getting 
 the cost estimates horribly wrong.

callrec32=# SET enable_seqscan=false;
SET
callrec32=# explain select fd.base from fix.dups fd join fix.files ff using 
(base);
 QUERY PLAN 

 Nested Loop  (cost=0.00..1066990.93 rows=176161 width=44)
   -  Index Scan using dups_base_key on dups fd  (cost=0.00..5570.86 
rows=176160 width=44)
   -  Index Scan using ff_baseonly_idx on files ff  (cost=0.00..6.01 rows=1 
width=41)
 Index Cond: ((outer.base)::text = (ff.base)::text)
(4 rows)

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] 7.4.7: strange planner decision

2005-07-13 Thread Roman Neuhauser
# [EMAIL PROTECTED] / 2005-07-13 15:58:09 +0200:
 # dev@archonet.com / 2005-07-13 14:09:34 +0100:
  Roman Neuhauser wrote:
  callrec32=# \d fix.files
Table fix.files
   Column |  Type  | Modifiers
  ++---
   dir| character varying(255) |
   base   | character varying(255) |
  Indexes:
  base_storename_idx btree (base, dir)::text || '/'::text) || 
  (base)::text)))
  ff_baseonly_idx btree (base)
  ff_storename_idx btree (dir)::text || '/'::text) || 
  (base)::text)))
  
  callrec32=# explain select fd.base from fix.dups fd join fix.files ff 
  using (base);
   QUERY PLAN
  
   
   Hash Join  (cost=5340.00..292675.06 rows=176161 width=44)
 Hash Cond: ((outer.base)::text = (inner.base)::text)
 -  Seq Scan on files ff  (cost=0.00..117301.58 rows=5278458 
 width=41)
 -  Hash  (cost=3436.60..3436.60 rows=176160 width=44)
   -  Seq Scan on dups fd  (cost=0.00..3436.60 rows=176160 
   width=44)
  (5 rows)
 
  What happens to the plan if you SET enable_seqscan=false; first? It's 
  presumably getting the row-estimate right, so unless there's terrible 
  correlation on base in the files table I can only assume it's getting 
  the cost estimates horribly wrong.
 
 callrec32=# SET enable_seqscan=false;
 SET
 callrec32=# explain select fd.base from fix.dups fd join fix.files ff using 
 (base);
  QUERY PLAN   
   
 
  Nested Loop  (cost=0.00..1066990.93 rows=176161 width=44)
-  Index Scan using dups_base_key on dups fd  (cost=0.00..5570.86 
 rows=176160 width=44)
-  Index Scan using ff_baseonly_idx on files ff  (cost=0.00..6.01 rows=1 
 width=41)
  Index Cond: ((outer.base)::text = (ff.base)::text)
 (4 rows)

BTW, this query or its equivalent is what I'm really after:

callrec32=# explain select c1.storename from fix.dups fd join calls
c1 on (fd.base = basename(c1.storename) and c1.iscouple = '1') where
not exists (select 1 from fix.files ff where c1.storename = ff.dir
|| '/' || ff.base);
   QUERY PLAN   
  


 Hash Join  (cost=7474.26..23127970.91 rows=2354719 width=60)
   Hash Cond: ((basename(outer.storename))::text = (inner.base)::text)
   -  Index Scan using calls2_iscouple_idx on calls c1  
(cost=0.00..22982439.69 rows=2354719 width=60)
 Filter: ((iscouple = 1::smallint) AND (NOT (subplan)))
 SubPlan
   -  Index Scan using ff_storename_idx on files ff  
(cost=0.00..88570.16 rows=26393 width=0)
 Index Cond: (($0)::text = (((dir)::text || '/'::text) || 
(base)::text))
   -  Hash  (cost=5570.86..5570.86 rows=176160 width=44)
 -  Index Scan using dups_base_key on dups fd  (cost=0.00..5570.86 
rows=176160 width=44)
(9 rows)

calls has (among others):

calls2_basename_storename_idx btree (basename(storename), storename) 
WHERE (iscouple = (1)::smallint)
calls2_storename_idx btree (storename) WHERE (iscouple = (1)::smallint)
calls2_iscouple_idx btree (id) WHERE (iscouple = (1)::smallint)

WHy does it use the calls2_iscouple_idx index when calls.id isn't
used anywhere in the query? I would guess that calls2_storename_idx
would actually be more useful.

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] 7.4.7: strange planner decision

2005-07-13 Thread Tom Lane
Richard Huxton dev@archonet.com writes:
 What happens to the plan if you SET enable_seqscan=false; first? It's 
 presumably getting the row-estimate right, so unless there's terrible 
 correlation on base in the files table I can only assume it's getting 
 the cost estimates horribly wrong.

I think you'll find that the results suck ;-).  It looks to me that the
planner is making exactly the right choice here.  The only plausible
alternative is a nestloop with inner indexscan on files, which would
imply 176160 separate index probes into files, which is unlikely to
win compared to one seqscan.  (In the aggregate the index probes are
likely to end up touching every page of files anyway --- you would
need a much larger files table before this stopped being true.)

If you want to compare the nestloop plan to test this theory, turning
off enable_hashjoin and (if necessary) enable_mergejoin would be the
better way to get it.  But let's see EXPLAIN ANALYZE results for both
cases, not just EXPLAIN.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] 7.4.7: strange planner decision

2005-07-13 Thread Richard Huxton

Roman Neuhauser wrote:

# dev@archonet.com / 2005-07-13 14:09:34 +0100:


Roman Neuhauser wrote:


  callrec32=# \d fix.files
Table fix.files
   Column |  Type  | Modifiers
  ++---
   dir| character varying(255) |
   base   | character varying(255) |
  Indexes:
  base_storename_idx btree (base, dir)::text || '/'::text) || 
  (base)::text)))

  ff_baseonly_idx btree (base)
  ff_storename_idx btree (dir)::text || '/'::text) || 
  (base)::text)))


  callrec32=# explain select fd.base from fix.dups fd join fix.files ff 
  using (base);

   QUERY PLAN
  
   Hash Join  (cost=5340.00..292675.06 rows=176161 width=44)
 Hash Cond: ((outer.base)::text = (inner.base)::text)
 -  Seq Scan on files ff  (cost=0.00..117301.58 rows=5278458 
 width=41)

 -  Hash  (cost=3436.60..3436.60 rows=176160 width=44)
   -  Seq Scan on dups fd  (cost=0.00..3436.60 rows=176160 
   width=44)

  (5 rows)



What happens to the plan if you SET enable_seqscan=false; first? It's 
presumably getting the row-estimate right, so unless there's terrible 
correlation on base in the files table I can only assume it's getting 
the cost estimates horribly wrong.



callrec32=# SET enable_seqscan=false;
SET
callrec32=# explain select fd.base from fix.dups fd join fix.files ff using 
(base);
 QUERY PLAN 


 Nested Loop  (cost=0.00..1066990.93 rows=176161 width=44)
   -  Index Scan using dups_base_key on dups fd  (cost=0.00..5570.86 
rows=176160 width=44)
   -  Index Scan using ff_baseonly_idx on files ff  (cost=0.00..6.01 rows=1 
width=41)
 Index Cond: ((outer.base)::text = (ff.base)::text)


OK - so it thinks the cost of this plan will be about 1 million, whereas 
the old plan was 290 thousand. The question is - why?

What are your planner settings? Ch 16.4.4.2 here
http://www.postgresql.org/docs/7.4/static/runtime-config.html#RUNTIME-CONFIG-QUERY

I'm guessing something to do with cpu_index_tuple_cost or random_page_cost.
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match