Re: [HACKERS] Proposed Query Planner TODO items

2004-11-04 Thread Tatsuo Ishii
  Hi Tatsuo,
  
  I've made a new release:
  http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download
  
  Let me know if there are any problems.
 
 Thanks!

Just for quick note, it seems query 19 takes forever. Have you
successfully run Q19?
--
Tatsuo Ishii

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


DBT-3 v1.5 Q19 (Re: [HACKERS] Proposed Query Planner TODO items)

2004-11-04 Thread Tatsuo Ishii
   Hi Tatsuo,
   
   I've made a new release:
 http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download
   
   Let me know if there are any problems.
  
  Thanks!
 
 Just for quick note, it seems query 19 takes forever. Have you
 successfully run Q19?

Here is the more detailed info. The query was not finished within 3
days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
Linux. PostgreSQL is 7.4.5 with default postgresql.conf. An explain
output is attatched.
--
Tatsuo Ishii
   
   
   
   
   
   
   
   
   
   
   
 
  !
   
   
   
   
   
   
   
   
   
   
   
 
  !
   
   
   
   
   
   
   
   
   
   
   
 
  !
   
   
   
   
   
   
   
   
   
   
   
 
  !
   
   
   
   
   
  

Re: DBT-3 v1.5 Q19 (Re: [HACKERS] Proposed Query Planner TODO items)

2004-11-04 Thread Tom Lane
Tatsuo Ishii [EMAIL PROTECTED] writes:
 Just for quick note, it seems query 19 takes forever. Have you
 successfully run Q19?

 Here is the more detailed info. The query was not finished within 3
 days and was canceled on a Dual Xeon 2.8GHz with 2.5GB RAM running
 Linux. PostgreSQL is 7.4.5 with default postgresql.conf.

7.4's planner is not able to do anything useful with the complicated
WHERE clause in Q19.  I believe I've improved that situation for 8.0.
A really slick solution will probably have to await the appearance of
bitmap indexes, though.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-10-28 Thread Mark Wong
Hi Tatsuo,

I've made a new release:
http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download

Let me know if there are any problems.

On Tue, Oct 26, 2004 at 12:44:49PM +0900, Tatsuo Ishii wrote:
 Hi,
 
 Thanks for the info. Would you give me the tarball?


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Proposed Query Planner TODO items

2004-10-28 Thread Tatsuo Ishii
 Hi Tatsuo,
 
 I've made a new release:
   http://prdownloads.sourceforge.net/osdldbt/dbt3-v1.5.tar.gz?download
 
 Let me know if there are any problems.

Thanks!

 On Tue, Oct 26, 2004 at 12:44:49PM +0900, Tatsuo Ishii wrote:
  Hi,
  
  Thanks for the info. Would you give me the tarball?
 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposed Query Planner TODO items

2004-10-25 Thread Reini Urban
Tatsuo Ishii schrieb:
I see nice graphs for each DBT3 query(for example,
http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
they do not come with normal dbt3-1.4 kit. How did you get them?
Maybe you have slightly modified dbt3 kit?
This looks like a simple ploticus one-liner.
like:
pl -png -o vbars.png -prefab vbars data=dbt3.data x=1 y=2 barwidth=line
see for example: http://ploticus.sourceforge.net/doc/prefab_vbars.html
or
http://phpwiki.sourceforge.net/phpwiki/PhpMemoryExhausted/Testresults
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-10-25 Thread Mark Wong
Hi Tatsuo,

Yes, I've been updating the dbt3 kit over the past several months.
The query time graph is a new feature.  It's available via BitKeeper
at bk://developer.osdl.org:/var/bk/dbt3 but I haven't tested the kit
well enough to make a v1.5 release yet.  If BitKeeper isn't something
you can use, I can make a preliminary tarball for you.

Mark

On Mon, Oct 25, 2004 at 01:59:46PM +0900, Tatsuo Ishii wrote:
 Mark,
 
 I see nice graphs for each DBT3 query(for example,
 http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
 they do not come with normal dbt3-1.4 kit. How did you get them?
 Maybe you have slightly modified dbt3 kit?
 --
 Tatsuo Ishii
 
  On  6 Feb, To: [EMAIL PROTECTED] wrote:
   On  5 Jan, Tom Lane wrote:
   Josh Berkus [EMAIL PROTECTED] writes:
   2) DEVELOP BETTER PLANS FOR OR GROUP QUERIES
   
   Summary: Currently, queries with complex or group criteria get devolved by 
   the planner into canonical and-or filters resulting in very poor execution on
   large data sets.   We should find better ways of dealing with these queries, 
   for example UNIONing.
   
   Description: While helping OSDL with their derivative TPC-R benchmark, we ran
   into a query (#19) which took several hours to complete on PostgreSQL.
  
  http://developer.osdl.org/markw/dbt3-pgsql/
  
  There's a short summary of the tests I ran over the weekend, with links
  to detailed retults.  Comparing runs 43 (7.4) and 52 (7.5devel), it
  looks like query #7 had the only significant improvement.  Oprofile data
  should be there too, if that'll help.  Let us know if there's anything
  else we can try for you.
  
  Mark
  
  ---(end of broadcast)---
  TIP 2: you can get off all lists at once with the unregister command
  (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
  

-- 
Mark Wong - - [EMAIL PROTECTED]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436  (fax)
http://developer.osdl.org/markw/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Proposed Query Planner TODO items

2004-10-25 Thread Tatsuo Ishii
Hi,

Thanks for the info. Would you give me the tarball?
--
Tatsuo Ishii

 Hi Tatsuo,
 
 Yes, I've been updating the dbt3 kit over the past several months.
 The query time graph is a new feature.  It's available via BitKeeper
 at bk://developer.osdl.org:/var/bk/dbt3 but I haven't tested the kit
 well enough to make a v1.5 release yet.  If BitKeeper isn't something
 you can use, I can make a preliminary tarball for you.
 
 Mark
 
 On Mon, Oct 25, 2004 at 01:59:46PM +0900, Tatsuo Ishii wrote:
  Mark,
  
  I see nice graphs for each DBT3 query(for example,
  http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
  they do not come with normal dbt3-1.4 kit. How did you get them?
  Maybe you have slightly modified dbt3 kit?
  --
  Tatsuo Ishii
  
   On  6 Feb, To: [EMAIL PROTECTED] wrote:
On  5 Jan, Tom Lane wrote:
Josh Berkus [EMAIL PROTECTED] writes:
2) DEVELOP BETTER PLANS FOR OR GROUP QUERIES

Summary: Currently, queries with complex or group criteria get devolved by 
the planner into canonical and-or filters resulting in very poor execution on
large data sets.   We should find better ways of dealing with these queries, 
for example UNIONing.

Description: While helping OSDL with their derivative TPC-R benchmark, we ran
into a query (#19) which took several hours to complete on PostgreSQL.
   
   http://developer.osdl.org/markw/dbt3-pgsql/
   
   There's a short summary of the tests I ran over the weekend, with links
   to detailed retults.  Comparing runs 43 (7.4) and 52 (7.5devel), it
   looks like query #7 had the only significant improvement.  Oprofile data
   should be there too, if that'll help.  Let us know if there's anything
   else we can try for you.
   
   Mark
   
   ---(end of broadcast)---
   TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
   
 
 -- 
 Mark Wong - - [EMAIL PROTECTED]
 Open Source Development Lab Inc - A non-profit corporation
 12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
 (503) 626-2455 x 32 (office)
 (503) 626-2436  (fax)
 http://developer.osdl.org/markw/
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposed Query Planner TODO items

2004-10-24 Thread Tatsuo Ishii
Mark,

I see nice graphs for each DBT3 query(for example,
http://developer.osdl.org/markw/dbt3-pgsql/42/q_time.png). It seems
they do not come with normal dbt3-1.4 kit. How did you get them?
Maybe you have slightly modified dbt3 kit?
--
Tatsuo Ishii

 On  6 Feb, To: [EMAIL PROTECTED] wrote:
  On  5 Jan, Tom Lane wrote:
  Josh Berkus [EMAIL PROTECTED] writes:
  2) DEVELOP BETTER PLANS FOR OR GROUP QUERIES
  
  Summary: Currently, queries with complex or group criteria get devolved by 
  the planner into canonical and-or filters resulting in very poor execution on
  large data sets.   We should find better ways of dealing with these queries, 
  for example UNIONing.
  
  Description: While helping OSDL with their derivative TPC-R benchmark, we ran
  into a query (#19) which took several hours to complete on PostgreSQL.
 
 http://developer.osdl.org/markw/dbt3-pgsql/
 
 There's a short summary of the tests I ran over the weekend, with links
 to detailed retults.  Comparing runs 43 (7.4) and 52 (7.5devel), it
 looks like query #7 had the only significant improvement.  Oprofile data
 should be there too, if that'll help.  Let us know if there's anything
 else we can try for you.
 
 Mark
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2004-06-01 Thread markw
On 12 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 Ok, I have EXPLAIN ANALYZE results for both the power and throughput
 tests:
  http://developer.osdl.org/markw/dbt3-pgsql/
 
 Thanks.  I just looked at Q9 and Q21, since those are the slowest
 queries according to your chart.  (Are all the queries weighted the same
 for evaluation purposes, or are some more important than others?)
 
[snip]

 The estimate for the part/partsupp join is close enough (60K vs 90K
 rows), but why is it estimating 92 rows out of the join to lineitem when
 the true figure is 681518?  With a more accurate estimate the planner
 would probably have chosen different join methods above this point.
 
 Can you show us the pg_stats rows for the columns p_partkey, l_partkey,
 ps_suppkey, and l_suppkey?
 
 It would also be interesting to see whether a better estimate emerges
 if you increase default_statistics_target (try 100 or so).

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000 and I have p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
http (no links on the web page.)  Pretty significant performance change.

Power:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.ps_suppkey.out

Throughput:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.ps_suppkey.out


Something went wrong when I tried to run another test with the Q21
changes overnight, so I'll have to get back to you on that one.

Mark

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2004-05-30 Thread Josh Berkus
Mark,

 It's run #60 and the links are towards the bottom of the page under the
 Run log data heading.  The results from the power test is
 power_query.result and thuput_qs1.result, etc. for each stream in
 the throughput test.

I'm confused.  Were you able to get the original-form query #19 to complete, 
or not?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Proposed Query Planner TODO items

2004-05-13 Thread markw
On  9 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 I'll see what I can do about the explain and explain analyze
 results.  I remember in the past that someone said it would be most
 interesting to execute the latter while the test while running, as
 opposed to before or after a test.  Should I do that here too?
 
 If possible, but I'd settle for a standalone result, so long as it's
 executed against the correct database contents (including pg_statistic
 settings).

Ok, I've found that the kit does capture explain results and I've
added a Query Plans links under the query time charts on each of the
pages.  Um, but I did notice a couple of problems.  It looks liks one of
the 22 queries is missing and they're not labeled.  I'll see about
getting that fixed.

Mark

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Proposed Query Planner TODO items

2004-05-13 Thread Josh Berkus
Mark,

 Ok, I've found that the kit does capture explain results and I've
 added a Query Plans links under the query time charts on each of the
 pages.  Um, but I did notice a couple of problems.  It looks liks one of
 the 22 queries is missing and they're not labeled.  I'll see about
 getting that fixed.

If #19 is missing it's because Oleg  I could not get it to complete.  That 
was also the query which we are most interested in testing.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-16 Thread markw
On 16 Feb, Dennis Haney wrote:
 [EMAIL PROTECTED] wrote:
 
On 12 Feb, Tom Lane wrote:
  

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000 and I have p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
http (no links on the web page.)  Pretty significant performance change.

  

 Why the filesystem change to ext2 at the same time?

I've been rotating filesystems occasionally.  Otherwise no specific
reason.

Mark

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-16 Thread Dennis Haney
[EMAIL PROTECTED] wrote:

On 12 Feb, Tom Lane wrote:
 

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000 and I have p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
http (no links on the web page.)  Pretty significant performance change.
 

Why the filesystem change to ext2 at the same time?

Something went wrong when I tried to run another test with the Q21
changes overnight, so I'll have to get back to you on that one.
 



--
Dennis
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-16 Thread markw
On 16 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 I ran a test with the CAST you recommended for Q4 over the weekend:
  http://developer.osdl.org/markw/dbt3-pgsql/68/
 But it didn't seem to have much of an affect on Q4, compared to run
 #66.  I'll still give the CVS tip a try.
 
 Hm.  Disappointing.  I can see from the EXPLAIN results that it is
 picking up the additional index constraint correctly in this run.
 That should have saved a good number of useless heap fetches.
 [ works with the numbers a little... ]  Actually, I guess it did:
 it looks like the time spent in the indexscan proper went down from
 44msec to 7msec.  The problem is that the bulk of the query time is
 actually going into the repeated EXISTS() sub-selects, and those didn't
 get any better.
 
 There are some other queries in the set that also have date limits of
 this kind, so I still think it's worth redoing a run with CVS tip to
 see if we pick up anything overall.  (You do have indexes created on
 all the date columns no?)
 
 There's probably no way to make Q4 fly without finding a way to optimize
 the EXISTS into an IN-join.  I'll put that on my to-do list ... in the
 meantime, if you feel like making a run to confirm that theory, try
 modifying Q4 to replace
 
   and exists ( select * from lineitem
where l_orderkey = o_orderkey and l_commitdate  l_receiptdate )
 
 with
 
   and o_orderkey in ( select l_orderkey from lineitem
   where l_commitdate  l_receiptdate )
 
 I think that either 7.4 or CVS tip will do better with this variant,
 but it probably ought to be checked.

It looks like we have indexes on all of the date columns except
l_commitdate, which appears to be in Q4.

So I think I'll run against the CVS tip as is, again with an index on
l_commitdate, and then another test to confirm your theory.  Sound good?

Mark

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match



Re: [HACKERS] Proposed Query Planner TODO items

2004-02-16 Thread markw
On 16 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 It looks like we have indexes on all of the date columns except
 l_commitdate, which appears to be in Q4.
 
 So I think I'll run against the CVS tip as is, again with an index on
 l_commitdate, and then another test to confirm your theory.  Sound good?
 
 Sure, it's only cycles ;-).  I am not certain that an index on
 commitdate would help any, but it's worth trying.

http://developer.osdl.org/markw/dbt3-pgsql/70/

Those are results from a pull from CVS I did this morning.
I reverted Q4 (removed the CAST), but the extra WHERE constraints are
still in Q21.

Mark

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org



Re: [HACKERS] Proposed Query Planner TODO items

2004-02-15 Thread Tom Lane
I wrote:
 I see what is going on to make Q4 slow, too.  It's this:
   where o_orderdate = date '1995-04-01' and o_orderdate  date '1995-04-01' + 
 interval '3 month'
 ...
 As of CVS tip the issue could be eliminated by introducing
 cross-data-type comparison operators between types date and timestamp
 without time zone, and then making these be members of the date index
 opclass.  I'm strongly tempted to do so ...

I have now done this, so if you care to re-sync with CVS tip you should
find that the queries using this sort of date constraint go faster.
(You do have indexes on all the date columns, no?)

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-15 Thread Jenny Zhang
On Mon, 2004-02-09 at 16:53, Josh Berkus wrote:
 Jenny,
 
  For 19, we moved the common conditions out of the big ORs, for 20, we
  added distinct.  We can change the query back if the optimizer can
  handle it now.
 
 Well, we want to test if it can. 
Replace the file 19.sql under datagen/pgsql-queries with the attachment
should do it.

Jenny


-- @(#)19.sql	2.1.8.1
-- TPC-H/TPC-R Discounted Revenue Query (Q19)
-- Functional Query Definition
-- Approved February 1998
:b
:x
:o
select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = ':1'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity = :4 and l_quantity = :4+10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = ':2'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity = :5 and l_quantity = :5+10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = ':3'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity = :6 and l_quantity = :6+10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);
:e

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-13 Thread Dennis Haney
You are refering to:

@inproceedings{ hellerstein93predicate,
   author = Joseph M. Hellerstein and Michael Stonebraker,
   title = Predicate migration: optimizing queries with expensive 
predicates,
   pages = 267--276,
   year = 1993,
   abstract = The traditional focus of relational query optimization 
schemes has been on the choice of join methods and join orders. 
Restrictions have typically been handled in query optimizers by 
predicate pushdown rules, which apply restrictions in some random 
order before as many joins as possible. These rules work under the 
assumption that restriction is essentially a zero-time operation. 
However, today's extensible and object-oriented database systems allow 
users to define time-consuming functions,...,
   url = citeseer.nj.nec.com/article/hellerstein92predicate.html }

Tom Lane wrote:

I think the key issue here is that the two EXISTS tests depend only on
l1.l_orderkey and l1.l_suppkey of the outer query.  Therefore they get
pushed down in the plan tree to be evaluated during the initial scan
of l1.  This is normally a good heuristic choice, but because the EXISTS
tests are relatively expensive, that ends up forcing the planner to use
a nestloop-with-inner-index-scan join between nation/supplier and l1.
Any other join technique will involve a seqscan of l1 causing the EXISTS
tests to be evaluated at every row of lineitem; the planner correctly
ranks those alternatives as even worse than this.
The trouble is that the nestloop is hugely expensive: you can see that
the repeated indexscans on l1 take up 1912.454*760 - 0.066*277343 -
0.812*287821 or 1201449.750 msec, about 80% of the total.
It seems that the correct way to plan this query would require
postponing evaluation of the EXISTS clauses.  If those were further up
the tree, the planner would have chosen a merge or hash join at this
step, which would probably take a tenth as much time.  The cost to run
the EXISTS clauses themselves wouldn't change; they'd not be executed
any more frequently in this case.
I recall seeing traces in the code of logic that would attempt to delay
the evaluation of expensive WHERE tests, but that's been gone since
Berkeley days.  Perhaps we should think about resurrecting it, or at
least putting in some kind of heuristic to try to cope better with this
case.
It would be interesting to see what the runtime looks like if you add
the following to the WHERE clauses of both inner EXISTS:
 AND s_nationkey = n_nationkey AND o_orderkey = l1.l_orderkey
This would not change the results AFAICS, but it would force the
evaluation of the EXISTS clauses up to the top level of the outer plan
(since the planner would then see 'em as join constraints).
			regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match
 



--
Dennis
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-13 Thread Tom Lane
Dennis Haney [EMAIL PROTECTED] writes:
 You are refering to:
 @inproceedings{ hellerstein93predicate,
 author = Joseph M. Hellerstein and Michael Stonebraker,
 title = Predicate migration: optimizing queries with expensive 
 predicates,

Yup, I sure am.  This is the same thesis referred to here:
http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php

We may need to put some of it back ;-)

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-13 Thread Tom Lane
[EMAIL PROTECTED] writes:
 http://developer.osdl.org/markw/dbt3-pgsql/66/

 There's a run with a modified Q21.  Made a huge improvement in Q21.

Okay, looks like we know what we need to attack to solve Q21... actually
solving it will be a tad harder ;-) but we understand where the problem is.

I see what is going on to make Q4 slow, too.  It's this:

  where o_orderdate = date '1995-04-01' and o_orderdate  date '1995-04-01' + 
interval '3 month'

(o_orderdate is of type date, unsurprisingly).  This produces

 -  Index Scan using i_o_orderdate on orders  (cost=0.00..2603496.38 
rows=253677 width=19) (actual time=45.908..202483.023 rows=104083 loops=1)
   Index Cond: (o_orderdate = '1995-04-01'::date)
   Filter: (((o_orderdate)::timestamp without time zone  '1995-07-01 
00:00:00'::timestamp without time zone) AND (subplan))

that is, the lower bound is recognized as an indexscan constraint,
but the upper bound isn't because of the datatype mismatch.  So we end
up fetching the whole table up through its ending date.

Up to now, all we could do about this sort of issue was to suggest that
people cast to eliminate the datatype mismatch:

  where o_orderdate = date '1995-04-01' and o_orderdate  CAST(date '1995-04-01' + 
interval '3 month' AS date)

but I dunno whether that's an allowed query modification under the TPC-H
rules.

As of CVS tip the issue could be eliminated by introducing
cross-data-type comparison operators between types date and timestamp
without time zone, and then making these be members of the date index
opclass.  I'm strongly tempted to do so ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-12 Thread markw
Ok, I have EXPLAIN ANALYZE results for both the power and throughput
tests:
http://developer.osdl.org/markw/dbt3-pgsql/

It's run #60 and the links are towards the bottom of the page under the
Run log data heading.  The results from the power test is
power_query.result and thuput_qs1.result, etc. for each stream in
the throughput test.

Mark

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-12 Thread markw
On 12 Feb, Josh Berkus wrote:
 Mark,
 
 It's run #60 and the links are towards the bottom of the page under the
 Run log data heading.  The results from the power test is
 power_query.result and thuput_qs1.result, etc. for each stream in
 the throughput test.
 
 I'm confused.  Were you able to get the original-form query #19 to complete, 
 or not?

Oh sorry, I completely forgot that Q19 the whole purpose of this.  So
#60 doesn't have the right Q19.  I'll run with the one you want now.

Mark

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-12 Thread Josh Berkus
Mark,

 Oh sorry, I completely forgot that Q19 the whole purpose of this.  So
 #60 doesn't have the right Q19.  I'll run with the one you want now.

Thanks!  And the original, not the fixed, Q19 if you please.   It's the 
original that wouldn't finish on Postgres 7.3.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-12 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Ok, I have EXPLAIN ANALYZE results for both the power and throughput
 tests:
   http://developer.osdl.org/markw/dbt3-pgsql/

Thanks.  I just looked at Q9 and Q21, since those are the slowest
queries according to your chart.  (Are all the queries weighted the same
for evaluation purposes, or are some more important than others?)

The problem with Q9 seems to be an estimation failure:

-  Nested Loop  (cost=0.00..437591.67 rows=92 width=74) (actual 
time=12.030..1603892.783 rows=681518 loops=1)
  -  Nested Loop  (cost=0.00..65364.57 rows=61720 width=43) (actual 
time=0.326..5667.573 rows=90676 loops=1)
-  Seq Scan on part  (cost=0.00..15733.27 rows=15992 width=11) 
(actual time=0.183..1539.306 rows=22669 loops=1)
  Filter: ((p_name)::text ~~ '%hot%'::text)
-  Index Scan using i_ps_partkey on partsupp  (cost=0.00..3.05 rows=4 
width=32) (actual time=0.119..0.151 rows=4 loops=22669)
  Index Cond: (outer.p_partkey = partsupp.ps_partkey)
  -  Index Scan using i_l_suppkey_partkey on lineitem  (cost=0.00..6.02 
rows=1 width=64) (actual time=2.183..17.564 rows=8 loops=90676)
Index Cond: ((outer.p_partkey = lineitem.l_partkey) AND 
(outer.ps_suppkey = lineitem.l_suppkey))

The estimate for the part/partsupp join is close enough (60K vs 90K
rows), but why is it estimating 92 rows out of the join to lineitem when
the true figure is 681518?  With a more accurate estimate the planner
would probably have chosen different join methods above this point.

Can you show us the pg_stats rows for the columns p_partkey, l_partkey,
ps_suppkey, and l_suppkey?

It would also be interesting to see whether a better estimate emerges
if you increase default_statistics_target (try 100 or so).

Q21 is a more interesting case:

 EXPLAIN ANALYZE
 select s_name, count(*) as numwait
 from supplier, lineitem l1, orders, nation
 where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' 
and l1.l_receiptdate  l1.l_commitdate
 and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and 
l2.l_suppkey  l1.l_suppkey )
 and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and 
l3.l_suppkey  l1.l_suppkey and l3.l_receiptdate  l3.l_commitdate )
 and s_nationkey = n_nationkey and n_name = 'MOROCCO'
 group by s_name
 order by numwait desc, s_name
 LIMIT 100;
  QUERY 
PLAN   
---
 Limit  (cost=2984783.51..2984783.76 rows=100 width=29) (actual 
time=1490860.249..1490860.460 rows=100 loops=1)
   -  Sort  (cost=2984783.51..2984831.91 rows=19361 width=29) (actual 
time=1490860.244..1490860.320 rows=100 loops=1)
 Sort Key: count(*), supplier.s_name
 -  HashAggregate  (cost=2983356.52..2983404.92 rows=19361 width=29) (actual 
time=1490853.802..1490856.472 rows=760 loops=1)
   -  Nested Loop  (cost=0.00..2983259.72 rows=19361 width=29) (actual 
time=350.991..1490777.523 rows=7471 loops=1)
 -  Nested Loop  (cost=0.00..2862119.72 rows=4 width=40) 
(actual time=350.805..1453771.752 rows=15369 loops=1)
   -  Nested Loop  (cost=0.00..994.08 rows=802 width=40) 
(actual time=0.152..187.510 rows=760 loops=1)
 Join Filter: (inner.s_nationkey = 
outer.n_nationkey)
 -  Seq Scan on nation  (cost=0.00..1.31 rows=1 
width=9) (actual time=0.088..0.113 rows=1 loops=1)
   Filter: (n_name = 'MOROCCO'::bpchar)
 -  Seq Scan on supplier  (cost=0.00..742.34 
rows=20034 width=49) (actual time=0.010..136.902 rows=2 loops=1)
   -  Index Scan using i_l_suppkey on lineitem l1  
(cost=0.00..3566.81 rows=54 width=21) (actual time=87.928..1912.454 rows=20 loops=760)
 Index Cond: (outer.s_suppkey = l1.l_suppkey)
 Filter: ((l_receiptdate  l_commitdate) AND (subplan) 
AND (NOT (subplan)))
 SubPlan
   -  Index Scan using i_l_orderkey on lineitem l3  
(cost=0.00..3.13 rows=3 width=178) (actual time=0.066..0.066 rows=1 loops=277343)
 Index Cond: (l_orderkey = $0)
 Filter: ((l_suppkey  $1) AND (l_receiptdate 
 l_commitdate))
   -  Index Scan using i_l_orderkey on lineitem l2  
(cost=0.00..3.11 rows=7 width=178) (actual time=0.812..0.812 rows=1 loops=287821)
 Index Cond: 

Re: [HACKERS] Proposed Query Planner TODO items

2004-02-12 Thread markw
On 12 Feb, Josh Berkus wrote:
 Mark,
 
 Oh sorry, I completely forgot that Q19 the whole purpose of this.  So
 #60 doesn't have the right Q19.  I'll run with the one you want now.
 
 Thanks!  And the original, not the fixed, Q19 if you please.   It's the 
 original that wouldn't finish on Postgres 7.3.

Josh,

http://developer.osdl.org/markw/dbt3-pgsql/

Check out #61.  I replaced the Q19 template with the one Jenny sent out.
Looks like it ran just fine.  This run also has the EXPLAIN ANALYZE
results, but none of the other things Tom has asked for yet.

Mark

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-09 Thread markw
On  6 Feb, To: [EMAIL PROTECTED] wrote:
 On  5 Jan, Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 2) DEVELOP BETTER PLANS FOR OR GROUP QUERIES
 
 Summary: Currently, queries with complex or group criteria get devolved by 
 the planner into canonical and-or filters resulting in very poor execution on
 large data sets.   We should find better ways of dealing with these queries, 
 for example UNIONing.
 
 Description: While helping OSDL with their derivative TPC-R benchmark, we ran
 into a query (#19) which took several hours to complete on PostgreSQL.

http://developer.osdl.org/markw/dbt3-pgsql/

There's a short summary of the tests I ran over the weekend, with links
to detailed retults.  Comparing runs 43 (7.4) and 52 (7.5devel), it
looks like query #7 had the only significant improvement.  Oprofile data
should be there too, if that'll help.  Let us know if there's anything
else we can try for you.

Mark

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-09 Thread Tom Lane
[EMAIL PROTECTED] writes:
 I'll see what I can do about the explain and explain analyze
 results.  I remember in the past that someone said it would be most
 interesting to execute the latter while the test while running, as
 opposed to before or after a test.  Should I do that here too?

If possible, but I'd settle for a standalone result, so long as it's
executed against the correct database contents (including pg_statistic
settings).

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-09 Thread markw
On  9 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 http://developer.osdl.org/markw/dbt3-pgsql/
 
 There's a short summary of the tests I ran over the weekend, with links
 to detailed retults.  Comparing runs 43 (7.4) and 52 (7.5devel), it
 looks like query #7 had the only significant improvement.  Oprofile data
 should be there too, if that'll help.  Let us know if there's anything
 else we can try for you.
 
 I couldn't figure out anything at all from that, possibly because many
 of the links are dead, eg the task descriptions.  I don't even see
 where you see the time for query #7.
 
 What would be interesting from my perspective is explain results (or
 even better, explain analyze results) for the problem queries.  Any
 chance of extracting such a thing?

Sorry about the task links, I think I've got that corrected.

I'll see what I can do about the explain and explain analyze
results.  I remember in the past that someone said it would be most
interesting to execute the latter while the test while running, as
opposed to before or after a test.  Should I do that here too?

Mark

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-09 Thread Tom Lane
[EMAIL PROTECTED] writes:
 http://developer.osdl.org/markw/dbt3-pgsql/

 There's a short summary of the tests I ran over the weekend, with links
 to detailed retults.  Comparing runs 43 (7.4) and 52 (7.5devel), it
 looks like query #7 had the only significant improvement.  Oprofile data
 should be there too, if that'll help.  Let us know if there's anything
 else we can try for you.

I couldn't figure out anything at all from that, possibly because many
of the links are dead, eg the task descriptions.  I don't even see
where you see the time for query #7.

What would be interesting from my perspective is explain results (or
even better, explain analyze results) for the problem queries.  Any
chance of extracting such a thing?

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-09 Thread markw
On  9 Feb, Josh Berkus wrote:
 Mark,
 
 Ok, I've found that the kit does capture explain results and I've
 added a Query Plans links under the query time charts on each of the
 pages.  Um, but I did notice a couple of problems.  It looks liks one of
 the 22 queries is missing and they're not labeled.  I'll see about
 getting that fixed.
 
 If #19 is missing it's because Oleg  I could not get it to complete.  That 
 was also the query which we are most interested in testing.

Oh, it's probably because we've altered Q19 and Q20.  I'm still not all
that familiar with this kit, so I'm learning as we go.  So we need to
change it back to make it worthwhile for you.

Mark

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-09 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 If #19 is missing it's because Oleg  I could not get it to complete.  That 
 was also the query which we are most interested in testing.

Q19 doesn't seem to be particularly slow in either the 7.4 or 7.5 tests
--- there are many others with longer runtimes.  I speculate that what
is actually being run here is a modified Q19 query with the merge join
condition pulled out by hand.  The CVS-tip planner should be able to do
that for itself, though, and obtain essentially this same performance
with the per-spec query.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-09 Thread Josh Berkus
Jenny,

 For 19, we moved the common conditions out of the big ORs, for 20, we
 added distinct.  We can change the query back if the optimizer can
 handle it now.

Well, we want to test if it can. 

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-06 Thread markw
On  5 Jan, Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 2) DEVELOP BETTER PLANS FOR OR GROUP QUERIES
 
 Summary: Currently, queries with complex or group criteria get devolved by 
 the planner into canonical and-or filters resulting in very poor execution on
 large data sets.   We should find better ways of dealing with these queries, 
 for example UNIONing.
 
 Description: While helping OSDL with their derivative TPC-R benchmark, we ran
 into a query (#19) which took several hours to complete on PostgreSQL.
 
 I've made some progress on this over the last week or two.  Would it be
 possible to retry that benchmark with CVS tip?
 
   regards, tom lane

Sorry it's taking so long.  I tried to take a export from CVS today and
the database appears not to be able to connect to the postmaster when I
attempt to create the database.  Let me know if getting a trace of
anything will help, if you guys already aren't already aware of the
problem.

Mark

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-06 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Sorry it's taking so long.  I tried to take a export from CVS today and
 the database appears not to be able to connect to the postmaster when I
 attempt to create the database.  Let me know if getting a trace of
 anything will help, if you guys already aren't already aware of the
 problem.

CVS tip is not broken to my knowledge.  Details please?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Proposed Query Planner TODO items

2004-02-06 Thread markw
On  6 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 Sorry it's taking so long.  I tried to take a export from CVS today and
 the database appears not to be able to connect to the postmaster when I
 attempt to create the database.  Let me know if getting a trace of
 anything will help, if you guys already aren't already aware of the
 problem.
 
 CVS tip is not broken to my knowledge.  Details please?

I ran this:

$ strace -o /tmp/initdb-7.5.out initdb -D /opt/pgdb/dbt2
The files belonging to this database system will be owned by user markw.
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory /opt/pgdb/dbt2 ... ok
creating directory /opt/pgdb/dbt2/global ... ok
creating directory /opt/pgdb/dbt2/pg_xlog ... ok
creating directory /opt/pgdb/dbt2/pg_clog ... ok
creating directory /opt/pgdb/dbt2/base ... ok
creating directory /opt/pgdb/dbt2/base/1 ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in /opt/pgdb/dbt2/base/1 ... ERROR:  relnatts disagrees 
with indnatts for index 16601
initdb: child process exited with exit code 1
initdb: failed
initdb: removing data directory /opt/pgdb/dbt2


I've never seen this relnatts and indnatts disagreements message before.
I'll attach a compressed strace.

Thanks,
Mark

initdb-7.5.out.gz
Description: Binary data

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-06 Thread Tom Lane
[EMAIL PROTECTED] writes:
 creating template1 database in /opt/pgdb/dbt2/base/1 ... ERROR:  relnatts disagrees 
 with indnatts for index 16601

Wow, that's a bizarre one.  Are you sure you did a clean rebuild?
I usually like to do make distclean before or after cvs update;
it tends to save me a lot of wasted time chasing build inconsistencies.
Which is what I suspect this is.

FWIW, my last CVS pull was yesterday about 15:00 EST, and it works fine.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Proposed Query Planner TODO items

2004-02-06 Thread markw
On  6 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 creating template1 database in /opt/pgdb/dbt2/base/1 ... ERROR:  relnatts disagrees 
 with indnatts for index 16601
 
 Wow, that's a bizarre one.  Are you sure you did a clean rebuild?
 I usually like to do make distclean before or after cvs update;
 it tends to save me a lot of wasted time chasing build inconsistencies.
 Which is what I suspect this is.
 
 FWIW, my last CVS pull was yesterday about 15:00 EST, and it works fine.

Well, that make distclean did the trick.  I actually did an export this
morning, not a checkout, but not like that should matter.  Ok, will
hopefully get back with results soon.

Thanks,
Mark

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Proposed Query Planner TODO items

2004-01-06 Thread Josh Berkus
Tom,

 I've made some progress on this over the last week or two.  Would it be
 possible to retry that benchmark with CVS tip?

Yes!   I'll just need some time to get my laptop set up for running it.   My 
server is, alas, in storage due to me  being between offices.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Proposed Query Planner TODO items

2004-01-05 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 2) DEVELOP BETTER PLANS FOR OR GROUP QUERIES

 Summary: Currently, queries with complex or group criteria get devolved by 
 the planner into canonical and-or filters resulting in very poor execution on
 large data sets.   We should find better ways of dealing with these queries, 
 for example UNIONing.

 Description: While helping OSDL with their derivative TPC-R benchmark, we ran
 into a query (#19) which took several hours to complete on PostgreSQL.

I've made some progress on this over the last week or two.  Would it be
possible to retry that benchmark with CVS tip?

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2004-01-05 Thread markw
On  5 Jan, Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 2) DEVELOP BETTER PLANS FOR OR GROUP QUERIES
 
 Summary: Currently, queries with complex or group criteria get devolved by 
 the planner into canonical and-or filters resulting in very poor execution on
 large data sets.   We should find better ways of dealing with these queries, 
 for example UNIONing.
 
 Description: While helping OSDL with their derivative TPC-R benchmark, we ran
 into a query (#19) which took several hours to complete on PostgreSQL.
 
 I've made some progress on this over the last week or two.  Would it be
 possible to retry that benchmark with CVS tip?
 
Yeah, no problem.  We'll pull the code from CVS and give it a try.

Mark

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2003-12-19 Thread Josh Berkus
Tom,

 Could we see the actual present query plans for both the TPC-R query
 and the UNION version?  (I'll settle for explain on the slow
 version, but explain analyze on the other, please.)

I'm not going to be able to set this up.   I just had to put my server into 
cold storage due to dismantling my office, and running the TPC stuff on my 
laptop is a joke.

I'll contact the OSDL folks to see if they can run it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Proposed Query Planner TODO items

2003-12-19 Thread Joshua D. Drake

I'm not going to be able to set this up.   I just had to put my server into 
cold storage due to dismantling my office, and running the TPC stuff on my 
laptop is a joke.

I'll contact the OSDL folks to see if they can run it.

 

We can... depending on what you need for a server.

J





--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2003-12-09 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Summary: Currently, queries with complex or group criteria get devolved by 
 the planner into canonical and-or filters resulting in very poor execution on
 large data sets.   We should find better ways of dealing with these queries, 
 for example UNIONing.

Could we see the actual present query plans for both the TPC-R query
and the UNION version?  (I'll settle for explain on the slow
version, but explain analyze on the other, please.)

In general I am suspicious of proposals to rewrite queries into UNION
equivalents, because the equivalent usually isn't exactly
equivalent, at least not without conditions that the planner can't
easily prove.  This proposal looks a lot like the KSQO optimization that
we put in and then took out again several years ago --- it also rewrote
queries into a UNION form, only the UNION didn't necessarily produce
identical results.

I am thinking that the guys who do this query fast are probably
extracting single-relation subsets of the big OR/AND clause, so that
they can do some filtering of the input tables before the join.  Our
existing planner would think that the OR/AND clause is only usable at
the join step, which is why it's seqscanning.  But if we pulled out
subsets, we could have for instance

WHERE t1.a = t2.a
AND (
( t1.c = x
  AND t1.f IN (m, n, o)
  AND t2.d = v
  AND t2.e BETWEEN j AND k
)
OR
( t1.c = y
  AND t1.f IN (n, o, p)
  AND t2.d = v
  AND t2.e BETWEEN k AND h
)
OR 
( t1.c = z
  AND t1.f IN (p, q)
  AND t2.d = w
  AND t2.e BETWEEN k AND h
)
)
AND ( t1.c = x OR t1.c = y OR t1.c = z )

which is redundant, but that last clause could enable an indexscan on t1.c.

However ... the planner has code in it already that should do something
close to that, so there may be something I am missing.  Again, could we
see EXPLAIN results?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Proposed Query Planner TODO items

2003-12-09 Thread Josh Berkus
Tom,

 In general I am suspicious of proposals to rewrite queries into UNION
 equivalents, because the equivalent usually isn't exactly
 equivalent, at least not without conditions that the planner can't
 easily prove.

As I said, I'm not sure that UNIONing the query is the solution, we just need 
something other than what the planner currently does, which does not 
complete.

Explains later today.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposed Query Planner TODO items

2003-12-05 Thread Josh Berkus
John,

  SELECT t1.a, t2.b
  FROM t1, t2
  WHERE t1.a = t2.a
  AND t1.c = x
AND t1.f IN (m, n, o)
AND t2.d = v
AND t2.e BETWEEN j AND k
  UNION ALL

 Shouldn't that be UNION instead of UNION ALL? You don't want
 duplicate rows, if i'm not mistaken.

Yes, you're correct; I copied UNION ALL from a test case which was not 
generic.  In general, one would want UNION.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Proposed Query Planner TODO items

2003-12-05 Thread Greg Stark

I know Oracle is capable of producing the UNION plan. but I don't know if
that's the only option. I'm curious what indexes the rewritten union-based
query used.

Josh Berkus [EMAIL PROTECTED] writes:

 SELECT t1.a, t2.b
 FROM t1, t2
 WHERE t1.a = t2.a
 AND (
   ( t1.c = x
 AND t1.f IN (m, n, o)
 AND t2.d = v
 AND t2.e BETWEEN j AND k
   )
   OR
   ( t1.c = y
 AND t1.f IN (n, o, p)
 AND t2.d = v
 AND t2.e BETWEEN k AND h
   )
   OR 
   ( t1.c = z
 AND t1.f IN (p, q)
 AND t2.d = w
 AND t2.e BETWEEN k AND h
   )
   )

In this case it seems like it might be possible to look for a covering set
that is guaranteed to include all the records and doesn't include any ORs. If
that covering set can be scanned quickly then the complex conditions could be
tested on the resulting records individually.

In this case it would be something like

select t1.a,t2.b from t1,t2 where t1.a = t2.a
   and (t1.c in (x,y,z)
and t1.f in (m,n,o,p,q)
and t2.d in (v,w)
and t2.e between min(j,k) and max(k,h)
   )
   and ( the above constraints...)

It seems like it would be a lot of work and only help in narrow cases though.


-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster