[GENERAL] optimising UNION performance

2006-08-28 Thread Rafal Pietrak
Hi all,

Is there a way to speed up the query to my 'grand total' logfile,
constructed as a UNION of smaller (specialised) logfiles?

Access to log1/log2 is quick (If I'm reading ANALYSE log correctly, it's
c.a. 100ms each - and it feels like that, so presumebly I'm reading
ANALYSE just OK), but the UNION is quite slow (3.5 sec, which I can
confirm - it's the real response time of the server in that case)

Again, if I'm reading the ANALYSE correctly (and I'm not really sure of
that), it looks like the performence is hit by the sort performed by the
UNION. The sort is not really needed in this case and don't really know
why it's there (my comlog VIEW definition does not request one - or does
it implicitly?).

One point not shown in the attachment, is that 'brands' and 'clty' are
foreing references. Should foreign reference matter here?

Can someone give me a hint on how to write a VIEW, that returns
concatenation of both log-tables within a sum of their respective access
times (as of current, that would be c.a. 200ms)

The ANALYSE is taken from postgres v8.1.4 installed from *.deb on
Debian-sid, hosted by 800MHz/512MB Duron machine. It really strickes me,
that 20k+30k rows is returned in 3.5sec!

-- 
-R

 View public.comlog
  Column  |  Type   | Modifiers 
--+-+---
 opmode   | boolean | 
 date | date| 
 jfinito  | date| 
 readout  | numeric | 
 brand1   | integer | 
 brand2   | integer | 
 clty | integer | 
 opis | text| 
View definition:
 SELECT true AS opmode, date(c.tm) AS date, c.jfinito, c.readout, c.brand1, 
c.brand2, c.clty, c.opis
   FROM log1 c
UNION 
 SELECT false AS opmode, date(date_mi_interval(s.tm, '00:00:01'::interval)) AS 
date, NULL::unknown AS jfinito, s.state AS readout, s.brand1, s.brand2, 
s.clty, 'master' AS opis
   FROM log2 s;

QUERY PLAN  
  
--
 Seq Scan on log2  (cost=0.00..363.84 rows=20484 width=0) (actual 
time=0.022..55.879 rows=20484 loops=1)
 Total runtime: 81.673 ms
(2 rows)

 QUERY PLAN 


 Seq Scan on log1  (cost=0.00..698.16 rows=30916 width=0) (actual 
time=0.022..87.184 rows=30916 loops=1)
 Total runtime: 126.834 ms
(2 rows)

  QUERY PLAN
  
--
 Subquery Scan comlog  (cost=7137.30..8807.80 rows=51400 width=0) (actual 
time=2992.079..3507.783 rows=51400 loops=1)
   -  Unique  (cost=7137.30..8293.80 rows=51400 width=59) (actual 
time=2992.067..3359.512 rows=51400 loops=1)
 -  Sort  (cost=7137.30..7265.80 rows=51400 width=59) (actual 
time=2992.058..3123.836 rows=51400 loops=1)
   Sort Key: opmode, date, jfinito, readout, brand1, brand2, clty, 
opis
   -  Append  (cost=0.00..1755.71 rows=51400 width=59) (actual 
time=0.048..701.949 rows=51400 loops=1)
 -  Subquery Scan *SELECT* 1  (cost=0.00..1084.61 
rows=30916 width=59) (actual time=0.045..332.276 rows=30916 loops=1)
   -  Seq Scan on log1 c  (cost=0.00..775.45 
rows=30916 width=59) (actual time=0.041..203.392 rows=30916 loops=1)
 -  Subquery Scan *SELECT* 2  (cost=0.00..671.10 
rows=20484 width=26) (actual time=0.088..227.255 rows=20484 loops=1)
   -  Seq Scan on log2 s  (cost=0.00..466.26 
rows=20484 width=26) (actual time=0.077..146.642 rows=20484 loops=1)
 Total runtime: 3589.929 ms
(10 rows)


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Peter Eisentraut
Am Montag, 28. August 2006 10:23 schrieb Rafal Pietrak:
 Is there a way to speed up the query to my 'grand total' logfile,
 constructed as a UNION of smaller (specialised) logfiles?

If it is sufficient for your purpose, you will find UNION ALL to be 
significantly faster.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Ragnar
On mán, 2006-08-28 at 10:23 +0200, Rafal Pietrak wrote:
 Hi all,
 
 Is there a way to speed up the query to my 'grand total' logfile,
 constructed as a UNION of smaller (specialised) logfiles?
 

I do not know if this is relevant to your case, but 
possibly you can use a UNION ALL instead of a UNION.

In many cases the UNION ALL gives petter performance,
as the unique step can be skipped, as well as a sort
needed by the unique.

gnari



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Rafal Pietrak
Thank you All for explanations. Looks loke that's what I was looking
for.

UNION ALL is quite satisfactory (830ms).

And yet, somwhere I loose c.a. 600ms (as compared to 120ms+80ms of each
respective 'raw' subquery) which as percentage seem signifficant.
Does anybody know where the processing goes now?

Currently, the ANALYSE looks like this:
QUERY
PLAN  
--
 Subquery Scan comlog  (cost=0.00..2269.71 rows=51400 width=0) (actual
time=0.053..755.649 rows=51400 loops=1)
   -  Append  (cost=0.00..1755.71 rows=51400 width=59) (actual
time=0.048..607.437 rows=51400 loops=1)
-  Subquery Scan *SELECT* 1  (cost=0.00..1084.61 rows=30916 width=59)
(actual time=0.046..278.802 rows=30916 loops=1)
-  Seq Scan on log1 c  (cost=0.00..775.45 rows=30916 width=59) (actual
time=0.042..170.193 rows=30916 loops=1)
-  Subquery Scan *SELECT* 2  (cost=0.00..671.10 rows=20484 width=26)
(actual time=0.055..200.223 rows=20484 loops=1)
-  Seq Scan on log2 s  (cost=0.00..466.26 rows=20484 width=26) (actual
time=0.044..127.301 rows=20484 loops=1)
Total runtime: 822.901 ms
(7 rows)
-

On Mon, 2006-08-28 at 09:11 +, Ragnar wrote:
 On mán, 2006-08-28 at 10:23 +0200, Rafal Pietrak wrote:
  Hi all,
  
  Is there a way to speed up the query to my 'grand total' logfile,
  constructed as a UNION of smaller (specialised) logfiles?
  
 
 I do not know if this is relevant to your case, but 
 possibly you can use a UNION ALL instead of a UNION.
 
 In many cases the UNION ALL gives petter performance,
 as the unique step can be skipped, as well as a sort
 needed by the unique.
 
 gnari
 
 
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
-- 
-R

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Alban Hertroys

Rafal Pietrak wrote:

Thank you All for explanations. Looks loke that's what I was looking
for.

UNION ALL is quite satisfactory (830ms).

And yet, somwhere I loose c.a. 600ms (as compared to 120ms+80ms of each
respective 'raw' subquery) which as percentage seem signifficant.
Does anybody know where the processing goes now?

Currently, the ANALYSE looks like this:
QUERY
PLAN  
--

 Subquery Scan comlog  (cost=0.00..2269.71 rows=51400 width=0) (actual
time=0.053..755.649 rows=51400 loops=1)
   -  Append  (cost=0.00..1755.71 rows=51400 width=59) (actual
time=0.048..607.437 rows=51400 loops=1)
-  Subquery Scan *SELECT* 1  (cost=0.00..1084.61 rows=30916 width=59)
(actual time=0.046..278.802 rows=30916 loops=1)
-  Seq Scan on log1 c  (cost=0.00..775.45 rows=30916 width=59) (actual
time=0.042..170.193 rows=30916 loops=1)
-  Subquery Scan *SELECT* 2  (cost=0.00..671.10 rows=20484 width=26)
(actual time=0.055..200.223 rows=20484 loops=1)
-  Seq Scan on log2 s  (cost=0.00..466.26 rows=20484 width=26) (actual
time=0.044..127.301 rows=20484 loops=1)
Total runtime: 822.901 ms
(7 rows)
-


Just to make sure: You do have an appropriate index over the tables in 
that UNION?


From experience, it seems that PostgreSQL chooses a sequential scan 
over unioned sets instead of an index scan - the details escape me, but 
there is a good reason for that. I'm sure it's not for performance 
reasons, though.


There have been some discussions about inheritance performance, which 
boils down to exactly this problem (inheritance basically is a UNION 
over all the tables involved). You may want to check the archives.


Regards,
--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Rafal Pietrak
On Mon, 2006-08-28 at 13:04 +0200, Alban Hertroys wrote:
 Rafal Pietrak wrote:
  Total runtime: 822.901 ms
  (7 rows)
  -
 
 Just to make sure: You do have an appropriate index over the tables in 
 that UNION?

Well. The logfiles don't have their own indexes but make foreign key
references over brand1/brand2/clty columns. Unique constreins are on the
target tables.

But that particual ANALYSED query was: SELECT 1 FROM comlog. 

So, the use of seq-scan looks quite adequate to me (I haven't posted
results to avoid cuttering of my original query with too many details,
but the ANALYSE of SELECT * FROM comlog gives almost exactly the same
cost and time, and *that* is what I will actually be doing in the
application).

So: SELECT 1 on comlog costs 830ms and is done by two saq-scans.

but: SELECT 1 on log1 gives 120ms, and SELECT 1 on log2 gives 80ms. 

All three queries are executed as seq-scans which look OK, as I
intend to fetch *all* the rows.

And yet, there is this 600ms 'leak'.

Now, as I try to  read the ANALYSE output (which I'm not very proficient
at - just blindly comparing them): for queries of raw logs (log1/log2),
I can see just one server task: seq-scan for each respective query.
But when I look at ANALYSE output of comlog SELECT, I can see, that:
1. the seq-scans is more expensive here: 170ms and 120ms respectively.
Any reasons for that?
2. each scan has an additional job of: Subquery Scan *SELECT* 1 ...
which costs even more (280ms and 230ms respectively), although it's
purpose it not very clear to me.
3. only on top of that, there is an Append process, which looks cheap as
expected.

So I can risk an opinion, that I can seek the missing 600ms, I only
don't understand why it's there.

Or to put it the other way around: is there a way to write a UNION where
the 200ms to 800ms cost increase does not occure.

-R

  From experience, it seems that PostgreSQL chooses a sequential scan 
 over unioned sets instead of an index scan - the details escape me, but 
 there is a good reason for that. I'm sure it's not for performance 
 reasons, though.
 
 There have been some discussions about inheritance performance, which 
 boils down to exactly this problem (inheritance basically is a UNION 
 over all the tables involved). You may want to check the archives.
 
 Regards,
-- 
-R

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Alban Hertroys

Rafal Pietrak wrote:

Well. The logfiles don't have their own indexes but make foreign key
references over brand1/brand2/clty columns. Unique constreins are on the
target tables.


So there's no index on the logfiles then? (A foreign key constraint 
doesn't create an index). It doesn't seem like in your case an index 
would cause any benefit, just so you know.


It also means that your problem is different from mine; PostgreSQL using 
a seq-scan over a UNION (ALL) where indices were available over the 
parts (mind you, it would use the appropriate index on the seperate 
union parts).


But that particual ANALYSED query was: SELECT 1 FROM comlog. 


So, the use of seq-scan looks quite adequate to me (I haven't posted
results to avoid cuttering of my original query with too many details,
but the ANALYSE of SELECT * FROM comlog gives almost exactly the same
cost and time, and *that* is what I will actually be doing in the
application).


There's practically no difference between SELECT 1 FROM ... and SELECT * 
FROM ...; the only added costs (AFAIK) are for actually fetching the 
column values and such. Pretty cheap operations.



So: SELECT 1 on comlog costs 830ms and is done by two saq-scans.

but: SELECT 1 on log1 gives 120ms, and SELECT 1 on log2 gives 80ms. 


All three queries are executed as seq-scans which look OK, as I
intend to fetch *all* the rows.

And yet, there is this 600ms 'leak'.


...


But when I look at ANALYSE output of comlog SELECT, I can see, that:
1. the seq-scans is more expensive here: 170ms and 120ms respectively.
Any reasons for that?
2. each scan has an additional job of: Subquery Scan *SELECT* 1 ...
which costs even more (280ms and 230ms respectively), although it's
purpose it not very clear to me.


This is probably caused by using UNION as opposed to UNION ALL (as other 
people already mentioned).


To merge duplicate results (one from either subquery) the database 
sorts[1] the results. To do that, it needs to compare with other records 
- hence the extra subquery, and probably the added 50ms as well.


Regards,
--
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Rafal Pietrak
On Mon, 2006-08-28 at 14:50 +0200, Alban Hertroys wrote:
 Rafal Pietrak wrote:
 
  But when I look at ANALYSE output of comlog SELECT, I can see, that:
  1. the seq-scans is more expensive here: 170ms and 120ms respectively.
  Any reasons for that?
  2. each scan has an additional job of: Subquery Scan *SELECT* 1 ...
  which costs even more (280ms and 230ms respectively), although it's
  purpose it not very clear to me.
 
 This is probably caused by using UNION as opposed to UNION ALL (as other 
 people already mentioned).
 
 To merge duplicate results (one from either subquery) the database 
 sorts[1] the results. To do that, it needs to compare with other records 
 - hence the extra subquery, and probably the added 50ms as well.

No no no. 

The above 1. 2. 3. is read from UNION ALL analysis - the results of
UNION per se are only in my initial post, and after I've read of the
'ALL' option I make no further reference to the original construct
(where the cost of SELECT 1 was 3600ms as oposed to 830ms for current
UNION ALL).

Currently I'm digging why the SELECT on UNION takes 830ms, while SELECT
on respective raw log-tables take just 120ms and 80ms respectively -
where does the remaining 600ms go.

I have notices the spurious Subquery Scan *SELECT* 1 ... server
task, which takes more then the indispensable seq-scan on respective
table while does not serve any purpose  to my unexperienced eye at
least.

And why the same seq-scan taken by select on my log-table *within* a
UNION is more expensive, than when it's taken on that table by itself:
120ms rises to 170ms, and 80ms rises to 120ms for log1/log2 tables
respectively.
-- 
-R

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Tom Lane
Alban Hertroys [EMAIL PROTECTED] writes:
 There's practically no difference between SELECT 1 FROM ... and SELECT * 
 FROM ...; the only added costs (AFAIK) are for actually fetching the 
 column values and such. Pretty cheap operations.

You're both glossing over exactly the wrong thing, particularly seeing
that Rafal appears to be using 8.0 or older which hasn't got 8.1's
significant reductions in targetlist evaluation costs.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] optimising UNION performance

2006-08-28 Thread Rafal Pietrak
This is a little strange - my response to this post apparently got lost
in the net?? I haven't received it back through the list nor it's
visible in the archieve. Yet, my exim logfile contains entry indicating
'delivery complited'???

But to the point.

All the EXPLAIN ANALISE I did on posggres v8.1.4 - plain *.deb binary
install from debian network repository.

So if version v8.1 was expected to behave any differently, it doesn't.
But may be the reduction improvement are in some pre-8.2 versions?

-R

On Mon, 2006-08-28 at 10:46 -0400, Tom Lane wrote:
 Alban Hertroys [EMAIL PROTECTED] writes:
  There's practically no difference between SELECT 1 FROM ... and SELECT * 
  FROM ...; the only added costs (AFAIK) are for actually fetching the 
  column values and such. Pretty cheap operations.
 
 You're both glossing over exactly the wrong thing, particularly seeing
 that Rafal appears to be using 8.0 or older which hasn't got 8.1's
 significant reductions in targetlist evaluation costs.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
-- 
-R

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster