[GENERAL] optimising UNION performance
Hi all, Is there a way to speed up the query to my 'grand total' logfile, constructed as a UNION of smaller (specialised) logfiles? Access to log1/log2 is quick (If I'm reading ANALYSE log correctly, it's c.a. 100ms each - and it feels like that, so presumebly I'm reading ANALYSE just OK), but the UNION is quite slow (3.5 sec, which I can confirm - it's the real response time of the server in that case) Again, if I'm reading the ANALYSE correctly (and I'm not really sure of that), it looks like the performence is hit by the sort performed by the UNION. The sort is not really needed in this case and don't really know why it's there (my comlog VIEW definition does not request one - or does it implicitly?). One point not shown in the attachment, is that 'brands' and 'clty' are foreing references. Should foreign reference matter here? Can someone give me a hint on how to write a VIEW, that returns concatenation of both log-tables within a sum of their respective access times (as of current, that would be c.a. 200ms) The ANALYSE is taken from postgres v8.1.4 installed from *.deb on Debian-sid, hosted by 800MHz/512MB Duron machine. It really strickes me, that 20k+30k rows is returned in 3.5sec! -- -R View public.comlog Column | Type | Modifiers --+-+--- opmode | boolean | date | date| jfinito | date| readout | numeric | brand1 | integer | brand2 | integer | clty | integer | opis | text| View definition: SELECT true AS opmode, date(c.tm) AS date, c.jfinito, c.readout, c.brand1, c.brand2, c.clty, c.opis FROM log1 c UNION SELECT false AS opmode, date(date_mi_interval(s.tm, '00:00:01'::interval)) AS date, NULL::unknown AS jfinito, s.state AS readout, s.brand1, s.brand2, s.clty, 'master' AS opis FROM log2 s; QUERY PLAN -- Seq Scan on log2 (cost=0.00..363.84 rows=20484 width=0) (actual time=0.022..55.879 rows=20484 loops=1) Total runtime: 81.673 ms (2 rows) QUERY PLAN Seq Scan on log1 (cost=0.00..698.16 rows=30916 width=0) (actual time=0.022..87.184 rows=30916 loops=1) Total runtime: 126.834 ms (2 rows) QUERY PLAN -- Subquery Scan comlog (cost=7137.30..8807.80 rows=51400 width=0) (actual time=2992.079..3507.783 rows=51400 loops=1) - Unique (cost=7137.30..8293.80 rows=51400 width=59) (actual time=2992.067..3359.512 rows=51400 loops=1) - Sort (cost=7137.30..7265.80 rows=51400 width=59) (actual time=2992.058..3123.836 rows=51400 loops=1) Sort Key: opmode, date, jfinito, readout, brand1, brand2, clty, opis - Append (cost=0.00..1755.71 rows=51400 width=59) (actual time=0.048..701.949 rows=51400 loops=1) - Subquery Scan *SELECT* 1 (cost=0.00..1084.61 rows=30916 width=59) (actual time=0.045..332.276 rows=30916 loops=1) - Seq Scan on log1 c (cost=0.00..775.45 rows=30916 width=59) (actual time=0.041..203.392 rows=30916 loops=1) - Subquery Scan *SELECT* 2 (cost=0.00..671.10 rows=20484 width=26) (actual time=0.088..227.255 rows=20484 loops=1) - Seq Scan on log2 s (cost=0.00..466.26 rows=20484 width=26) (actual time=0.077..146.642 rows=20484 loops=1) Total runtime: 3589.929 ms (10 rows) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] optimising UNION performance
Am Montag, 28. August 2006 10:23 schrieb Rafal Pietrak: Is there a way to speed up the query to my 'grand total' logfile, constructed as a UNION of smaller (specialised) logfiles? If it is sufficient for your purpose, you will find UNION ALL to be significantly faster. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] optimising UNION performance
On mán, 2006-08-28 at 10:23 +0200, Rafal Pietrak wrote: Hi all, Is there a way to speed up the query to my 'grand total' logfile, constructed as a UNION of smaller (specialised) logfiles? I do not know if this is relevant to your case, but possibly you can use a UNION ALL instead of a UNION. In many cases the UNION ALL gives petter performance, as the unique step can be skipped, as well as a sort needed by the unique. gnari ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] optimising UNION performance
Thank you All for explanations. Looks loke that's what I was looking for. UNION ALL is quite satisfactory (830ms). And yet, somwhere I loose c.a. 600ms (as compared to 120ms+80ms of each respective 'raw' subquery) which as percentage seem signifficant. Does anybody know where the processing goes now? Currently, the ANALYSE looks like this: QUERY PLAN -- Subquery Scan comlog (cost=0.00..2269.71 rows=51400 width=0) (actual time=0.053..755.649 rows=51400 loops=1) - Append (cost=0.00..1755.71 rows=51400 width=59) (actual time=0.048..607.437 rows=51400 loops=1) - Subquery Scan *SELECT* 1 (cost=0.00..1084.61 rows=30916 width=59) (actual time=0.046..278.802 rows=30916 loops=1) - Seq Scan on log1 c (cost=0.00..775.45 rows=30916 width=59) (actual time=0.042..170.193 rows=30916 loops=1) - Subquery Scan *SELECT* 2 (cost=0.00..671.10 rows=20484 width=26) (actual time=0.055..200.223 rows=20484 loops=1) - Seq Scan on log2 s (cost=0.00..466.26 rows=20484 width=26) (actual time=0.044..127.301 rows=20484 loops=1) Total runtime: 822.901 ms (7 rows) - On Mon, 2006-08-28 at 09:11 +, Ragnar wrote: On mán, 2006-08-28 at 10:23 +0200, Rafal Pietrak wrote: Hi all, Is there a way to speed up the query to my 'grand total' logfile, constructed as a UNION of smaller (specialised) logfiles? I do not know if this is relevant to your case, but possibly you can use a UNION ALL instead of a UNION. In many cases the UNION ALL gives petter performance, as the unique step can be skipped, as well as a sort needed by the unique. gnari ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- -R ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] optimising UNION performance
Rafal Pietrak wrote: Thank you All for explanations. Looks loke that's what I was looking for. UNION ALL is quite satisfactory (830ms). And yet, somwhere I loose c.a. 600ms (as compared to 120ms+80ms of each respective 'raw' subquery) which as percentage seem signifficant. Does anybody know where the processing goes now? Currently, the ANALYSE looks like this: QUERY PLAN -- Subquery Scan comlog (cost=0.00..2269.71 rows=51400 width=0) (actual time=0.053..755.649 rows=51400 loops=1) - Append (cost=0.00..1755.71 rows=51400 width=59) (actual time=0.048..607.437 rows=51400 loops=1) - Subquery Scan *SELECT* 1 (cost=0.00..1084.61 rows=30916 width=59) (actual time=0.046..278.802 rows=30916 loops=1) - Seq Scan on log1 c (cost=0.00..775.45 rows=30916 width=59) (actual time=0.042..170.193 rows=30916 loops=1) - Subquery Scan *SELECT* 2 (cost=0.00..671.10 rows=20484 width=26) (actual time=0.055..200.223 rows=20484 loops=1) - Seq Scan on log2 s (cost=0.00..466.26 rows=20484 width=26) (actual time=0.044..127.301 rows=20484 loops=1) Total runtime: 822.901 ms (7 rows) - Just to make sure: You do have an appropriate index over the tables in that UNION? From experience, it seems that PostgreSQL chooses a sequential scan over unioned sets instead of an index scan - the details escape me, but there is a good reason for that. I'm sure it's not for performance reasons, though. There have been some discussions about inheritance performance, which boils down to exactly this problem (inheritance basically is a UNION over all the tables involved). You may want to check the archives. Regards, -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] optimising UNION performance
On Mon, 2006-08-28 at 13:04 +0200, Alban Hertroys wrote: Rafal Pietrak wrote: Total runtime: 822.901 ms (7 rows) - Just to make sure: You do have an appropriate index over the tables in that UNION? Well. The logfiles don't have their own indexes but make foreign key references over brand1/brand2/clty columns. Unique constreins are on the target tables. But that particual ANALYSED query was: SELECT 1 FROM comlog. So, the use of seq-scan looks quite adequate to me (I haven't posted results to avoid cuttering of my original query with too many details, but the ANALYSE of SELECT * FROM comlog gives almost exactly the same cost and time, and *that* is what I will actually be doing in the application). So: SELECT 1 on comlog costs 830ms and is done by two saq-scans. but: SELECT 1 on log1 gives 120ms, and SELECT 1 on log2 gives 80ms. All three queries are executed as seq-scans which look OK, as I intend to fetch *all* the rows. And yet, there is this 600ms 'leak'. Now, as I try to read the ANALYSE output (which I'm not very proficient at - just blindly comparing them): for queries of raw logs (log1/log2), I can see just one server task: seq-scan for each respective query. But when I look at ANALYSE output of comlog SELECT, I can see, that: 1. the seq-scans is more expensive here: 170ms and 120ms respectively. Any reasons for that? 2. each scan has an additional job of: Subquery Scan *SELECT* 1 ... which costs even more (280ms and 230ms respectively), although it's purpose it not very clear to me. 3. only on top of that, there is an Append process, which looks cheap as expected. So I can risk an opinion, that I can seek the missing 600ms, I only don't understand why it's there. Or to put it the other way around: is there a way to write a UNION where the 200ms to 800ms cost increase does not occure. -R From experience, it seems that PostgreSQL chooses a sequential scan over unioned sets instead of an index scan - the details escape me, but there is a good reason for that. I'm sure it's not for performance reasons, though. There have been some discussions about inheritance performance, which boils down to exactly this problem (inheritance basically is a UNION over all the tables involved). You may want to check the archives. Regards, -- -R ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] optimising UNION performance
Rafal Pietrak wrote: Well. The logfiles don't have their own indexes but make foreign key references over brand1/brand2/clty columns. Unique constreins are on the target tables. So there's no index on the logfiles then? (A foreign key constraint doesn't create an index). It doesn't seem like in your case an index would cause any benefit, just so you know. It also means that your problem is different from mine; PostgreSQL using a seq-scan over a UNION (ALL) where indices were available over the parts (mind you, it would use the appropriate index on the seperate union parts). But that particual ANALYSED query was: SELECT 1 FROM comlog. So, the use of seq-scan looks quite adequate to me (I haven't posted results to avoid cuttering of my original query with too many details, but the ANALYSE of SELECT * FROM comlog gives almost exactly the same cost and time, and *that* is what I will actually be doing in the application). There's practically no difference between SELECT 1 FROM ... and SELECT * FROM ...; the only added costs (AFAIK) are for actually fetching the column values and such. Pretty cheap operations. So: SELECT 1 on comlog costs 830ms and is done by two saq-scans. but: SELECT 1 on log1 gives 120ms, and SELECT 1 on log2 gives 80ms. All three queries are executed as seq-scans which look OK, as I intend to fetch *all* the rows. And yet, there is this 600ms 'leak'. ... But when I look at ANALYSE output of comlog SELECT, I can see, that: 1. the seq-scans is more expensive here: 170ms and 120ms respectively. Any reasons for that? 2. each scan has an additional job of: Subquery Scan *SELECT* 1 ... which costs even more (280ms and 230ms respectively), although it's purpose it not very clear to me. This is probably caused by using UNION as opposed to UNION ALL (as other people already mentioned). To merge duplicate results (one from either subquery) the database sorts[1] the results. To do that, it needs to compare with other records - hence the extra subquery, and probably the added 50ms as well. Regards, -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] optimising UNION performance
On Mon, 2006-08-28 at 14:50 +0200, Alban Hertroys wrote: Rafal Pietrak wrote: But when I look at ANALYSE output of comlog SELECT, I can see, that: 1. the seq-scans is more expensive here: 170ms and 120ms respectively. Any reasons for that? 2. each scan has an additional job of: Subquery Scan *SELECT* 1 ... which costs even more (280ms and 230ms respectively), although it's purpose it not very clear to me. This is probably caused by using UNION as opposed to UNION ALL (as other people already mentioned). To merge duplicate results (one from either subquery) the database sorts[1] the results. To do that, it needs to compare with other records - hence the extra subquery, and probably the added 50ms as well. No no no. The above 1. 2. 3. is read from UNION ALL analysis - the results of UNION per se are only in my initial post, and after I've read of the 'ALL' option I make no further reference to the original construct (where the cost of SELECT 1 was 3600ms as oposed to 830ms for current UNION ALL). Currently I'm digging why the SELECT on UNION takes 830ms, while SELECT on respective raw log-tables take just 120ms and 80ms respectively - where does the remaining 600ms go. I have notices the spurious Subquery Scan *SELECT* 1 ... server task, which takes more then the indispensable seq-scan on respective table while does not serve any purpose to my unexperienced eye at least. And why the same seq-scan taken by select on my log-table *within* a UNION is more expensive, than when it's taken on that table by itself: 120ms rises to 170ms, and 80ms rises to 120ms for log1/log2 tables respectively. -- -R ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] optimising UNION performance
Alban Hertroys [EMAIL PROTECTED] writes: There's practically no difference between SELECT 1 FROM ... and SELECT * FROM ...; the only added costs (AFAIK) are for actually fetching the column values and such. Pretty cheap operations. You're both glossing over exactly the wrong thing, particularly seeing that Rafal appears to be using 8.0 or older which hasn't got 8.1's significant reductions in targetlist evaluation costs. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] optimising UNION performance
This is a little strange - my response to this post apparently got lost in the net?? I haven't received it back through the list nor it's visible in the archieve. Yet, my exim logfile contains entry indicating 'delivery complited'??? But to the point. All the EXPLAIN ANALISE I did on posggres v8.1.4 - plain *.deb binary install from debian network repository. So if version v8.1 was expected to behave any differently, it doesn't. But may be the reduction improvement are in some pre-8.2 versions? -R On Mon, 2006-08-28 at 10:46 -0400, Tom Lane wrote: Alban Hertroys [EMAIL PROTECTED] writes: There's practically no difference between SELECT 1 FROM ... and SELECT * FROM ...; the only added costs (AFAIK) are for actually fetching the column values and such. Pretty cheap operations. You're both glossing over exactly the wrong thing, particularly seeing that Rafal appears to be using 8.0 or older which hasn't got 8.1's significant reductions in targetlist evaluation costs. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- -R ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster