Re: [HACKERS] raising the default default_statistics_target
I think the thread you're thinking of is on or about this post: http://archives.postgresql.org/pgsql-patches/2003-12/msg00039.php Manfred Koizar produced a patch that modified index correlation by sorting equal key values based on item pointers. The patch went as far as getting accepted into the patch queue, but Tom raised some doubts about it and it was subsequently removed. Robert Treat On Mon, 2004-03-08 at 14:41, Josh Berkus wrote: Tom, Are you sure you're not thinking of stats for functional indexes? Positive.I even remember seeing that the patch was accepted. The patch specifically had to do with a multi-column correlation algorithm for improving the selectivity of multi-column indexes. Problem is, with 1400 posts per month August to October, I can't find it, and the keywords that I think are obvious don't turn anything up. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] raising the default default_statistics_target
Robert Treat [EMAIL PROTECTED] writes: I think the thread you're thinking of is on or about this post: http://archives.postgresql.org/pgsql-patches/2003-12/msg00039.php Manfred Koizar produced a patch that modified index correlation by sorting equal key values based on item pointers. The patch went as far as getting accepted into the patch queue, but Tom raised some doubts about it and it was subsequently removed. Hm, that had nothing to do with multi-column correlation though. I'm at a loss to think of any work that matches with Josh's recollection. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] raising the default default_statistics_target
Guys, Hm, that had nothing to do with multi-column correlation though. I'm at a loss to think of any work that matches with Josh's recollection. H it's possible that early e-mails about Manfred's patch claimed to improve performance for multi-column indexes. But it's also possible I'm remembering something else. Darn it, though! 'cause multi-column correlation is one of our big issues on estimates for complex queries. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] raising the default default_statistics_target
On Sun, 7 Mar 2004, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: Tom Lane wrote: This is something we need to consider, but we'll need more evidence before making a choice. One thing that we have very little data about is how much difference it makes in the quality of planner choices. Right, but is there a practical way to actually get this data? I haven't thought of one yet, but perhaps someone will have an idea. Hi Tom. I ran some very simple tests on analyze times and query plan times on a very simple table, with data randomly distributed. The index was on a date field, since that's what I was testing last. This was all done on my 512Meg memory 1.1GHz celeron workstation with an IDE drive. I'd love more input on better testing methodologies here... with 100k or 1M rows that look kinda like this: (I'll test 10M rows later, which means the dataset won't fit in memory, so there'll be lots of access going on. Right now the 1M row table is 80 meg) select * from test2 limit 5; info | dt | id -+-+- Francize perfectible swirling fluctuates| 2004-05-20 20:12:04 | 2721995 Fields chauffeur attentionality grandmother | 2004-04-07 14:36:02 | 2721996 Belgium bilked explosively defendant| 2004-09-16 16:27:22 | 2721997 perspectives Buenos Pollux discriminates| 2004-11-11 12:28:31 | 2721998 Victorianize Savonarola blackmails sufficed | 2004-02-27 21:17:20 | 2721999 (5 rows) here's what I get with different statistics targets for analyze times: 100k1M 1M analyze analyze plan target ms ms ms 10 250 875 2 20 350 1250 30 430 1500 40 520 1725 50 580 1900 60 690 2100 70 775 2175 80 850 2300 90 950 2400 100 100026002.5 200 18063700 300 26004800 400 26005900 500 26007200 700 26009500 1000260013000 5 Since this data is randomly distributed, I didn't bother doing a lot of testing to see how accurate each target setting was. If that would be useful to know I'd gladly test it, but I was only setting out to test the time to analyze and the time to plan. Note that I only tested 3 targets for planning time, as it didn't seem to make a very big difference. The query was: select * from test2 where dt between 'march 11, 2004' and 'march 13, 2004'; I also ran some quick tests on smaller tables (1000 and 10k rows) and there, the plateau that we see in the 100k analyze shows up much quicker, at something like 50 or so. I.e. the analyze time flattened out quickly and higher numbers cost very little if anything. Since this query was quite an easy plan, I'd expect to need a much more complex one to test the increase in planning time, say something that has to look at a lot of statistics. Any particular join type or something that's likely to do that? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] raising the default default_statistics_target
scott.marlowe [EMAIL PROTECTED] writes: Hi Tom. I ran some very simple tests on analyze times and query plan times on a very simple table, with data randomly distributed. The index was on a date field, since that's what I was testing last. Thanks. I also ran some quick tests on smaller tables (1000 and 10k rows) and there, the plateau that we see in the 100k analyze shows up much quicker, at something like 50 or so. I.e. the analyze time flattened out quickly and higher numbers cost very little if anything. The sample size is (IIRC) 300 times stats_target rows, so the plateau that you're seeing occurs when the sample size becomes the entire table. It would be useful to note how large the ANALYZE process got to be during these runs. Since this query was quite an easy plan, I'd expect to need a much more complex one to test the increase in planning time, say something that has to look at a lot of statistics. Any particular join type or something that's likely to do that? I'd say try a join on any reasonably plausible foreign-key relationship (unique key on one side, not-unique data on the other). That's probably the most common situation. As for making it complicated, just stack up a bunch of such joins ... regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] raising the default default_statistics_target
Neil, In the simple test I performed, raising the default_statistics_target from 10 to 25 resulted in a 40% increase in the time to ANALYZE a large table. (I picked 25 more or less at random -- would 15 or 20 be better?) I find that very interesting, since I haven't found much higher increases to be a proportionate penality. For example, on an 11-column table raising 3 columns to statistics=250 merely doubled the ANALYZE time. I have not done exact timing, but would be happy to It may also be the case that for those people for whom 10 is an insufficient stats target, 25 is also insufficient. It is. I've found that problem queries, especially those caused by real, uneven distribution of data, require raising statistics to 150-400 in order to fix. This is much to high a level to assign as a default. Any comments on whether increasing the default stats target is a good idea for 7.5? (Details on the test I performed are included below) No. I don't think it's a good idea to raise the default for *all* columns; for one thing, I'd really hate to think what, say, a default stats of 100 would do to a TEXT column with an average of 8K of data per row. Further, in 7.5 we'll be introducing correlated stats for multi-column indexes (unless something's gone off with that?) which should help a lot of problem queries. And change our whole emphasis on brute forcing analyze through increasing stats into the 100's. If you really want to tackle this issue, though, here's what I suggest: 1) add a GUC called default_statistics_indexed, which starts at say 100 or 50. 2) When ever the user indexes a column, automatically increase the stats to the level in default_statistics_indexed, if they are at the level in default_statistics_target. This will then give indexed columns automatically a somewhat higher level of stats analysis than other columns. This should help a lot of slow query problems, yet effectively leave the selection of important columns in the hands of the DBA. Make sense? Also, another great feature in this department would be to extend the multi-column correlation statistics to cover foriegn keys, as a way of improving cross-table estimates. Anyway, keep me in the loop on this, I have a lot of very complex databases I can test such issues on. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] raising the default default_statistics_target
Josh Berkus [EMAIL PROTECTED] writes: It is. I've found that problem queries, especially those caused by real, uneven distribution of data, require raising statistics to 150-400 in order to fix. This is much to high a level to assign as a default. That's basically what's bothering me about the suggestion to increase to 25 --- I'm dubious that it will do any good. Further, in 7.5 we'll be introducing correlated stats for multi-column indexes (unless something's gone off with that?) News to me. It's certainly not there now. This will then give indexed columns automatically a somewhat higher level of stats analysis than other columns. That is potentially a good idea. There's still the question of what is a reasonable default, though. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] raising the default default_statistics_target
Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: It is. I've found that problem queries, especially those caused by real, uneven distribution of data, require raising statistics to 150-400 in order to fix. This is much to high a level to assign as a default. That's basically what's bothering me about the suggestion to increase to 25 --- I'm dubious that it will do any good. Further, in 7.5 we'll be introducing correlated stats for multi-column indexes (unless something's gone off with that?) News to me. It's certainly not there now. This will then give indexed columns automatically a somewhat higher level of stats analysis than other columns. That is potentially a good idea. There's still the question of what is a reasonable default, though. Do all the columns have to have the same number of statistics buckets? Could that stats collector adjust the number of buckets based on the data somehow? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] raising the default default_statistics_target
Bruce Momjian [EMAIL PROTECTED] writes: Do all the columns have to have the same number of statistics buckets? They do not, but the effort spent by ANALYZE is proportional to the largest stats target among all the columns of the table. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] raising the default default_statistics_target
Tom, Further, in 7.5 we'll be introducing correlated stats for multi-column indexes (unless something's gone off with that?) This was discussed on Hackers in October, a complete implementation was shown, I thought it was committed at that time. If not, what happened to it? Dammit, it's impossible to find anything in the archives if you don't have some good keywords or at least the author. Is the autor reading this? Will you speak up? -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] raising the default default_statistics_target
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Do all the columns have to have the same number of statistics buckets? They do not, but the effort spent by ANALYZE is proportional to the largest stats target among all the columns of the table. Could we use previous stats to determine how many buckets to use when running ANALYZE. Also, if columns have a different number of buckets, does that mean that we don't have the same per-query overhead for a larger stats target? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] raising the default default_statistics_target
Josh Berkus [EMAIL PROTECTED] writes: Further, in 7.5 we'll be introducing correlated stats for multi-column indexes (unless something's gone off with that?) This was discussed on Hackers in October, a complete implementation was shown, I thought it was committed at that time. If not, what happened to it? Are you sure you're not thinking of stats for functional indexes? regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] raising the default default_statistics_target
Tom, Are you sure you're not thinking of stats for functional indexes? Positive.I even remember seeing that the patch was accepted. The patch specifically had to do with a multi-column correlation algorithm for improving the selectivity of multi-column indexes. Problem is, with 1400 posts per month August to October, I can't find it, and the keywords that I think are obvious don't turn anything up. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] raising the default default_statistics_target
Neil Conway [EMAIL PROTECTED] writes: Any comments on whether increasing the default stats target is a good idea for 7.5? (Details on the test I performed are included below) This is something we need to consider, but we'll need more evidence before making a choice. One thing that we have very little data about is how much difference it makes in the quality of planner choices. (There's no point in slowing down ANALYZE unless the plans get better.) Also, I would expect that larger stats targets would slow down the parts of the planner that look at the stats, since there are more data values to examine. I do not have any numbers about this cost though --- do you want to try to get some? regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] raising the default default_statistics_target
Tom Lane wrote: This is something we need to consider, but we'll need more evidence before making a choice. One thing that we have very little data about is how much difference it makes in the quality of planner choices. Right, but is there a practical way to actually get this data? If the distribution of data in the table is irregular, a higher stats target is needed to allow good planning choices. Therefore, the effect that the stats target has on planner choices depends on the regularity of the distribution of data at installations, and there is no way to know that in general AFAICS. Also, I would expect that larger stats targets would slow down the parts of the planner that look at the stats, since there are more data values to examine. I do not have any numbers about this cost though --- do you want to try to get some? Given the magnitude of the change (25 data elements versus 10), I wouldn't expect this to produce a major change in the total runtime of the optimizer. However, I don't know the optimizer that well, so I'll do some benchmarks when I get a chance. -Neil ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] raising the default default_statistics_target
Neil Conway [EMAIL PROTECTED] writes: Tom Lane wrote: This is something we need to consider, but we'll need more evidence before making a choice. One thing that we have very little data about is how much difference it makes in the quality of planner choices. Right, but is there a practical way to actually get this data? I haven't thought of one yet, but perhaps someone will have an idea. Also, I would expect that larger stats targets would slow down the parts of the planner that look at the stats, since there are more data values to examine. I do not have any numbers about this cost though --- do you want to try to get some? Given the magnitude of the change (25 data elements versus 10), I wouldn't expect this to produce a major change in the total runtime of the optimizer. I wouldn't either, but if we need to raise the stats target to 100 or 1000 to make a meaningful difference, then the question becomes more urgent. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings