Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-17 Thread Hans-Jürgen Schönig




this entire thing is not about cartesian products at all.
it is about kicking out expensive queries before they even start to
eat up tons of CPU.
imagine a user asking for "give me all phone calls in the US within
the past 10 years". you could kill the query instantly because you
know that it would take ages.
in addition to that, you know that in an OLTP context anything which
is expected to take longer than X cannot be useful anyway.
this has nothing to do with cartesian products or other bad things
you can do in SQL.

it is just a simple, heuristic check.

many thanks,

hans




My point is that people should _know_ they are using a cartesian
product, and a warning would do that for users who have no need for a
cartesian product and want to be warned about a possible error.

--
 Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
 EnterpriseDB http://enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +



--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-15 Thread Bruce Momjian
Josh Berkus wrote:
 Greg,
 
 Well that's going to depend on the application. But I suppose there's
 nothing wrong with having options which aren't always a good idea to use.
 The real question I guess is whether there's ever a situation where it
 would be a good idea to use this. I'm not 100% sure.
 
 I can think of *lots*.   Primarily, simple web applications, where 
 queries are never supposed to take more than 50ms.  If a query turns up 
 with an estimated cost of 100, then you know something's wrong; 
 in the statistics if not in the query.  In either case, that query has a 
 good chance of dragging down the whole system.
 
 In such a production application, it is better to have false positives 
 and reject otherwise-OK queries because their costing is wrong, than to 
 let a single cartesian join bog down an application serving 5000 
 simultaneous users.  Further, with a SQL error, this would allow the 
 query rejection to be handled in a user-friendly way from the UI.

How about a simpler approach that throws an error or warning for
cartesian products?  That seems fool-proof.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-15 Thread Ron Mayer

Bruce Momjian wrote:

Josh Berkus wrote:
...simple web applications, where 
queries are never supposed to take more than 50ms.  If a query turns up 
with an estimated cost of 100, then you know something's wrong; 
...


How about a simpler approach that throws an error or warning for
cartesian products?  That seems fool-proof.


Seems less fool-proof to me.

Sometimes cartesian products produce plans that run 200 times
faster than plans that don't use the cartesian product.

The first link below shows a cartesian join that took 1.1
seconds (within the range of OK for some web apps), while
plans for the same query that don't use one took 200 seconds.

http://archives.postgresql.org/pgsql-performance/2008-03/msg00391.php
http://archives.postgresql.org/pgsql-performance/2007-12/msg00090.php
http://archives.postgresql.org/pgsql-performance/2008-03/msg00361.php




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-15 Thread Bruce Momjian
Ron Mayer wrote:
 Bruce Momjian wrote:
  Josh Berkus wrote:
  ...simple web applications, where 
  queries are never supposed to take more than 50ms.  If a query turns up 
  with an estimated cost of 100, then you know something's wrong; 
  ...
  
  How about a simpler approach that throws an error or warning for
  cartesian products?  That seems fool-proof.
 
 Seems less fool-proof to me.
 
 Sometimes cartesian products produce plans that run 200 times
 faster than plans that don't use the cartesian product.
 
 The first link below shows a cartesian join that took 1.1
 seconds (within the range of OK for some web apps), while
 plans for the same query that don't use one took 200 seconds.
 
 http://archives.postgresql.org/pgsql-performance/2008-03/msg00391.php
 http://archives.postgresql.org/pgsql-performance/2007-12/msg00090.php
 http://archives.postgresql.org/pgsql-performance/2008-03/msg00361.php

My point is that people should _know_ they are using a cartesian
product, and a warning would do that for users who have no need for a
cartesian product and want to be warned about a possible error.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-15 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Ron Mayer wrote:
 Seems less fool-proof to me.

 My point is that people should _know_ they are using a cartesian
 product, and a warning would do that for users who have no need for a
 cartesian product and want to be warned about a possible error.

There are quite a lot of standard applications where small cartesian
products make sense --- star schemas are the traditional example.
I recall some discussions awhile back about how to persuade the planner
to consider such cases, in fact.  Right now it'll generally use a series
of hash joins where maybe just one would be better.

I concur with Ron that the only merit of this proposal is that it's
(relatively) simple to implement.

regards, tom lane



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-15 Thread Robert Haas
 My point is that people should _know_ they are using a cartesian
 product, and a warning would do that for users who have no need for a
 cartesian product and want to be warned about a possible error.

I think Cartesian products are a red herring.  Cartesian products are
primarily bad if they generate bad performance, and bad performance
can be generated without Cartesian products.  I've certainly written
them intentionally, from time to time.  The bigger issue is - if you
start critiquing people's query-writing, where will you stop?

SELECT * FROM foo WHERE a = NULL
WARNING: a = NULL is always false.  Did you mean a IS NULL?

SELECT * FROM foo LEFT JOIN bar ON foo.a = bar.a LEFT JOIN baz ON
foo.b = bar.b AND foo.c = baz.c
WARNING: Maybe you meant foo.b = baz.b instead of foo.b = bar.b?

I'm sure there are a hundred others - these just happen to be a few of
my old mistakes (the first one was generated by some buggy Perl
code... the second by poor cut-and-paste skills).  In any event, I
don't think it's the job of the database to argue with you about
whether you really want the data you asked for - its job is just to
get you that data.  Of course if doing so will take longer than the
amount of time remaining before the heat death of the universe, a
warning might be appropriate.

...Robert



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-15 Thread Josh Berkus
Bruce,

 How about a simpler approach that throws an error or warning for
 cartesian products?  That seems fool-proof.

Well, throwing a warning is pretty useless for an unattended application.  

Also, it's perfectly possible to write queries which will never complete 
without a cartesian join.

Basically, *unless* someone has a plan to integrate the Greenplum resource 
management stuff sometime soon, I think we should take Hans' idea (pending 
patch quality, of course).  There's an argument to be made that even if we 
took the Greenplum resource controller, statement_cost_limit would be much 
simpler and worth having for the small-effort simple-application users.

FYI, the concept behind the Greenplum RM is similar, except that it 
maintains a total pool of query costs on a per-role basis.  

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-11 Thread Decibel!

On Aug 3, 2008, at 9:57 PM, Robert Treat wrote:

I think a variation on this could be very useful in development and test
environments. Suppose it raised a warning or notice if the cost was over
the limit. Then one could set a limit of a few million on the development
and test servers and developers would at least have a clue that they
needed to look at explain for that query. As it is now, one can exhort
them to run explain, but it has no effect.  Instead we later see queries
killed by a 24 hour timeout with estimated costs ranging from "until they
unplug the machine and dump it" to "until the sun turns into a red giant."


Great argument. So that's 4 in favour at least.


Not such a great argument. Cost models on development servers can and often
are quite different from those on production, so you might be putting an
artificial limit on top of your developers.


We should have an approved API for dumping stats from one database
and loading them into another. pg_dump needs this as well, IMO.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-11 Thread Decibel!

On Aug 4, 2008, at 3:49 PM, Simon Riggs wrote:

On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:

On Monday 04 August 2008 03:50:40 daveg wrote:



And you'll note, I specifically said that a crude tool is better than
nothing. But you're completely ignoring that a crude tool can often
end up as a foot-gun once released into the wild.


The proposal is for an option with no consequences when turned off. We
respect your right not to use it. What is the danger exactly?

If we cancel stupid queries before people run them, everybody is a
winner. Even the person who submitted the stupid query, since they
find out faster.


I could *really* use this. Unfortunately, we have a lot of folks  
writing some horrible queries and killing our slave databases. I'd  
*love* to be able to throw out any queries that had insane limits...



We'll have to do something with enable_seqscan, BTW, chaps.


My thought would be to back the cost penalty out if we end up with a  
seqscan anyway.


Speaking of which, there is a semi-related issue... if you have a  
large enough table the fixed-size cost we add to a seqscan won't be  
enough to make an alternative plan come out cheaper. Instead of  
adding a fixed cost, I think we should multiply by the estimated  
number of rows.
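
To make the arithmetic behind that concrete, here is a tiny standalone
sketch (the constants and the per-row knob are made up for illustration;
they are not PostgreSQL's real numbers): with a fixed penalty, a big
enough scan swamps the constant, while a per-row penalty keeps the
disabled path decisively more expensive.

#include <stdio.h>

typedef double Cost;

/* Illustrative values only -- not PostgreSQL's actual constants. */
static const Cost fixed_disable_cost   = 1.0e8;  /* fixed penalty, as today */
static const Cost disable_cost_per_row = 100.0;  /* hypothetical per-row penalty */

int
main(void)
{
    double tuples       = 5.0e9;          /* a very large table */
    Cost   seqscan_cost = 0.01 * tuples;  /* crude per-tuple scan cost: 5e7 */
    Cost   other_plan   = 6.0e8;          /* some competing non-seqscan plan */

    /* Fixed penalty: 5e7 + 1e8 = 1.5e8, still cheaper than the
     * alternative, so the "disabled" seqscan wins anyway. */
    printf("fixed penalty:   %.3g vs other plan %.3g\n",
           seqscan_cost + fixed_disable_cost, other_plan);

    /* Per-row penalty: 5e7 + 100 * 5e9 = 5e11, decisively more
     * expensive than any realistic alternative. */
    printf("per-row penalty: %.3g vs other plan %.3g\n",
           seqscan_cost + disable_cost_per_row * tuples, other_plan);
    return 0;
}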

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-11 Thread Casey Duncan

On Aug 4, 2008, at 1:04 PM, daveg wrote:

Ok, that is a different use case where an error seems very useful. What
about slightly extending the proposal to have the severity of exceeding
the limit configurable too? Something like:

   costestimate_limit = 10 # default 0 to ignore limit
   costestimate_limit_severity = error # debug, notice, warning, error


I very much like this idea, and I would definitely use something like  
this on our production oltp app. We had a case recently where a query  
joining two large tables was very fast 99.9% of the time (i.e., a few  
ms), but for particular, rare key combinations the planner would make  
a poor choice turning into a multi-minute monster. It ran longer than  
the web server timeout, and the client was programmed to retry on  
error, essentially causing a database DoS.


The monster version of the plan had an outrageous cost estimate, many  
orders of magnitude higher than any regular app query, and would be  
easy to peg using even a crudely chosen limit value.


The problem was first mitigated by setting a query timeout a little  
longer than the web server timeout (since the query results are  
discarded for anything running longer), but even this was not a  
solution, since the client would retry on timeout, still keeping the  
db too busy. The real solution was to not do the query, but it would  
have been better to identify this via ERRORs in the logs than by the  
database becoming saturated in the middle of the day.


For our application it is far better for an expensive query to be  
rejected outright than to attempt to run it in vain. Just thought I'd  
throw that out as anecdotal support.


-Casey




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-05 Thread Heikki Linnakangas

Simon Riggs wrote:

On Sun, 2008-08-03 at 22:09 +0200, Hans-Jürgen Schönig wrote:

Another alternative would be to have a plugin that can examine the plan
immediately after planner executes, so you can implement this yourself,
plus some other possibilities.




this would be really fancy.
what would a plugin like that look like?


Hmm...thinks: exactly like the existing planner_hook().

So, rewrite this as a planner hook and submit as a contrib module.


Now that's a good idea!

I personally don't think this feature is a good idea, for all the 
reasons others have mentioned, but as a pgfoundry project it can be 
downloaded by those who want it, and perhaps prove its usefulness for 
others as well.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Simon Riggs

On Sun, 2008-08-03 at 22:57 -0400, Robert Treat wrote:

 I still think it is worth revisiting what problems people are trying
 to solve, and see if there are better tools they can be given to solve
 them.  Barring that, I suppose a crude solution is better than
 nothing, though I fear people might point at the crude solution as a
 good enough solution to justify not working on better solutions. 

I advocate solutions to the problems of users I've worked with.

My preference is to help people in the next release, then improve from
there. We need to work with what we have.

In this case, an existing solution has been found.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:
 
 ISTR that what ended up killing the enthusiasm for this was that most people 
 realized that this GUC was just a poor tool to take a stab at solving other 
 problems (ie. rate limiting cpu for queries). 

I'm not concerned with that, I want developers to have feedback on costs in
a way that is obvious.
 
   I think a variation on this could be very useful in development and test
   environments. Suppose it raised a warning or notice if the cost was over
   the limit. Then one could set a limit of a few million on the development
   and test servers and developers would at least have a clue that they
   needed to look at explain for that query. As it is now, one can exhort
   them to run explain, but it has no effect.  Instead we later see queries
   killed by a 24 hour timeout with estimated costs ranging from "until they
   unplug the machine and dump it" to "until the sun turns into a red
   giant."
 
  Great argument. So that's 4 in favour at least.
 
 
 Not such a great argument. Cost models on development servers can and often 
 are quite different from those on production, so you might be putting an 
 artificial limit on top of your developers. 

We load the production dumps into our dev environment, which are the same
hardware spec, so the costs should be identical.
 
 I still think it is worth revisiting what problems people are trying to 
 solve, 
 and see if there are better tools they can be given to solve them.  Barring 
 that, I suppose a crude solution is better than nothing, though I fear people 
 might point at the crude solution as a good enough solution to justify not 
 working on better solutions. 

Alerting developers and QA to potentially costly queries would help solve
some of the problems we are trying to solve. Better tools are welcome, an
argument that the good is the enemy of the best so we should be content with
nothing is not.

-dg
 

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443 / 510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Gregory Stark
Josh Berkus [EMAIL PROTECTED] writes:

 Tom,

 Wasn't this exact proposal discussed and rejected awhile back?

 We rejected Greenplum's much more invasive resource manager, because it 
 created a large performance penalty on small queries whether or not it was 
 turned on.  However, I don't remember any rejection of an idea as simple 
 as a cost limit rejection.

The idea's certainly come up before. It probably received the usual
non-committal cold shoulder rather than an outright rejection.

 This would, IMHO, be very useful for production instances of PostgreSQL.  
 The penalty for mis-rejection of a poorly costed query is much lower than 
 the penalty for having a bad query eat all your CPU.

Well that's going to depend on the application. But I suppose there's
nothing wrong with having options which aren't always a good idea to use. The
real question I guess is whether there's ever a situation where it would be a
good idea to use this. I'm not 100% sure.

What I would probably use myself is an option to print a warning before
starting the query. That would be handy for interactive sessions so you would
be able to hit C-c instead of waiting for several minutes and then wondering
whether you got the query wrong.

I wonder if it would be useful to have a flag on some GUC options to make them
not globally settable. That is, for example, you could set enable_seqscan in
an individual session but not in postgres.conf. Or perhaps again just print a
warning that it's not recommended as a global configuration.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Robert Treat
On Monday 04 August 2008 03:50:40 daveg wrote:
 On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:
  ISTR that what ended up killing the enthusiasm for this was that most
  people realized that this GUC was just a poor tool to take a stab at
  solving other problems (ie. rate limiting cpu for queries).

 I'm not concerned with that, I want developers to have feedback on costs
 in a way that is obvious.


That was one of the other use cases that was pushed forward in the past. 

I think a variation on this could be very useful in development and
test environments. Suppose it raised a warning or notice if the cost
was over the limit. Then one could set a limit of a few million on
the development and test servers and developers would at least have a
clue that they needed to look at explain for that query. As it is
now, one can exhort them to run explain, but it has no effect. 
Instead we later see queries killed by a 24 hour timeout with
estimated costs ranging from "until they unplug the machine and dump
it" to "until the sun turns into a red giant."
  
   Great argument. So that's 4 in favour at least.
 
  Not such a great argument. Cost models on development servers can and
  often are quite different from those on production, so you might be
  putting an artificial limit on top of your developers.

 We load the production dumps into our dev environment, which are the same
 hardware spec, so the costs should be identical.


That's great for you, I am talking in the scope of a general solution. (Note 
I'd also bet that even given the same hardware, different production loads 
can produce different relative mappings of cost vs. performance, but 
whatever)

  I still think it is worth revisiting what problems people are trying to
  solve, and see if there are better tools they can be given to solve them.
   Barring that, I suppose a crude solution is better than nothing, though
  I fear people might point at the crude solution as a good enough solution
  to justify not working on better solutions.

 Alerting developers and QA to potentially costly queries would help solve
 some of the problems we are trying to solve. Better tools are welcome, an
 argument that the good is the enemy of the best so we should be content
 with nothing is not.


And you'll note, I specifically said that a crude tool is better than nothing. 
But you're completely ignoring that a crude tool can often end up as a foot-gun 
once released into the wild. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Josh Berkus

Greg,


Well that's going to depend on the application. But I suppose there's
nothing wrong with having options which aren't always a good idea to use. The
real question I guess is whether there's ever a situation where it would be a
good idea to use this. I'm not 100% sure.


I can think of *lots*.   Primarily, simple web applications, where 
queries are never supposed to take more than 50ms.  If a query turns up 
with an estimated cost of 100, then you know something's wrong; 
in the statistics if not in the query.  In either case, that query has a 
good chance of dragging down the whole system.


In such a production application, it is better to have false positives 
and reject otherwise-OK queries because their costing is wrong, than to 
let a single cartesian join bog down an application serving 5000 
simultaneous users.  Further, with a SQL error, this would allow the 
query rejection to be handled in a user-friendly way from the UI 
("Search too complex.  Try changing search terms.") rather than timing 
out, which is very difficult to handle well.


The usefulness of this feature for interactive sessions is 
limited-to-nonexistent.  It's for production applications.


--Josh Berkus




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Greg Smith

On Mon, 4 Aug 2008, daveg wrote:


On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:

Not such a great argument. Cost models on development servers can and often
are quite different from those on production, so you might be putting an
artifical limit on top of your developers.
artificial limit on top of your developers.


We load the production dumps into our dev environment, which are the same
hardware spec, so the costs should be identical.


Not identical, just close.  ANALYZE samples data from your table randomly. 
The statistics used to compute the costs will therefore be slightly 
different on the two servers even if the data is the same.  The problem of 
discovering one plan on production and another on development is not quite 
that easy to remove.  Ultimately, if your developers aren't thorough 
enough to do things like look at EXPLAIN plans enough to discover things 
that are just bad, I just chuckle at your thinking that putting a single 
limiter on their bad behavior will somehow magically make that better.


Anyway, if your production server is small enough that you can afford to 
have another one just like it for the developers to work on, that's great. 
Robert's point is that many installs don't work like that.  The 
development teams in lots of places only get a subset of the production 
data because it's too large to deploy on anything but a big server, which 
often is hard to cost justify buying just for development purposes.


I like the concept of a cost limit, but I'm a bit horrified by the thought 
of it being exposed simply through the internal cost numbers because they 
are so arbitrary.  One of the endless projects I think about but never 
start coding is to write something that measures the things the planner 
cost constants estimate on a particular machine, so that all those numbers 
actually can be tied to some real-world time measure.  If you did that, 
you'd actually have a shot at accomplishing the real goal here, making 
statement_cost_limit cut off statements expected to take longer than 
statement_timeout before they even get started.
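
A back-of-the-envelope sketch of that calibration idea (all names and
numbers here are hypothetical; nothing like this exists in PostgreSQL
today): once you have measured how many milliseconds one planner cost
unit takes on a given machine, a plan's total cost maps to a projected
runtime that can be compared against statement_timeout before execution
begins.

#include <stdbool.h>
#include <stdio.h>

typedef double Cost;

/* Hypothetical calibration result for this machine: milliseconds of
 * wall-clock time per planner cost unit, obtained by timing queries
 * whose plans have known total costs. */
static double ms_per_cost_unit = 0.05;

/* Would this plan be expected to outlive statement_timeout? */
static bool
would_exceed_timeout(Cost plan_total_cost, int statement_timeout_ms)
{
    double estimated_ms = plan_total_cost * ms_per_cost_unit;

    return statement_timeout_ms > 0 && estimated_ms > statement_timeout_ms;
}

int
main(void)
{
    /* With statement_timeout = 30s: a cost of 1e5 projects to 5s (run it),
     * a cost of 1e7 projects to 500s (reject it up front). */
    printf("cost 1e5: %s\n", would_exceed_timeout(1e5, 30000) ? "reject" : "run");
    printf("cost 1e7: %s\n", would_exceed_timeout(1e7, 30000) ? "reject" : "run");
    return 0;
}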


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 03:09:34PM -0400, Greg Smith wrote:
 On Mon, 4 Aug 2008, daveg wrote:
 We load the production dumps into our dev environment, which are the same
 hardware spec, so the costs should be identical.
 
 Not identical, just close.  ANALYZE samples data from your table randomly. 
 The statistics used to compute the costs will therefore be slightly 
 different on the two servers even if the data is the same.  The problem of 
 discovering one plan on production and another on development is not quite 
 that easy to remove.  Ultimately, if your developers aren't thorough 
 enough to do things like look at EXPLAIN plans enough to discover things 
 that are just bad, I just chuckle at your thinking that putting a single 
 limiter on their bad behavior will somehow magically make that better.

Not all developers can be persuaded to run explain on every change.
However, many will investigate a new message. I'm only hoping to try to
focus their attention toward possible problem queries.

 Anyway, if your production server is small enough that you can afford to 
 have another one just like it for the developers to work on, that's great. 
 Robert's point is that many installs don't work like that.  The 
 development teams in lots of places only get a subset of the production 
 data because it's too large to deploy on anything but a big server, which 
 often is hard to cost justify buying just for development purposes.

Not to get into a size war ;-), but the production environment I'd like this
feature for is over 40 quad-core boxes, each with 32GB of RAM and 16 SCSI
drives. These are dedicated to postgresql and run one or just a few databases.
There are also a bunch of client boxes that we will not speak of. The staging
and test environments are similar hardware but have only a subset of the
databases copied to them. There are probably more than a dozen DB hosts for that.

 I like the concept of a cost limit, but I'm a bit horrified by the thought 
 of it being exposed simply through the internal cost numbers because they 
 are so arbitrary.  One of the endless projects I think about but never 


 start coding is to write something that measures the things the planner 
 cost constants estimate on a particular machine, so that all those numbers 
 actually can be tied to some real-world time measure.  If you did that, 
 you'd actually have a shot at accomplishing the real goal here, making 
 statement_cost_limit cut off statements expected to take longer than 
 statement_timeout before they even get started.

That is a nice idea. Possibly it could be a utility like the fsync tester.

But planner estimates are never going to be all that accurate even with solid
cost numbers because for some classes of queries, particularly those with
many joins, the stats can be good at each level but the error accumulates
exponentially. Which is why I think a warning is appropriate instead of an
error. Even a notice in the logs would be useful.
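
To make the compounding concrete: if each join level's row estimate is off
by a factor $f$, then a plan of $n$ stacked joins can have its top-level
estimate off by roughly $f^n$; even a modest per-level error of $f = 2$
across $n = 10$ joins allows a $2^{10} \approx 1000\times$ error at the top
of the plan.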

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443 / 510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote:
 On Monday 04 August 2008 03:50:40 daveg wrote:
 
 That's great for you, I am talking in the scope of a general solution. (Note 
 I'd also bet that even given the same hardware, different production loads 
 can produce different relative mappings of cost vs. performance, but 
 whatever)

Even on different hardware it would still likely warn of mistakes like
cartesian products due to missing join conditions etc.
 
   I still think it is worth revisiting what problems people are trying to
   solve, and see if there are better tools they can be given to solve them.
Barring that, I suppose a crude solution is better than nothing, though
   I fear people might point at the crude solution as a good enough solution
   to justify not working on better solutions.
 
  Alerting developers and QA to potentially costly queries would help solve
  some of the probems we are trying to solve. Better tools are welcome, an
  argument that the good is the enemy of the best so we should be content
  with nothing is not.
 
 And you'll note, I specifically said that a crude tool is better than 
 nothing. 

I realized somewhat after I sent the above that it might have sounded a bit
snippy. I hope I have not offended.

 But you're completely ignoring that a crude tool can often end up as a foot-gun 
 once released into the wild. 

I'm suggesting a warning, or even just a notice into the logs; I don't see
the footgun. What am I missing?

Regards

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443 / 510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Hannu Krosing
On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:
 On Monday 04 August 2008 03:50:40 daveg wrote:
  On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:
...
   I still think it is worth revisiting what problems people are trying to
   solve, and see if there are better tools they can be given to solve them.
Barring that, I suppose a crude solution is better than nothing, though
   I fear people might point at the crude solution as a good enough solution
   to justify not working on better solutions.
 
  Alerting developers and QA to potentially costly queries would help solve
  some of the problems we are trying to solve. Better tools are welcome, an
  argument that the good is the enemy of the best so we should be content
  with nothing is not.
 
 
 And you'll note, I specifically said that a crude tool is better than 
 nothing. 
 But you're completely ignoring that a crude tool can often end up as a foot-gun 
 once released into the wild. 

On the other other hand, _anything_ can end up as a foot-gun in the hands
of ingenious users.

I was once told about a company who claimed to have produced a
positively fool-proof lawn-mower, only to find out that a university
professor had tried to use it to trim a hedge and cut off his toes.

-
Hannu




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Gregory Stark
Josh Berkus [EMAIL PROTECTED] writes:

 In such a production application, it is better to have false positives and
 reject otherwise-OK queries because their costing is wrong, than to let a
 single cartesian join bog down an application serving 5000 simultaneous users.
 Further, with a SQL error, this would allow the query rejection to be handled
 in a user-friendly way from the UI ("Search too complex.  Try changing search
 terms.") rather than timing out, which is very difficult to handle well.

 The usefulness of this feature for interactive sessions is
 limited-to-nonexistent.  It's for production applications.

Wow. I couldn't disagree more.

For such an application this would be a major foot-gun which would give a
false sense of security while simultaneously causing random outages and not
providing even the protection you're counting on.

It would be quite likely to miss some cartesian joins and allow problematic
queries through randomly and block other perfectly legitimate queries. It's no
substitute for writing your search engine query generator to actually check
that it has enough constraints to avoid any disallowed cartesian joins.

That people might think it's reliable enough to use for such applications is
my major concern and if my guess is right, Tom's as well. I suspect you may
have just sunk any chance of getting him on-side.

Where I see it useful is a) during development, when it might help catch
erroneous queries as a kind of SQL lint, and b) when running ad-hoc DBA
queries, where it might let the DBA catch the error before letting it run for a
while. I'm sure I'm not the only DBA who has let a query run for 5 minutes
before wondering if it should really be taking that long.

I would be much more comfortable if it produced a warning, not an error. And
much more so if we implemented my previous thought of having some settings
which generate warnings if they're set at startup, saying that's not recommended.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 11:59:03AM -0700, Josh Berkus wrote:
 Greg,
 
 Well that's going to depend on the application. But I suppose there's
 nothing wrong with having options which aren't always a good idea to use. 
 The real question I guess is whether there's ever a situation where it 
 would be a good idea to use this. I'm not 100% sure.
 
 I can think of *lots*.   Primarily, simple web applications, where 
 queries are never supposed to take more than 50ms.  If a query turns up 
 with an estimated cost of 100, then you know something's wrong; 
 in the statistics if not in the query.  In either case, that query has a 
 good chance of dragging down the whole system.
 
 In such a production application, it is better to have false positives 
 and reject otherwise-OK queries because their costing is wrong, than to 
 let a single cartesian join bog down an application serving 5000 
 simultaneous users.  Further, with a SQL error, this would allow the 
 query rejection to be handled in a user-friendly way from the UI 
 ("Search too complex.  Try changing search terms.") rather than timing 
 out, which is very difficult to handle well.
 
 The usefulness of this feature for interactive sessions is 
 limited-to-nonexistent.  It's for production applications.

Ok, that is a different use case where an error seems very useful. What
about slightly extending the proposal to have the severity of exceeding
the limit configurable too? Something like:

   costestimate_limit = 10 # default 0 to ignore limit
   costestimate_limit_severity = error # debug, notice, warning, error
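
For illustration, a sketch of how a loadable module might register those
two settings using the modern custom-GUC API (which postdates this thread;
the names follow daveg's proposal but carry the "prefix." qualifier that
custom GUCs require, and this is not an actual patch):

#include "postgres.h"

#include <limits.h>

#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static int costestimate_limit = 0;              /* 0 = ignore limit */
static int costestimate_limit_severity = ERROR; /* elevel to report at */

static const struct config_enum_entry severity_options[] = {
    {"debug",   DEBUG1,  false},
    {"notice",  NOTICE,  false},
    {"warning", WARNING, false},
    {"error",   ERROR,   false},
    {NULL, 0, false}
};

void
_PG_init(void)
{
    DefineCustomIntVariable("costestimate.limit",
                            "Estimated plan cost above which a statement is flagged (0 disables).",
                            NULL,
                            &costestimate_limit,
                            0, 0, INT_MAX,
                            PGC_USERSET, 0,
                            NULL, NULL, NULL);

    DefineCustomEnumVariable("costestimate.limit_severity",
                             "Severity used when the limit is exceeded.",
                             NULL,
                             &costestimate_limit_severity,
                             ERROR,
                             severity_options,
                             PGC_USERSET, 0,
                             NULL, NULL, NULL);
}

The check itself then reduces to a single ereport(costestimate_limit_severity, ...)
wherever the plan's total cost is inspected; a planner-hook sketch showing
that check appears further down the thread.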

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443 / 510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Kevin Grittner
 Hannu Krosing [EMAIL PROTECTED] wrote:
 
 I was once told about a company who claimed to have produced a
 positively fool-proof lawn-mower, only to find out that a university
 professor had tried to use it to trim a hedge and cut off his toes.
 
Odd.  Seriously, about 45 years ago I lived next door to a university
botany professor who cut off his fingertips that way.  I wonder if
professors are more prone to this or whether the story got mangled
over time.
 
Perhaps software should have special protection for professors
 
-Kevin



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Simon Riggs

On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:
 On Monday 04 August 2008 03:50:40 daveg wrote:

 And you'll note, I specifically said that a crude tool is better than
 nothing. But you're completely ignoring that a crude tool can often
 end up as a foot-gun once released into the wild. 

The proposal is for an option with no consequences when turned off. We
respect your right not to use it. What is the danger exactly? 

If we cancel stupid queries before people run them, everybody is a
winner. Even the person who submitted the stupid query, since they find
out faster.

Sure, it's an estimate, but it's got to be based upon an estimate if it
acts *before* it runs. And surely there is no better estimate of the
cost than the plan cost?

It doesn't stop anyone from putting in resource limits, later.

We'll have to do something with enable_seqscan, BTW, chaps.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Josh Berkus
Greg,

 For such an application this would be a major foot-gun which would give
 a false sense of security while simultaneously causing random outages and
 not providing even the protection you're counting on.

Hmmm.  That sounds like a call for some testing.  While our cost estimation 
has some issues, I don't think it's as unreliable as all that.  And it's easy 
enough to document the limitations.  If it's 80% accurate, then it's 
fixing more problems than it causes.  If it's 30% accurate, then obviously 
it's a bad idea.

Personally, I don't see much use for this interactively, because an 
experienced DBA can easily enough run an EXPLAIN before running the query.  
I usually do, on production systems.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Robert Treat
On Monday 04 August 2008 16:49:43 Simon Riggs wrote:
 On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:
  On Monday 04 August 2008 03:50:40 daveg wrote:
 
  And you'll note, I specifically said that a crude tool is better than
  nothing. But you're completely ignoring that a crude tool can often
  end up as a foot-gun once released into the wild.

 The proposal is for an option with no consequences when turned off. We
 respect your right not to use it. What is the danger exactly?


All of the proposals I have seen for adding query hints would also have no 
consequence if not used, but no one seems to care about that argument. 

:-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Robert Treat
On Monday 04 August 2008 15:56:25 daveg wrote:
 On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote:
  On Monday 04 August 2008 03:50:40 daveg wrote:
 
  That's great for you, I am talking in the scope of a general solution.
  (Note I'd also bet that even given the same hardware, different
  production loads can produce different relative mappings of cost vs.
  performance, but whatever)

 Even on different hardware it would still likely warn of mistakes like
 cartesian products due to missing join conditions etc.


See, this is what we ended up talking about before. Someone will say "I'd like 
to prevent my devs from accidentally doing queries with cartesian products" 
and they will use this to do it... but that will only work in some cases, so 
it becomes a poor tool to solve a different problem. 

BTW, what I really love about statement costs, is that they aren't even 
reliable on the same machine with the same data. I have seen query plans 
which run on the same data on the same machine where the resultant query 
runtime can vary from 2 hours to 5 hours, depending on how much other 
concurrent traffic is on the machine. Awesome eh? 

I still think it is worth revisiting what problems people are trying
to solve, and see if there are better tools they can be given to
solve them. Barring that, I suppose a crude solution is better than
nothing, though I fear people might point at the crude solution as a
good enough solution to justify not working on better solutions.
  
   Alerting developers and QA to potentially costly queries would help
    solve some of the problems we are trying to solve. Better tools are
   welcome, an argument that the good is the enemy of the best so we
   should be content with nothing is not.
 
  And you'll note, I specifically said that a crude tool is better than
  nothing.

 I realized somewhat after I sent the above that it might have sounded a bit
 snippy. I hope I have not offended.

  But you're completely ignoring that a crude tool can often end up as a
  foot-gun once released into the wild.

 I'm suggesting a warning, or even just a notice into the logs, I don't see
 the footgun. What am I missing?


The footgun in my mind is that people will think this solves a number of 
problems even though it doesn't solve them well.  However, the footgun for you 
might be that the current proposal will actually abort the query, not emit a 
warning (not sure if that changes your opinion of it).  

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 05:19:50PM -0400, Robert Treat wrote:
 See, this is what we ended up talking about before. Someone will say "I'd like 
 to prevent my devs from accidentally doing queries with cartesian products" 
 and they will use this to do it... but that will only work in some cases, so 
 it becomes a poor tool to solve a different problem. 
 
 BTW, what I really love about statement costs, is that they aren't even 
 reliable on the same machine with the same data. I have seen query plans 
 which run on the same data on the same machine where the resultant query 
 runtime can vary from 2 hours to 5 hours, depending on how much other 
 concurrent traffic is on the machine. Awesome eh? 

Sure, I don't think anyone believes that costs are precise. But the case that
is interesting is 2 hours versus years and years.

 The footgun in my mind is that people will think this solves a number of 
 problems even though it doesn't solve them well.  However, the footgun for you...

I suspect that a good solution to this problem is impossible as it is more
or less the halting problem. So I'm willing to accept a poor solution based
on costs and then hope we improve the cost model.

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443 / 510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread daveg
On Sat, Aug 02, 2008 at 09:30:08PM +0200, Hans-Jürgen Schönig wrote:
 On Aug 2, 2008, at 8:38 PM, Tom Lane wrote:
 
 Andrew Dunstan [EMAIL PROTECTED] writes:
 Hans-Jürgen Schönig wrote:
 i introduced a GUC called statement_cost_limit which can be used to
 error out if a statement is expected to be too expensive.
 
 You clearly have far more faith in the cost estimates than I do.
 
 Wasn't this exact proposal discussed and rejected awhile back?
 
  regards, tom lane
 
 
 
 i don't remember precisely.
 i have seen it on simon's wiki page and it is something which would  
 have been useful in some cases in the past.

I think a variation on this could be very useful in development and test
environments. Suppose it raised a warning or notice if the cost was over
the limit. Then one could set a limit of a few million on the development
and test servers and developers would at least have a clue that they needed
to look at explain for that query. As it is now, one can exhort them to
run explain, but it has no effect.  Instead we later see queries killed
by a 24 hour timeout with estimated costs ranging from "until they unplug
the machine and dump it" to "until the sun turns into a red giant."

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443 / 510 282 0869
If simplicity worked, the world would be overrun with insects.



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread Simon Riggs

On Sun, 2008-08-03 at 00:44 -0700, daveg wrote:
 On Sat, Aug 02, 2008 at 09:30:08PM +0200, Hans-Jürgen Schönig wrote:
  On Aug 2, 2008, at 8:38 PM, Tom Lane wrote:
  
  Andrew Dunstan [EMAIL PROTECTED] writes:
  Hans-Jürgen Schönig wrote:
  i introduced a GUC called statement_cost_limit which can be used to
  error out if a statement is expected to be too expensive.
  
  You clearly have far more faith in the cost estimates than I do.
  
  Wasn't this exact proposal discussed and rejected awhile back?
  
  i don't remember precisely.
  i have seen it on simon's wiki page and it is something which would  
  have been useful in some cases in the past.

I still support it. Regrettably, many SQL developers introduce product
joins and other unintentional errors. Why let problem queries through?
Security-wise they're great Denial of Service attacks, bringing the
server to its knees better than most ways I know, in conjunction with a
nice hefty work_mem setting. 27 table product joins: memory, CPU, I/O
and diskspace resources used all in a simple killer query.

If anybody thinks costs are inaccurate, don't use it. Or better still
improve the cost models. It isn't any harder or easier to find a useful
value than it is to use statement_timeout. What's the difference between
picking an arbitrary time and an arbitrary cost? You need to alter the
value according to people's complaints in both cases.

 I think a variation on this could be very useful in development and test
 environments. Suppose it raised a warning or notice if the cost was over
 the limit. Then one could set a limit of a few million on the development
 and test servers and developers would at least have a clue that they needed
 to look at explain for that query. As it is now, one can exhort them to
 run explain, but it has no effect.  Instead we later see queries killed
 by a 24 hour timeout with estimated costs ranging from "until they unplug
 the machine and dump it" to "until the sun turns into a red giant."

Great argument. So that's 4 in favour at least.

A compromise would be to have log_min_statement_cost (or
warn_min_statement_cost) which will at least help find these problems in
testing before we put things live, but that still won't help with
production issues.

Another alternative would be to have a plugin that can examine the plan
immediately after planner executes, so you can implement this yourself,
plus some other possibilities.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread Hans-Jürgen Schönig

hello ...




I still support it. Regrettably, many SQL developers introduce product
joins and other unintentional errors. Why let problem queries through?



i think the killer is that we don't have to wait until the query dies
with a statement_timeout.
it is way more elegant to kill things before they have already eaten
too many cycles.
one thing which is important as well: statement_cost_limit does not
kill queries which have just been waiting for a lock.

this makes things slightly more predictable.



Security-wise they're great Denial of Service attacks, bringing the
server to its knees better than most ways I know, in conjunction with a
nice hefty work_mem setting. 27 table product joins: memory, CPU, I/O
and diskspace resources used all in a simple killer query.




i am not too concerned about DoS, i have to admit.
i would rather see it as a way to make developers do better things.



If anybody thinks costs are inaccurate, don't use it. Or better still
improve the cost models. It isn't any harder or easier to find a useful
value than it is to use statement_timeout. What's the difference between
picking an arbitrary time and an arbitrary cost? You need to alter the
value according to people's complaints in both cases.



the cost model is good enough to see if something is good or bad.
this is basically all we want to do here --- killing all evil.



*snip*






A compromise would be to have log_min_statement_cost (or
warn_min_statement_cost) which will at least help find these problems in
testing before we put things live, but that still won't help with
production issues.




definitely. a good idea as well - but people will hardly read it, i  
guess :(.



Another alternative would be to have a plugin that can examine the plan
immediately after planner executes, so you can implement this yourself,
plus some other possibilities.




this would be really fancy.
what would a plugin like that look like?

hans



--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread Simon Riggs

On Sun, 2008-08-03 at 22:09 +0200, Hans-Jürgen Schönig wrote:

  Another alternative would be to have a plugin that can examine the plan
  immediately after planner executes, so you can implement this yourself,
  plus some other possibilities.
 

 this would be really fancy.
 what would a plugin like that look like?

Hmm...thinks: exactly like the existing planner_hook().

So, rewrite this as a planner hook and submit as a contrib module.
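
For concreteness, a minimal sketch of what such a contrib module might
look like (my code, not the patch Hans attached; it assumes the current
planner_hook signature, which has changed across PostgreSQL versions, and
the modern custom-GUC API with its required "prefix." naming):

#include "postgres.h"

#include <float.h>

#include "fmgr.h"
#include "optimizer/planner.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static double statement_cost_limit = 0.0;   /* 0 disables the check */
static planner_hook_type prev_planner_hook = NULL;

/* Plan as usual, then refuse to run any statement whose estimated
 * total cost exceeds the configured limit. */
static PlannedStmt *
limit_planner(Query *parse, const char *query_string,
              int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;

    if (prev_planner_hook)
        result = prev_planner_hook(parse, query_string,
                                   cursorOptions, boundParams);
    else
        result = standard_planner(parse, query_string,
                                  cursorOptions, boundParams);

    if (statement_cost_limit > 0.0 &&
        result->planTree->total_cost > statement_cost_limit)
        ereport(ERROR,
                (errmsg("estimated statement cost %.0f exceeds costlimit.statement_cost_limit (%.0f)",
                        result->planTree->total_cost,
                        statement_cost_limit)));

    return result;
}

void
_PG_init(void)
{
    DefineCustomRealVariable("costlimit.statement_cost_limit",
                             "Refuse statements whose estimated cost exceeds this (0 disables).",
                             NULL,
                             &statement_cost_limit,
                             0.0, 0.0, DBL_MAX,
                             PGC_USERSET, 0,
                             NULL, NULL, NULL);

    prev_planner_hook = planner_hook;
    planner_hook = limit_planner;
}

Loaded via shared_preload_libraries (or LOAD for a single session), a
session would then just SET costlimit.statement_cost_limit = 100000000
to get the behaviour discussed here.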

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread Josh Berkus
Tom,

 Wasn't this exact proposal discussed and rejected awhile back?

We rejected Greenplum's much more invasive resource manager, because it 
created a large performance penalty on small queries whether or not it was 
turned on.  However, I don't remember any rejection of an idea as simple 
as a cost limit rejection.

This would, IMHO, be very useful for production instances of PostgreSQL.  
The penalty for mis-rejection of a poorly costed query is much lower than 
the penalty for having a bad query eat all your CPU.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread Robert Treat
On Sunday 03 August 2008 15:12:22 Simon Riggs wrote:
 On Sun, 2008-08-03 at 00:44 -0700, daveg wrote:
  On Sat, Aug 02, 2008 at 09:30:08PM +0200, Hans-Jürgen Schönig wrote:
   On Aug 2, 2008, at 8:38 PM, Tom Lane wrote:
   Andrew Dunstan [EMAIL PROTECTED] writes:
   Hans-Jürgen Schönig wrote:
   i introduced a GUC called statement_cost_limit which can be used to
   error out if a statement is expected to be too expensive.
   
   You clearly have far more faith in the cost estimates than I do.
   
   Wasn't this exact proposal discussed and rejected awhile back?
  
   i don't remember precisely.
   i have seen it on simon's wiki page and it is something which would
   have been useful in some cases in the past.

 I still support it. Regrettably, many SQL developers introduce product
 joins and other unintentional errors. Why let problem queries through?
 Security-wise they're great Denial of Service attacks, bringing the
 server to its knees better than most ways I know, in conjunction with a
 nice hefty work_mem setting. 27 table product joins: memory, CPU, I/O
 and diskspace resources used all in a simple killer query.


ISTR that what ended up killing the enthusiasm for this was that most people 
realized that this GUC was just a poor tool to take a stab at solving other 
problems (ie. rate limiting cpu for queries). 

 If anybody thinks costs are inaccurate, don't use it. Or better still
 improve the cost models. It isn't any harder or easier to find a useful
 value than it is to use statement_timeout. What's the difference between
 picking an arbitrary time and an arbitrary cost? You need to alter the
 value according to people's complaints in both cases.


I think the original argument for statement_timeout was that long running 
queries were known to cause havoc wrt vacuum strategies (remember, that one 
has been in the back end a long time). ISTR some recent threads on -hackers 
questioning whether statement_timeout should be eliminated itself.  

  I think a variation on this could be very useful in development and test
  environments. Suppose it raised a warning or notice if the cost was over
  the limit. Then one could set a limit of a few million on the development
  and test servers and developers would at least have a clue that they
  needed to look at explain for that query. As it is now, one can exhort
  them to run explain, but it has no effect.  Instead we later see queries
  killed by a 24 hour timeout with estimated costs ranging from "until they
  unplug the machine and dump it" to "until the sun turns into a red
  giant."

 Great argument. So that's 4 in favour at least.


Not such a great argument. Cost models on development servers can and often 
are quite different from those on production, so you might be putting an 
artificial limit on top of your developers. 

 A compromise would be to have log_min_statement_cost (or
 warn_min_statement_cost) which will at least help find these problems in
 testing before we put things live, but that still won't help with
 production issues.

 Another alternative would be to have a plugin that can examine the plan
 immediately after planner executes, so you can implement this yourself,
 plus some other possibilities.


I still think it is worth revisiting what problems people are trying to solve, 
and see if there are better tools they can be given to solve them.  Barring 
that, I suppose a crude solution is better than nothing, though I fear people 
might point at the crude solution as a good enough solution to justify not 
working on better solutions. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-03 Thread Mark Kirkwood

Josh Berkus wrote:

Tom,

  

Wasn't this exact proposal discussed and rejected awhile back?



We rejected Greenplum's much more invasive resource manager, because it 
created a large performance penalty on small queries whether or not it was 
turned on.  However, I don't remember any rejection of an idea as simple 
as a cost limit rejection.


This would, IMHO, be very useful for production instances of PostgreSQL.  
The penalty for mis-rejection of a poorly costed query is much lower than 
the penalty for having a bad query eat all your CPU.


  
Greenplum introduced a way of creating a cost threshold, a bit like 
the way Simon was going to do shared work_mem. It did two things 
(a rough sketch of the idea follows below):


1/ Counted the cost of an about-to-be-run query against the threshold, 
and made the query wait if it would exhaust it

2/ Aborted the query if its cost was greater than the threshold
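
A toy sketch of that admit/wait/abort decision (my code, not Greenplum's;
the real thing lives in the resscheduler directory Mark mentions, and may
well share one threshold where this sketch uses two knobs):

#include <stdio.h>

/* Per-role pool of "cost credit", in the spirit of the Greenplum RM. */
typedef struct CostPool {
    double capacity;   /* total credit available to this role */
    double in_use;     /* credit held by currently running queries */
    double hard_limit; /* single-query abort threshold */
} CostPool;

typedef enum { ADMIT, WAIT, ABORT_QUERY } Admission;

static Admission
admit_query(const CostPool *pool, double plan_cost)
{
    if (plan_cost > pool->hard_limit)
        return ABORT_QUERY;                 /* behaviour 2: reject outright */
    if (pool->in_use + plan_cost > pool->capacity)
        return WAIT;                        /* behaviour 1: queue until room */
    return ADMIT;                           /* count it against the pool, run */
}

int
main(void)
{
    CostPool pool = { 1e6, 8e5, 5e5 };

    printf("cost 1e5 -> %d (0=admit, 1=wait, 2=abort)\n", admit_query(&pool, 1e5));
    printf("cost 3e5 -> %d\n", admit_query(&pool, 3e5));
    printf("cost 9e5 -> %d\n", admit_query(&pool, 9e5));
    return 0;
}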

Initially there was quite a noticeable performance penalty with it 
enabled - but as the guy working on it (me) redid bits and pieces, the 
penalty decreased massively. Note that in all cases, disabling the 
feature meant there was no penalty.


The latest variant of the code is around in the Bizgres repository 
(src/backend/utils/resscheduler I think) - some bits might be worth 
looking at!


Best wishes

Mark

P.S.: I'm not working for Greenplum now.



[HACKERS] Mini improvement: statement_cost_limit

2008-08-02 Thread Hans-Jürgen Schönig

hello ...

i picked up csaba nagy's idea and implemented a very simple yet very
useful extension.
i introduced a GUC called statement_cost_limit which can be used to
error out if a statement is expected to be too expensive.
the advantage over statement_timeout is that we are actually able to
error out before spending the many seconds after which
statement_timeout would kill the query anyway.


best regards,

hans




[Attachment: statement_cost_limit1.patch -- binary data]



--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de




Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-02 Thread Andrew Dunstan



Hans-Jürgen Schönig wrote:

hello ...

i picked up csaba nagy's idea and implemented a very simple yet very 
useful extension.
i introduced a GUC called statement_cost_limit which can be used to 
error out if a statement is expected to be too expensive.
the advantage over statement_timeout is that we are actually able to 
error out before spending the many seconds after which 
statement_timeout would kill the query anyway.


 
  


You clearly have far more faith in the cost estimates than I do.

cheers

andrew



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-02 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Hans-Jürgen Schönig wrote:
 i introduced a GUC called statement_cost_limit which can be used to 
 error out if a statement is expected to be too expensive.

 You clearly have far more faith in the cost estimates than I do.

Wasn't this exact proposal discussed and rejected awhile back?

regards, tom lane



Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-02 Thread Hans-Jürgen Schönig

On Aug 2, 2008, at 8:38 PM, Tom Lane wrote:


Andrew Dunstan [EMAIL PROTECTED] writes:

Hans-Jürgen Schönig wrote:

i introduced a GUC called statement_cost_limit which can be used to
error out if a statement is expected to be too expensive.



You clearly have far more faith in the cost estimates than I do.


Wasn't this exact proposal discussed and rejected awhile back?

regards, tom lane




i don't remember precisely.
i have seen it on simon's wiki page and it is something which would  
have been useful in some cases in the past.


many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: www.postgresql-support.de

