Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Simon Riggs

On Sun, 2008-08-03 at 22:57 -0400, Robert Treat wrote:

 I still think it is worth revisiting what problems people are trying
 to solve, and see if there are better tools they can be given to solve
 them.  Barring that, I suppose a crude solution is better than
 nothing, though I fear people might point at the crude solution as a
 good enough solution to justify not working on better solutions. 

I advocate solutions to the problems of users I've worked with.

My preference is to help people in the next release, then improve from
there. We need to work with what we have.

In this case, an existing solution has been found.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Initial Unsigned Integer data type performance test results.

2008-08-04 Thread Ryan Bradetich
Hello All,

I wanted to pass some performance data on to the group regarding the
unsigned integer data types I am working on.  I tested on two systems
running Ubuntu Hardy.  The first system is an 8 x 2.66GHz x86-64
processor system.  The second system is a 2 x 533MHz Celeron i386
system.  For this test I disabled autovacuum and manually ran the
vacuum during times that were not timed.  I also added some sleep and
sync commands to try and stabilize the test results.  The integer and
unsigned integer test results were also interleaved to compensate for
other system activities.  Finally, I have not done a lot of
benchmarking, so feedback and improvement ideas are welcome!

I benchmarked four tests:
32-bit int2 vs uint2 -- My expectation is these results would be
                        approximately equal.  The uint2 casts up to an
                        int4, which is still 32 bits.

32-bit int4 vs uint4 -- My expectation is the < operator should be
                        approximately equal, since I explicitly added
                        int4 < uint4 operators (a sketch of such an
                        operator follows this list).

64-bit int2 vs uint2 -- My expectation is these results would be
                        approximately equal.  The uint2 casts up to an
                        int4, which is less than the 64-bit processor
                        bus width.

64-bit int4 vs uint4 -- My expectation is these results would be
                        approximately equal.  The uint4 casts up to an
                        int8, which is the processor bus width.

Here is the definition of each column (times are displayed in seconds):

Rows   -- The number of rows added to the table.
Test   -- The test number (I ran each benchmark 3 times).
S Load -- Time to load the table with signed integers.
U Load -- Time to load the table with unsigned integers.
S < OP -- Time to perform a < operator on each row in the table
          (signed integers).
U < OP -- Time to perform a < operator on each row in the table
          (unsigned integers).
S & OP -- Time to perform an AND (&) operator on each row in the table
          (signed integers).
U & OP -- Time to perform an AND (&) operator on each row in the table
          (unsigned integers).


I still have some more investigating to do into the results of this
data (i.e. why is the signed < OP cheaper for int2 than for uint2
types, etc.), but I wanted to share this performance data with the
community.  Rough calculations show about a 4% performance hit for
32-bit processors to use the int8 type and a 6% performance hit for
64-bit processors.  This seems like a reasonable trade-off for the
reduced storage space I am after, and the reduction in operators Greg
and Tom were concerned about.  If there are no objections from the
community, I will plan to complete my implementation of Tom's proposal at:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01204.php

Thanks!

- Ryan

P.S. The code I have so far is not ready for review.  It was
implemented quickly to get performance numbers to determine if Tom's
proposal would be acceptable to the community.  I can send it out if
people are interested, but I am planning on submitting it to a future
commit fest once I have it cleaned up and better tested.



Benchmark Data:
===============



  32-bit int2 vs. uint2

Rows  Test      S Load      U Load      S < OP      U < OP      S & OP      U & OP
----------------------------------------------------------------------------------
 100     1  000.002888  000.002151  000.010881  000.014691  000.011124  000.011000
 100     2  000.002780  000.002127  000.011729  000.011611  000.012014  000.011925
 100     3  000.002747  000.002085  000.010193  000.010318  000.010588  000.010576

1000     1  000.003201  000.003870  000.037837  000.037360  000.032064  000.032478
1000     2  000.003259  000.003912  000.033495  000.036281  000.032502  000.035195
1000     3  000.003201  000.003913  000.039156  000.035592  000.032405  000.040543

   1     1  000.024683  000.021306  000.255958  000.329045  000.255887  000.283782
   1     2  000.020214  000.021224  000.260252  000.290933  000.281468  000.255171
   1     3  000.020371  000.020940  000.276401  000.264791  000.257598  000.257258

  10     1  001.669571  001.687523  002.591442  002.682428  003.410724  003.490362
  10     2  001.682251  001.702598  003.379377  002.855622  002.549476  002.583431
  10     3  001.693429  001.684732  002.546024  002.641240  002.540556  003.366534

  50     1  010.138317  011.014532  015.707597  015.61      015.394598  015.502639
  50     2  010.042176  010.179163  015.290994  015.407479  015.332925  016.321578
  50     3  010.047930  010.206489  015.016276  015.430527  015.201759  015.411601

 100     1  020.762680  022.145950  030.338606

Re: [HACKERS] unnecessary code in _bt_split

2008-08-04 Thread Simon Riggs

On Sun, 2008-08-03 at 19:44 -0400, Tom Lane wrote:
 Zdenek Kotala [EMAIL PROTECTED] writes:
  I found that the _bt_split function calls PageGetTempPage, but the next call
  is _bt_pageinit, which clears all contents anyway. Is there any reason to
  call PageGetTempPage instead of palloc?
 
 Not violating a perfectly good abstraction?
 
 I agree that PageGetTempPage isn't amazingly efficient, but internal
 refactoring would halve its cost; and if you have some evidence that
 there's a real performance issue then we could think about adjusting
 the temp-page API to allow _bt_pageinit to be combined with it.  But
 I have a real problem with hacking up _bt_split so that it will call
 PageRestoreTempPage on something it didn't get from PageGetTempPage.
 
 Considering the WAL and regular I/O that will be induced by a split,
 I kinda doubt this is even worth worrying about anyway...

Improving this should help, since the existing page is write-locked
during _bt_split. The I/O won't happen at the point where these blocks
are critical contention points.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:
 
 ISTR that what ended up killing the enthusiasm for this was that most people 
 realized that this GUC was just a poor tool to take a stab at solving other 
 problems (ie. rate limiting cpu for queries). 

I'm not concerned with that, I want developers to have feedback on costs in
a way that is obvious.
 
   I think a variation on this could be very useful in development and test
   environments. Suppose it raised a warning or notice if the cost was over
   the limit. Then one could set a limit of a few million on the development
   and test servers and developers would at least have a clue that they
   needed to look at explain for that query. As it is now, one can exhort
   them to run explain, but it has no effect.  Instead we later see queries
   killed by a 24 hour timeout with estimated costs ranging from "until they
   unplug the machine and dump it" to "until the sun turns into a red
   giant".
 
  Great argument. So that's 4 in favour at least.
 
 
 Not such a great argument. Cost models on development servers can and often 
 are quite different from those on production, so you might be putting an 
 artifical limit on top of your developers. 

We load the production dumps into our dev environment, which are the same
hardware spec, so the costs should be identical.
 
 I still think it is worth revisiting what problems people are trying to 
 solve, 
 and see if there are better tools they can be given to solve them.  Barring 
 that, I suppose a crude solution is better than nothing, though I fear people 
 might point at the crude solution as a good enough solution to justify not 
 working on better solutions. 

Alerting developers and QA to potentially costly queries would help solve
some of the problems we are trying to solve. Better tools are welcome; an
argument that the good is the enemy of the best, so we should be content
with nothing, is not.
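
For concreteness, the warn-instead-of-kill variant could be a few lines
at the end of planning.  A hypothetical sketch, not a submitted patch:
statement_warn_cost is an imagined GUC, and "result" is the PlannedStmt
that standard_planner() is about to return.

    /* warn when the estimate exceeds the imagined statement_warn_cost GUC */
    if (statement_warn_cost > 0 &&
        result->planTree->total_cost > statement_warn_cost)
        ereport(WARNING,
                (errmsg("estimated plan cost %.0f exceeds statement_warn_cost %.0f",
                        result->planTree->total_cost, statement_warn_cost)));

A WARNING reaches the developer's client without killing the query, which
is exactly the feedback effect described above.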

-dg
 

-- 
David Gould   [EMAIL PROTECTED]   510 536 1443   510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Gregory Stark
Josh Berkus [EMAIL PROTECTED] writes:

 Tom,

 Wasn't this exact proposal discussed and rejected awhile back?

 We rejected Greenplum's much more invasive resource manager, because it 
 created a large performance penalty on small queries whether or not it was 
 turned on.  However, I don't remember any rejection of an idea as simple 
 as a cost limit rejection.

The idea's certainly come up before. It probably received the usual
non-committal cold shoulder rather than an outright rejection.

 This would, IMHO, be very useful for production instances of PostgreSQL.  
 The penalty for mis-rejection of a poorly costed query is much lower than 
 the penalty for having a bad query eat all your CPU.

Well, that's going to depend on the application. But I suppose there's
nothing wrong with having options which aren't always a good idea to use. The
real question, I guess, is whether there's ever a situation where it would be
a good idea to use this. I'm not 100% sure.

What I would probably use myself is an option to print a warning before
starting the query. That would be handy for interactive sessions so you would
be able to hit C-c instead of waiting for several minutes and then wondering
whether you got the query wrong.

I wonder if it would be useful to have a flag on some GUC options to make them
not globally settable. That is, for example, you could set enable_seqscan in
an individual session but not in postgres.conf. Or perhaps again just print a
warning that it's not recommended as a global configuration.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unnecessary code in _bt_split

2008-08-04 Thread Zdenek Kotala

Tom Lane napsal(a):

Zdenek Kotala [EMAIL PROTECTED] writes:
I found that the _bt_split function calls PageGetTempPage, but the next call
is _bt_pageinit, which clears all contents anyway. Is there any reason to
call PageGetTempPage instead of palloc?


Not violating a perfectly good abstraction?


OK. Abstraction is nice, but what I see in PageGetTempPage is code that tries 
to do everything while its usability is nearly zero. It is used in only two 
places, and in each for a different purpose: _bt_split() needs only to 
allocate an empty temp page, while gistplacetopage() also wants the special 
space copied.



 In my opinion it would be better to have three functions:

PageCreateTempPage - only allocate memory and call pageinit
PageCloneSpecial - copy special section from source page
PageRestoreTempPage - no change.


Zdenek



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Location for pgstat.stat

2008-08-04 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 It doesn't seem to me that it'd be hard to support two locations for the
 stats file --- it'd just take another parameter to the read and write
 routines.  pgstat.c already knows the difference between a normal write
 and a shutdown write ...
 
 Right. Should it be removed from the permanent location when the server
 starts?
 
 Yes, I would say so.  There are two possible exit paths: normal shutdown
 (where we'd write a new file) and crash.  In a crash we'd wish to delete
 the file anyway for fear that it's corrupted.
 
   Startup: read permanent file, then delete it.
 
   Post-crash: remove any permanent file (same as now)
 
   Shutdown: write permanent file.
 
   Normal stats collector write: write temp file.
 
   Backend stats fetch: read temp file.

Attached is a patch that implements this. I went with the option of just
storing it in a temporary directory that can be symlinked, and not
bothering with a GUC for it. Comments? (Documentation updates are also
needed, but I'll wait on those until I hear patch comments :-P)


//Magnus
Index: backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -c -r1.176 pgstat.c
*** backend/postmaster/pgstat.c	30 Jun 2008 10:58:47 -	1.176
--- backend/postmaster/pgstat.c	4 Aug 2008 09:39:23 -
***************
*** 67,74 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_FILENAME	"global/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE		"global/pgstat.tmp"
  
  /* --
   * Timer definitions.
--- 67,76 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
! #define PGSTAT_STAT_FILENAME	"pgstat_tmp/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE	"pgstat_tmp/pgstat.tmp"
  
  /* --
   * Timer definitions.
***************
*** 218,225 
  static void pgstat_beshutdown_hook(int code, Datum arg);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(void);
! static HTAB *pgstat_read_statsfile(Oid onlydb);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
--- 220,227 
  static void pgstat_beshutdown_hook(int code, Datum arg);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
***************
*** 509,514 
--- 511,517 
  pgstat_reset_all(void)
  {
  	unlink(PGSTAT_STAT_FILENAME);
+ 	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
  
  #ifdef EXEC_BACKEND
***************
*** 2595,2601 
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid);
  
  	/*
  	 * Setup the descriptor set for select(2).	Since only one bit in the set
--- 2598,2604 
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
  
  	/*
  	 * Setup the descriptor set for select(2).	Since only one bit in the set
***************
*** 2635,2641 
  			if (!PostmasterIsAlive(true))
  break;
  
! 			pgstat_write_statsfile();
  			need_statwrite = false;
  			need_timer = true;
  		}
--- 2638,2644 
  			if (!PostmasterIsAlive(true))
  break;
  
! 			pgstat_write_statsfile(false);
  			need_statwrite = false;
  			need_timer = true;
  		}
***************
*** 2803,2809 
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile();
  
  	exit(0);
  }
--- 2806,2812 
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile(true);
  
  	exit(0);
  }
***************
*** 2891,2897 
   * --
   */
  static void
! pgstat_write_statsfile(void)
  {
  	HASH_SEQ_STATUS hstat;
  	HASH_SEQ_STATUS tstat;
--- 2894,2900 
   * --
   */
  static void
! pgstat_write_statsfile(bool permanent)
  {
  	HASH_SEQ_STATUS hstat;
  	HASH_SEQ_STATUS tstat;
***************
*** 2901,2917 
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
  
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
! 	fpout = fopen(PGSTAT_STAT_TMPFILE, PG_BINARY_W);
  	if (fpout == NULL)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not open temporary statistics file \"%s\": %m",
! 						PGSTAT_STAT_TMPFILE)));
  		return;
  	}
  
--- 2904,2922 
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
+ 	const char 

[HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Gregory Stark

It seems there's something wrong with CheckOtherDBBackends() but I haven't
exactly figured out what. There are no other sessions but drop database keeps
saying "regression" is being accessed by other users. I do see Autovacuum
touching tables in regression but CheckOtherDBBackends() is supposed to send
it a sigkill if it finds it and it doesn't seem to be doing so.

I've been hacking on unrelated stuff in this database and have caused multiple
core dumps and autovacuum is finding orphaned temp tables. It's possible some
state is corrupted in some way here but I don't see what.


postgres=# select * from pg_stat_activity; 
 datid | datname  | procpid | usesysid | usename |  current_query   
   | waiting |  xact_start   |  query_start  |  
   backend_start | client_addr | client_port 
---+--+-+--+-+-+-+---+---+---+-+-
 11505 | postgres |5616 |   10 | stark   | select * from 
pg_stat_activity; | f   | 2008-08-04 11:46:05.438479+01 | 2008-08-04 
11:46:05.438956+01 | 2008-08-04 11:45:19.827702+01 | |  -1
(1 row)

postgres=# commit;
COMMIT

postgres=# drop database regression;
ERROR:  55006: database "regression" is being accessed by other users
LOCATION:  dropdb, dbcommands.c:678


postgres=# select * from pg_stat_activity; 
 datid | datname  | procpid | usesysid | usename |  
current_query  | waiting |  xact_start   |  
query_start  | backend_start | client_addr | 
client_port 
---+--+-+--+-+-+-+---+---+---+-+-
 11505 | postgres |5616 |   10 | stark   | select * from 
pg_stat_activity; | f   | 2008-08-04 11:46:45.619642+01 | 2008-08-04 
11:46:45.620115+01 | 2008-08-04 11:45:19.827702+01 | |  -1
(1 row)


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Alvaro Herrera
Gregory Stark wrote:

 It seems there's something wrong with CheckOtherDBBackends() but I haven't
 exactly figured out what. There are no other sessions but drop database keeps
 saying "regression" is being accessed by other users. I do see Autovacuum
 touching tables in regression but CheckOtherDBBackends() is supposed to send
 it a sigkill if it finds it and it doesn't seem to be doing so.
 
 I've been hacking on unrelated stuff in this database and have caused multiple
 core dumps and autovacuum is finding orphaned temp tables. It's possible some
 state is corrupted in some way here but I don't see what.

The buildfarm would be all red if this wasn't something local to your
installation, I think.  Maybe you should get gdb on the backend and set
a breakpoint on errfinish, or maybe step into CheckOtherDBBackends to
see why it isn't working.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_regress inputdir

2008-08-04 Thread Jorgen Austvik - Sun Norway

Alvaro Herrera wrote:

In my opinion, the need
for running tests outside the test dir is not very strong (or we would
have heard complaints before), and thus the solution is to remove
--inputdir and --outputdir.


Attached is a patch that removes --inputdir and --outputdir. I still 
prefer the first patch (the one that fixed my problem), but removing them 
is probably better than having them when they don't work.


Tested with psql make check on solaris x86.

-J
--

Jørgen Austvik, Software Engineering - QA
Sun Microsystems Database Group
Index: src/test/regress/pg_regress_main.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/pg_regress_main.c,v
retrieving revision 1.3
diff -u -r1.3 pg_regress_main.c
--- src/test/regress/pg_regress_main.c	1 Jan 2008 19:46:00 -	1.3
+++ src/test/regress/pg_regress_main.c	4 Aug 2008 11:19:04 -
@@ -34,12 +34,9 @@
 	char		expectfile[MAXPGPATH];
 	char		psql_cmd[MAXPGPATH * 3];
 
-	snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
-			 inputdir, testname);
-	snprintf(outfile, sizeof(outfile), "%s/results/%s.out",
-			 outputdir, testname);
-	snprintf(expectfile, sizeof(expectfile), "%s/expected/%s.out",
-			 inputdir, testname);
+	snprintf(infile, sizeof(infile), "sql/%s.sql", testname);
+	snprintf(outfile, sizeof(outfile), "results/%s.out", testname);
+	snprintf(expectfile, sizeof(expectfile), "expected/%s.out", testname);
 
 	add_stringlist_item(resultfiles, outfile);
 	add_stringlist_item(expectfiles, expectfile);
Index: src/test/regress/pg_regress.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/pg_regress.c,v
retrieving revision 1.46
diff -u -r1.46 pg_regress.c
--- src/test/regress/pg_regress.c	3 Aug 2008 05:12:38 -	1.46
+++ src/test/regress/pg_regress.c	4 Aug 2008 11:19:04 -
@@ -32,6 +32,9 @@
 #include "getopt_long.h"
 #include "pg_config_paths.h"
 
+#define LOG_DIRECTORY "log"
+#define RESULTS_DIRECTORY "results"
+
 /* for resultmap we need a list of pairs of strings */
 typedef struct _resultmap
 {
@@ -68,8 +71,6 @@
 /* options settable from command line */
 _stringlist *dblist = NULL;
 bool		debug = false;
-char	   *inputdir = ".";
-char	   *outputdir = ".";
 char	   *psqldir = NULL;
 static _stringlist *loadlanguage = NULL;
 static int	max_connections = 0;
@@ -560,8 +561,7 @@
 	FILE	   *f;
 
 	/* scan the file ... */
-	snprintf(buf, sizeof(buf), "%s/resultmap", inputdir);
-	f = fopen(buf, "r");
+	f = fopen("resultmap", "r");
 	if (!f)
 	{
 		/* OK if it doesn't exist, else complain */
@@ -1702,8 +1702,7 @@
 	FILE	   *difffile;
 
 	/* create the log file (copy of running status output) */
-	snprintf(file, sizeof(file), "%s/regression.out", outputdir);
-	logfilename = strdup(file);
+	logfilename = "regression.out";
 	logfile = fopen(logfilename, "w");
 	if (!logfile)
 	{
@@ -1713,8 +1712,7 @@
 	}
 
 	/* create the diffs file as empty */
-	snprintf(file, sizeof(file), "%s/regression.diffs", outputdir);
-	difffilename = strdup(file);
+	difffilename = "regression.diffs";
 	difffile = fopen(difffilename, "w");
 	if (!difffile)
 	{
@@ -1726,9 +1724,8 @@
 	fclose(difffile);
 
 	/* also create the output directory if not present */
-	snprintf(file, sizeof(file), "%s/results", outputdir);
-	if (!directory_exists(file))
-		make_directory(file);
+	if (!directory_exists(RESULTS_DIRECTORY))
+		make_directory(RESULTS_DIRECTORY);
 }
 
 static void
@@ -1799,14 +1796,12 @@
 	printf(_("Options:\n"));
 	printf(_("  --dbname=DB               use database DB (default \"regression\")\n"));
 	printf(_("  --debug                   turn on debug mode in programs that are run\n"));
-	printf(_("  --inputdir=DIR            take input files from DIR (default \".\")\n"));
 	printf(_("  --load-language=lang      load the named language before running the\n"));
 	printf(_("                            tests; can appear multiple times\n"));
 	printf(_("  --create-role=ROLE        create the specified role before testing\n"));
 	printf(_("  --max-connections=N       maximum number of concurrent connections\n"));
 	printf(_("                            (default is 0 meaning unlimited)\n"));
 	printf(_("  --multibyte=ENCODING      use ENCODING as the multibyte encoding\n"));
-	printf(_("  --outputdir=DIR           place output files in DIR (default \".\")\n"));
 	printf(_("  --schedule=FILE           use test ordering schedule from FILE\n"));
 	printf(_("                            (can be used multiple times to concatenate)\n"));
 	printf(_("  --srcdir=DIR              absolute path to source directory (for VPATH builds)\n"));
@@ -1844,11 +1839,9 @@
 		{"version", no_argument, NULL, 'V'},
 		{"dbname", required_argument, NULL, 1},
 		{"debug", no_argument, NULL, 2},
-		{"inputdir", required_argument, NULL, 3},
 		{"load-language", required_argument, NULL, 4},
 		{"max-connections", required_argument, NULL, 5},
 		{"multibyte", required_argument, NULL, 6},
-		{"outputdir", required_argument, NULL, 7},
 		{"schedule", required_argument, NULL, 8},
 		

Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Michael Fuhr
On Mon, Aug 04, 2008 at 11:51:35AM +0100, Gregory Stark wrote:
 It seems there's something wrong with CheckOtherDBBackends() but I haven't
 exactly figured out what. There are no other sessions but drop database keeps
 saying "regression" is being accessed by other users.

Are any prepared transactions still open?

select * from pg_prepared_xacts;

-- 
Michael Fuhr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unnecessary code in _bt_split

2008-08-04 Thread Tom Lane
Zdenek Kotala [EMAIL PROTECTED] writes:
 Tom Lane napsal(a):
 Not violating a perfectly good abstraction?

  In my opinion it would be better to have three functions:

 PageCreateTempPage - only allocate memory and call pageinit
 PageCloneSpecial - copy special section from source page
 PageRestoreTempPage - no change.

That naming still breaks the association of TempPage functions.
If we're going to have multiple temp-page-creation functions,
I think their names should follow a pattern like PageGetTempPageXXX.

After looking around a bit, I'm not entirely convinced that there's
*any* call for the existing definition of PageGetTempPage :-(.
There are only two callers: _bt_split() which certainly doesn't need
it to work the way it does, and gistplacetopage() which might or
might not be just as happy initializing all of the page special
space for itself.  Oleg, Teodor, could you comment on whether it's
really needed to copy the old page's special space there?

Also, to the extent that PageGetTempPage copies the source page's
header instead of setting it up from scratch, I think it's outright
*wrong*.  This will result in copying the source's pd_flags and
pd_prune_xid, neither of which seems like correct behavior given that
we're clearing the page contents.

I'm thinking we should split PageGetTempPage into two versions:

PageGetTempPage: get a temp page the same size as the given page,
but don't initialize its contents at all (so, just a thin wrapper
for palloc).  This could be used by _bt_split, as well as
GinPageGetCopyPage and GistPageGetCopyPage.

PageGetTempPageCopySpecial: get a temp page, PageInit it, and
copy the special space from the given page.  The only customer
for this is gistplacetopage(), so maybe we don't even want it,
rather than just doing the work right in gistplacetopage()?

You could also make an argument for PageGetTempPageCopy() which'd just
copy the source page verbatim, thus replacing GinPageGetCopyPage and
GistPageGetCopyPage.
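
In terms of the existing bufpage primitives, a rough sketch of those two
variants (just to illustrate the proposal; not committed code):

Page
PageGetTempPage(Page page)
{
    Size    pageSize = PageGetPageSize(page);

    /* thin wrapper for palloc; caller initializes the contents */
    return (Page) palloc(pageSize);
}

Page
PageGetTempPageCopySpecial(Page page)
{
    Size    pageSize = PageGetPageSize(page);
    Page    temp = (Page) palloc(pageSize);

    /* fresh header via PageInit, then clone only the special space */
    PageInit(temp, pageSize, PageGetSpecialSize(page));
    memcpy(PageGetSpecialPointer(temp),
           PageGetSpecialPointer(page),
           PageGetSpecialSize(page));
    return temp;
}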

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies

2008-08-04 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Alvaro Herrera wrote:
 (I think it's better to reuse the same postmaster executable, because
 that way it's easier to have the same parsing routines.)

 Change that to pg_ctl and you have a deal :)

Did you not understand Alvaro's point?  Putting this functionality into
pg_ctl will result in huge code bloat, because it will have to duplicate
a lot of code that already exists inside the postgres executable.

I don't object to having a pg_ctl option to call postgres --check,
but I do object to maintaining two copies of the same code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Gregory Stark
Michael Fuhr [EMAIL PROTECTED] writes:

 On Mon, Aug 04, 2008 at 11:51:35AM +0100, Gregory Stark wrote:
 It seems there's something wrong with CheckOtherDBBackends() but I haven't
 exactly figured out what. There are no other sessions but drop database keeps
 saying "regression" is being accessed by other users.

 Are any prepared transactions still open?

Uh, yes, I did notice that but didn't put two and two together. That does make
sense now that you mention it.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 The buildfarm would be all red if this wasn't something local to your
 installation, I think.  Maybe you should get gdb on the backend and set
 a breakpoint on errfinish, or maybe step into CheckOtherDBBackends to
 see why it isn't working.

Michael Fuhr solved it so this is academic but, the buildfarm runs "make
installcheck"? I thought it just ran "make check".

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies

2008-08-04 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Seems a lot better to me to just train people to run the check-config
 code by hand before pulling the trigger to load the settings for real.

 I think it'd be reasonable to refuse starting if the config is *known
 broken* (such as containing lines that are unparseable, or that contain
 completely invalid tokens), whereas you'd start if they just contain
 things that are probably wrong. But picking from your previous
 examples of more advanced checks,  there are lots of cases where
 things like overlapping CIDR address ranges are perfectly valid, so I
 don't think we could even throw a warning for that - unless there's a
 separate flag to enable/disable warnings for such a thing.

There are cases that are sane, and there are cases that are not.
You've got three possibilities:

* two lines referencing the exact same address range (and other
selectors such as user/database).  Definitely a mistake, because
the second one is unreachable.

* two lines where the second's address range is a subset of the
first (and other stuff is the same).  Likewise a mistake.

* two lines where the first's address range is a subset of the
second's.  This one is the only sane one.

(The nature of CIDR notation is that there are no partial overlaps,
so it must be one of these three cases.)

We have in fact seen complaints from people who apparently missed
the fact that pg_hba.conf entries are order-sensitive, so I think
a test like this would be worth making.  But it shouldn't be done
by the postmaster.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Alvaro Herrera
Gregory Stark wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 
  The buildfarm would be all red if this wasn't something local to your
  installation, I think.  Maybe you should get gdb on the backend and set
  a breakpoint on errfinish, or maybe step into CheckOtherDBBackends to
  see why it isn't working.
 
 Michael Fuhr solved it so this is academic but, the buildfarm runs "make
 installcheck"? I thought it just ran "make check".

Hmm, I kinda assumed that it ran "drop database regression" at some
point, but maybe you are right that it doesn't ...

I do run make installcheck all the time though, so at least I would have
noticed ;-)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Andrew Dunstan



Gregory Stark wrote:

 Alvaro Herrera [EMAIL PROTECTED] writes:

  The buildfarm would be all red if this wasn't something local to your
  installation, I think.  Maybe you should get gdb on the backend and set
  a breakpoint on errfinish, or maybe step into CheckOtherDBBackends to
  see why it isn't working.

 Michael Fuhr solved it so this is academic but, the buildfarm runs "make
 installcheck"? I thought it just ran "make check".

It runs both.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Location for pgstat.stat

2008-08-04 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Attached is a patch that implements this. I went with the option of just
 storing it in a temporary directory that can be symlinked, and not
 bothering with a GUC for it. Comments? (documentation updates are also
 needed, but I'll wait with those until I hear patch comments :-P)

Looks alright in a fast once-over (I didn't test it).  Two comments:
Treating the directory as something to create in initdb means you'll
need to bump catversion when you apply it.  I'm not sure where you are
planning to document, but there should at least be a mention in the
database physical layout chapter, since that's supposed to enumerate
all the subdirectories of $PGDATA.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Gregory Stark wrote:
 Michael Fuhr solved it so this is academic but, the buildfarm runs "make
 installcheck"? I thought it just ran "make check".

 It runs both.

It also runs contrib installcheck, which will most definitely exercise
DROP DATABASE.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Michael Fuhr [EMAIL PROTECTED] writes:
 Are any prepared transactions still open?

 Uh, yes, I did notice that but didn't put two and two together. That does make
 sense now that you mention it.

I've been bit by that too, and so have other people.  Maybe it'd be
worth the trouble to improve the message so that it explicitly tells you
when there are prepared transactions blocking the DROP.

Another possibility is to let the DROP automatically roll back the
conflicting prepared xacts, but that seems a bit dangerous.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Heikki Linnakangas

Tom Lane wrote:

 Gregory Stark [EMAIL PROTECTED] writes:

  Michael Fuhr [EMAIL PROTECTED] writes:

   Are any prepared transactions still open?

  Uh, yes, I did notice that but didn't put two and two together. That does
  make sense now that you mention it.

 I've been bit by that too, and so have other people.  Maybe it'd be
 worth the trouble to improve the message so that it explicitly tells you
 when there are prepared transactions blocking the DROP.

Yes, that should be easy enough.

 Another possibility is to let the DROP automatically roll back the
 conflicting prepared xacts, but that seems a bit dangerous.

Yeah, the prepared xact might have modified shared catalogs, for example.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I've been bit by that too, and so have other people.  Maybe it'd be
 worth the trouble to improve the message so that it explicitly tells you
 when there are prepared transactions blocking the DROP.

 Yes, that should be easy enough.

I looked at this quickly and decided that we can do it with some small
changes to CheckOtherDBBackends().  I propose counting the number of
conflicting PGPROCs and adding a DETAIL line to the existing error
message:

ERROR: database "%s" is being accessed by other users
DETAIL: There are %d session(s) and %d prepared transaction(s) using the
database.
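
In dropdb() that could come out roughly as follows (a sketch only;
nsessions and nprepared are hypothetical counts that
CheckOtherDBBackends() would have to be extended to report back):

    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_IN_USE),
             errmsg("database \"%s\" is being accessed by other users",
                    dbname),
             errdetail("There are %d session(s) and %d prepared transaction(s) using the database.",
                       nsessions, nprepared)));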

I'm aware that this phrasing might not translate very nicely ... anyone
have a suggestion for better wording?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PL/PythonU

2008-08-04 Thread David Blewett
Hi All:

This is an off-shoot of the "Do we really want to migrate plproxy and
citext into PG core distribution?" thread.

On the way home from PyOhio, I had a conversation with a few people
that use Zope a lot. I happened to mention that Postgres doesn't have
a trusted version of pl/python and they were curious as to why.
They directed me to Zope's Restricted Python implementation [1][2]. In
doing some research, I found the "Pl/Python -- current maintainer?"
[3] thread from 2006. I also found this [4] thread on the python-dev
mailing list.

Hannu: You had mentioned bringing pl/python up to the level of some of
the other pl's. Have you thought any more about pl/pythonu?

David Blewett
--
A few quotes from the python-dev thread (links at bottom):
Here is some context for Python-Dev.

RestrictedPython is a custom Python compiler that, when combined with a
restricted environment, provides a sandbox safe enough to allow
partly-trusted people to write and execute scripts on a Zope server.  It
has been used in Zope 2 for a long time and will have a future in Zope
3.  The sandbox is more extensive than what the rexec module provides.

The safety of RestrictedPython has been validated in a somewhat formal
process with Python 2.4.  Ranjith is working to validate it with Python
2.5. -- Shane Hathaway [5]

No, he means the restricted Python compiler and capability-proxy
system used by Zope.  You know, the one I always bring up whenever
anybody says they want to implement capabilities in Python?  ;-)

Zope's restricted Python is basically a combination of a special
compiler, __builtin__ replacements, and a proxy type.  Instead of
using LOAD_ATTR opcodes, the compiler generates code that calls a
special getattr() function instead, and most objects other than
relatively-safe builtin types are wrapped in proxies that control
what attributes can be accessed and what operations can be performed.

The restricted Python framework itself doesn't impose any particular
security policy; proxies delegate checks to checker objects that
are essentially capabilities.  Mostly, it focuses on creating a safe
sandbox that can be expanded.  -- Phillip J. Eby [6]

1. http://svn.zope.org/RestrictedPython/trunk/
2. http://svn.zope.org/zope.security/trunk/src/zope/security/untrustedpython/
3. http://archives.postgresql.org/pgsql-hackers/2006-02/msg01036.php
4. http://mail.python.org/pipermail/python-dev/2008-July/081340.html
5. http://mail.python.org/pipermail/python-dev/2008-July/081412.html
6. http://mail.python.org/pipermail/python-dev/2008-July/081347.html

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Strawberry Perl?

2008-08-04 Thread David E. Wheeler

Howdy,

I noticed this in the weekly news:


Magnus Hagander committed:

- In pgsql/doc/src/sgml/install-win32.sgml, document which versions of
 ActivePerl and ActiveTcl are required for building on MSVC, and that
 the free distribution is enough (no need for the enterprise
 version).  Per gripe from Martin Zaun.



I was just wondering whether or not it also worked with Strawberry  
Perl, the Perl community's own binary distribution of Perl on Windows?


  http://strawberryperl.com/

Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] \ef function in psql

2008-08-04 Thread David E. Wheeler

On Jul 31, 2008, at 00:07, Abhijit Menon-Sen wrote:


I have attached two patches:

- funcdef.diff implements pg_get_functiondef()
- edit.diff implements \ef function in psql based on (1).

Comments appreciated.


+1

I like! The ability to easily edit a function on the fly in psql will  
be very welcome to DBAs I know. And I like the pg_get_functiondef()  
function, too, as it will simplify editing existing functions in  
other admin apps, like pgAdmin.


I'm starting to get really excited for 8.4. I can haz cheezburger?  
Oops, I mean, when does it ship? ;-P


Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Type Categories for User-Defined Types

2008-08-04 Thread David E. Wheeler

On Jul 31, 2008, at 10:42, David E. Wheeler wrote:

Good point --- so new members of STRING category aren't going to be  
that

common, except for domains which apparently aren't bothering people
anyway.  I'll go ahead and make the change.  (I think it's just a
trivial change in find_coercion_pathway, and everything else should
Just Work.  If it turns out not to be trivial maybe we should
reconsider.)


Wow. Really nice, Tom. Thanks!

The attached patch has all the tests I added to my svn version  
against 8.3, and for which I had to write 60 additional cast  
functions.


Ping! Just wanted to make sure this wasn't lost in the shuffle…

Thanks,

David
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Type Categories for User-Defined Types

2008-08-04 Thread Alvaro Herrera
David E. Wheeler wrote:
 On Jul 31, 2008, at 10:42, David E. Wheeler wrote:

 Wow. Really nice, Tom. Thanks!

 The attached patch has all the tests I added to my svn version against 
 8.3, and for which I had to write 60 additional cast functions.

 Ping! Just wanted to make sure this wasn't lost in the shuffle…

Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/PythonU

2008-08-04 Thread Hannu Krosing
On Mon, 2008-08-04 at 13:08 -0400, David Blewett wrote:
 Hi All:
 
 This is an off-shoot of the "Do we really want to migrate plproxy and
 citext into PG core distribution?" thread.
 
 On the way home from PyOhio, I had a conversation with a few people
 that use Zope a lot. I happened to mention that Postgres doesn't have
 a trusted version of pl/python and they were curious as to why.
 They directed me to Zope's Restricted Python implementation [1][2]. In
 doing some research, I found the "Pl/Python -- current maintainer?"
 [3] thread from 2006. I also found this [4] thread on the python-dev
 mailing list.
 
 Hannu: You had mentioned bringing pl/python up to the level of some of
 the other pl's. Have you thought any more about pl/pythonu?

My recollection of old times (about python v. 1.6) was that the
restricted sandboxes had some fatal flaws. I have not followed zope's
RestrictedPython enough to have an opinion on its safety.

 David Blewett
 --
 A few quotes from the python-dev thread (links at bottom):
 Here is some context for Python-Dev.
 
 RestrictedPython is a custom Python compiler that, when combined with a
 restricted environment, provides a sandbox safe enough to allow
 partly-trusted people to write and execute scripts on a Zope server.  It
 has been used in Zope 2 for a long time and will have a future in Zope
 3.  The sandbox is more extensive than what the rexec module provides.
 
 The safety of RestrictedPython has been validated in a somewhat formal
 process with Python 2.4.  Ranjith is working to validate it with Python
 2.5. -- Shane Hathaway [5]

Will definitely look at it. Thanks.

 No, he means the restricted Python compiler and capability-proxy
 system used by Zope.  You know, the one I always bring up whenever
 anybody says they want to implement capabilities in Python?  ;-)
 
 Zope's restricted Python is basically a combination of a special
 compiler, __builtin__ replacements, and a proxy type.  Instead of
 using LOAD_ATTR opcodes, the compiler generates code that calls a
 special getattr() function instead, and most objects other than
 relatively-safe builtin types are wrapped in proxies that control
 what attributes can be accessed and what operations can be performed.
 
 The restricted Python framework itself doesn't impose any particular
 security policy; proxies delegate checks to checker objects that
 are essentially capabilities.  Mostly, it focuses on creating a safe
 sandbox that can be expanded.  -- Phillip J. Eby [6]
 
 1. http://svn.zope.org/RestrictedPython/trunk/
 2. http://svn.zope.org/zope.security/trunk/src/zope/security/untrustedpython/
 3. http://archives.postgresql.org/pgsql-hackers/2006-02/msg01036.php
 4. http://mail.python.org/pipermail/python-dev/2008-July/081340.html
 5. http://mail.python.org/pipermail/python-dev/2008-July/081412.html
 6. http://mail.python.org/pipermail/python-dev/2008-July/081347.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Type Categories for User-Defined Types

2008-08-04 Thread David E. Wheeler

On Aug 4, 2008, at 11:02, Alvaro Herrera wrote:


Ping! Just wanted to make sure this wasn't lost in the shuffle…


Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09


Sure, although it's a simple refinement (read: tests, mainly) of an  
accepted July patch, submitted before the end of July. I can put it in  
for Sept, but thought it might be best to deal with it while the  
original patch is still fresh in (Tom's) mind and since it's something  
Tom asked me for. :-)


Anyway, added to the 09 commitfest page.

Best,

David
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Type Categories for User-Defined Types

2008-08-04 Thread Andrew Dunstan



David E. Wheeler wrote:

On Aug 4, 2008, at 11:02, Alvaro Herrera wrote:


Ping! Just wanted to make sure this wasn't lost in the shuffle…


Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09


Sure, although it's a simple refinement (read: tests, mainly) of an 
accepted July patch, submitted before the end of July. I can put it in 
for Sept, but thought it might be best to deal with it while the 
original patch is still fresh in (Tom's) mind and since it's something 
Tom asked me for. :-)


Anyway, added to the 09 commitfest page.




There is nothing to prevent such things from being committed ahead of 
that commitfest.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Robert Treat
On Monday 04 August 2008 03:50:40 daveg wrote:
 On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:
  ISTR that what ended up killing the enthusiasm for this was that most
  people realized that this GUC was just a poor tool to take a stab at
  solving other problems (ie. rate limiting cpu for queries).

 I'm not concerned with that, I want developers to have feed back on costs
 in a way that is obvious.


That was one of the other use cases that was pushed forward in the past. 

I think a variation on this could be very useful in development and
test environments. Suppose it raised a warning or notice if the cost
was over the limit. Then one could set a limit of a few million on
the development and test servers and developers would at least have a
clue that they needed to look at explain for that query. As it is
now, one can exhort them to run explain, but it has no effect. 
Instead we later see queries killed by a 24 hour timeout with
estimated costs ranging from "until they unplug the machine and dump
it" to "until the sun turns into a red giant".
  
   Great argument. So that's 4 in favour at least.
 
  Not such a great argument. Cost models on development servers can and
  often are quite different from those on production, so you might be
  putting an artifical limit on top of your developers.

 We load the production dumps into our dev environment, which are the same
 hardware spec, so the costs should be identical.


That's great for you, I am talking in the scope of a general solution. (Note 
I'd also bet that even given the same hardware, different production loads 
can produce different relative mappings of cost vs. performance, but 
whatever)

  I still think it is worth revisiting what problems people are trying to
  solve, and see if there are better tools they can be given to solve them.
   Barring that, I suppose a crude solution is better than nothing, though
  I fear people might point at the crude solution as a good enough solution
  to justify not working on better solutions.

 Alerting developers and QA to potentially costly queries would help solve
 some of the probems we are trying to solve. Better tools are welcome, an
 argument that the good is the enemy of the best so we should be content
 with nothing is not.


And you'll note, I specifically said that a crude tool is better than nothing. 
But you're completely ignoring that a crude tool can often end up as a foot-gun 
once released into the wild. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Josh Berkus

Greg,


Well, that's going to depend on the application. But I suppose there's
nothing wrong with having options which aren't always a good idea to use. The
real question, I guess, is whether there's ever a situation where it would be
a good idea to use this. I'm not 100% sure.


I can think of *lots*.   Primarily, simple web applications, where 
queries are never supposed to take more than 50ms.  If a query turns up 
with an estimated cost of 100, then you know something's wrong; 
in the statistics if not in the query.  In either case, that query has a 
good chance of dragging down the whole system.


In such a production application, it is better to have false positives 
and reject otherwise-OK queries because their costing is wrong, than to 
let a single cartesian join bog down an application serving 5000 
simultaneous users.  Further, with a SQL error, this would allow the 
query rejection to be handled in a user-friendly way from the UI 
("Search too complex.  Try changing search terms.") rather than timing 
out, which is very difficult to handle well.


The usefulness of this feature for interactive sessions is 
limited-to-nonexistent.  It's for production applications.


--Josh Berkus


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python

2008-08-04 Thread David Blewett
On Mon, Aug 4, 2008 at 1:56 PM, Hannu Krosing [EMAIL PROTECTED] wrote:
 Hannu: You had mentioned bringing pl/python up to the level of some of
 the other pl's. Have you thought any more about pl/pythonu?

Obviously, I meant pl/python. Subject line fixed, too. Sorry for the noise.

David Blewett

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Greg Smith

On Mon, 4 Aug 2008, daveg wrote:


On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:

Not such a great argument. Cost models on development servers can and often
are quite different from those on production, so you might be putting an
artifical limit on top of your developers.


We load the production dumps into our dev environment, which are the same
hardware spec, so the costs should be identical.


Not identical, just close.  ANALYZE samples data from your table randomly. 
The statistics used to compute the costs will therefore be slightly 
different on the two servers even if the data is the same.  The problem of 
discovering one plan on production and another on development is not quite 
that easy to remove.  Ultimately, if your developers aren't thorough 
enough to do things like look at EXPLAIN plans enough to discover things 
that are just bad, I just chuckle at your thinking that putting a single 
limiter on their bad behavior will somehow magically make that better.


Anyway, if your production server is small enough that you can afford to 
have another one just like it for the developers to work on, that's great. 
Robert's point is that many installs don't work like that.  The 
development teams in lots of places only get a subset of the production 
data because it's too large to deploy on anything but a big server, which 
often is hard to cost justify buying just for development purposes.


I like the concept of a cost limit, but I'm a bit horrified by the thought 
of it being exposed simply through the internal cost numbers because they 
are so arbitrary.  One of the endless projects I think about but never 
start coding is to write something that measures the things the planner 
cost constants estimate on a particular machine, so that all those numbers 
actually can be tied to some real-world time measure.  If you did that, 
you'd actually have a shot at accomplishing the real goal here, making 
statement_cost_limit cut off statements expected to take longer than 
statement_timeout before they even get started.
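
The first measurement for that project could be as small as this toy 
sketch (my illustration, not an existing tool): time a batch of 
sequential 8kB reads from a file larger than RAM to put a wall-clock 
value on one unit of seq_page_cost; seeking randomly instead would 
calibrate random_page_cost the same way.

#include <stdio.h>
#include <time.h>

#define BLCKSZ 8192

/* seconds per sequential 8kB read; the file at "path" should exceed
 * RAM so the OS page cache doesn't hide the real I/O cost */
static double
seconds_per_seq_page(const char *path, long nblocks)
{
    char            buf[BLCKSZ];
    struct timespec t0, t1;
    long            i;
    FILE           *f = fopen(path, "rb");

    if (f == NULL)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < nblocks; i++)
        if (fread(buf, 1, BLCKSZ, f) != BLCKSZ)
            break;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(f);

    return ((t1.tv_sec - t0.tv_sec) +
            (t1.tv_nsec - t0.tv_nsec) / 1e9) / nblocks;
}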


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python

2008-08-04 Thread Hannu Krosing
On Mon, 2008-08-04 at 15:02 -0400, David Blewett wrote:
 On Mon, Aug 4, 2008 at 1:56 PM, Hannu Krosing [EMAIL PROTECTED] wrote:
  Hannu: You had mentioned bringing pl/python up to the level of some of
  the other pl's. Have you thought any more about pl/pythonu?
 
 Obviously, I meant pl/python. Subject line fixed to. Sorry for the noise.

sure, it's just the other side of the coin ;)

 David Blewett


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CommitFest July Over

2008-08-04 Thread Josh Berkus

Hackers,

Well, after a month the July CommitFest is officially closed.  At this 
point, we're operating with the de facto rule that commitfests shouldn't 
last more than a month.


Because some patches are still being discussed, they've been moved over 
automatically to the September commitfest.  A much larger number of 
patches are now in "returned with feedback"; if your patch is in there, 
-hackers is probably waiting for some kind of response from you.


Lots of stuff was committed, too.  8.4 is looking very exciting.

Post-mortem things we've learned about the commitfest are:

1) It's hard to get anything done in June-July.

2) The number of patches is going to keep increasing with each 
commitfest.  As such, the patch list is going to get harder to deal 
with.  We now urgently need to start working on CF management software.


3) Round Robin Reviewers didn't really work this time, aside from 
champion new reviewer Abhjit.  For the most part, RRR who were assigned 
patches did not review them for 2 weeks.  Two areas where this concept 
needs to be improved:
	a) we need to assign RRR to patches two days after the start of 
commitfest, not a week later;
	b) there needs to be the expectation that RRR will start reviewing or 
reject the assignment immediately.


4) We need to work better to train up new reviewers.  Some major 
committer(s) should have worked with Abhjit, Thomas and Martin 
particularly on getting them to effectively review patches; instead, 
committers just handled stuff *for* them for the most part, which isn't 
growing our pool of reviewers.


5) Patch submitters need to understand that patch submission isn't 
fire-and-forget.  They need to check back, and respond to queries from 
reviewers.  Of course, a patch-tracker which automatically notified the 
submitter would help.


6) Overall, I took a low-nag-factor approach this first time as 
commitfest manager.  This does not seem to have been the best way; I'd 
suggest for September that the manager make more frequent nags.


Finally: who wants to be CF Manager for September?  I'm willing to do it 
again, but maybe someone else should get a turn.


--Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 03:09:34PM -0400, Greg Smith wrote:
 On Mon, 4 Aug 2008, daveg wrote:
 We load the production dumps into our dev environment, which are the same
 hardware spec, so the costs should be identical.
 
 Not identical, just close.  ANALYZE samples data from your table randomly. 
 The statistics used to compute the costs will therefore be slightly 
 different on the two servers even if the data is the same.  The problem of 
 discovering one plan on production and another on development is not quite 
 that easy to remove.  Ultimately, if your developers aren't thorough 
 enough to do things like look at EXPLAIN plans enough to discover things 
 that are just bad, I just chuckle at your thinking that putting a single 
 limiter on their bad behavior will somehow magically make that better.

Not all developers can be persuaded to run explain on every change.
However, many will investigate a new message. I'm only hoping to try to
focus their attention toward possible problem queries.

 Anyway, if your production server is small enough that you can afford to 
 have another one just like it for the developers to work on, that's great. 
 Robert's point is that many installs don't work like that.  The 
 development teams in lots of places only get a subset of the production 
 data because it's too large to deploy on anything but a big server, which 
 often is hard to cost justify buying just for development purposes.

Not to get into a size war ;-), but the production environment I'd like this
feature for is over 40 quad-core boxes, each with 32GB of RAM and 16 SCSI
drives. These are dedicated to postgresql and run one or just a few databases.
There are also a bunch of client boxes that we will not speak of. The staging
and test environments are similar hardware but have only a subset of the
databases copied to them. There are probably more than a dozen DB hosts for that.

 I like the concept of a cost limit, but I'm a bit horrified by the thought 
 of it being exposed simply through the internal cost numbers because they 
 are so arbitrary.  One of the endless projects I think about but never 


 start coding is to write something that measures the things the planner 
 cost constants estimate on a particular machine, so that all those numbers 
 actually can be tied to some real-world time measure.  If you did that, 
 you'd actually have a shot at accomplishing the real goal here, making 
 statement_cost_limit cut off statements expected to take longer than 
 statement_timeout before they even get started.

That is a nice idea. Possibly it could be a utility like the fsync tester.

But planner estimates are never going to be all that accurate even with solid
cost numbers, because for some classes of queries, particularly those with
many joins, the stats can be good at each level but the error accumulates
exponentially: if each join's selectivity estimate is off by some factor, the
error in the final row count is roughly the product of those factors. Which
is why I think a warning is appropriate instead of an error. Even a notice
in the logs would be useful.

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote:
 On Monday 04 August 2008 03:50:40 daveg wrote:
 
That's great for you, but I am talking in the scope of a general solution. (Note 
 I'd also bet that even given the same hardware, different production loads 
 can produce different relative mappings of cost vs. performance, but 
 whatever)

Even on different hardware it would still likely warn of mistakes like
cartesian products due to missing join conditions, etc.
 
   I still think it is worth revisiting what problems people are trying to
   solve, and see if there are better tools they can be given to solve them.
Barring that, I suppose a crude solution is better than nothing, though
   I fear people might point at the crude solution as a good enough solution
   to justify not working on better solutions.
 
  Alerting developers and QA to potentially costly queries would help solve
  some of the problems we are trying to solve. Better tools are welcome, an
  argument that the good is the enemy of the best so we should be content
  with nothing is not.
 
 And you'll note, I specifically said that a crude tool is better than 
 nothing. 

I realized somewhat after I sent the above that it might have sounded a bit
snippy. I hope I have not offended.

 But you're completely ignoring that a crude tool can often end up as a foot-gun 
 once released into the wild. 

I'm suggesting a warning, or even just a notice into the logs; I don't see
the footgun. What am I missing?

Regards

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Hannu Krosing
On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:
 On Monday 04 August 2008 03:50:40 daveg wrote:
  On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:
...
   I still think it is worth revisiting what problems people are trying to
   solve, and see if there are better tools they can be given to solve them.
Barring that, I suppose a crude solution is better than nothing, though
   I fear people might point at the crude solution as a good enough solution
   to justify not working on better solutions.
 
  Alerting developers and QA to potentially costly queries would help solve
  some of the problems we are trying to solve. Better tools are welcome, an
  argument that the good is the enemy of the best so we should be content
  with nothing is not.
 
 
 And you'll note, I specifically said that a crude tool is better than 
 nothing. 
 But you're completely ignoring that a crude tool can often end up as a foot-gun 
 once released into the wild. 

On the other other hand, _anything_ can end up as a foot-gun in hands of
ingenious users.

I was once told about a company that claimed to have produced a
positively fool-proof lawn-mower, only to find out that a university
professor had tried to use it to trim a hedge and cut off his toes.

-
Hannu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Gregory Stark
Josh Berkus [EMAIL PROTECTED] writes:

 In such a production application, it is better to have false positives and
 reject otherwise-OK queries because their costing is wrong, than to let a
 single cartesian join bog down an application serving 5000 simultaneous users.
 Further, with a SQL error, this would allow the query rejection to be handled
 in a user-friendly way from the UI (Search too complex.  Try changing search
 terms.) rather than timing out, which is very difficult to handle well.

 The usefulness of this feature for interactive sessions is
 limited-to-nonexistant.  It's for production applications.

Wow. I couldn't disagree more.

For such an application this would be a major foot-gun which would give a
false sense of security simultaneously causing random outages and not
providing even the protection you're counting on.

It would be quite likely to miss some cartesian joins and allow problematic
queries through randomly and block other perfectly legitimate queries. It's no
substitute for writing your search engine query generator to actually check
that it has enough constraints to avoid any disallowed cartesian joins.

That people might think it's reliable enough to use for such applications is
my major concern and if my guess is right, Tom's as well. I suspect you may
have just sunk any chance of getting him on-side.

Where I see it useful is a) during development, when it might help catch
erroneous queries as a kind of sql-lint, and b) when running ad-hoc DBA
queries, where it might let the DBA catch the error before letting it run for a
while. I'm sure I'm not the only DBA who has let a query run for 5 minutes
before wondering if it should really be taking that long.

I would be much more comfortable if it produced a warning, not an error. And
much more so if we implemented my previous thought of having some settings
which, if they're set at startup, generate warnings saying that's not recommended.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 11:59:03AM -0700, Josh Berkus wrote:
 Greg,
 
 Well that's going to depend on the application. But I suppose there's
 nothing wrong with having options which aren't always a good idea to use. The
 real question I guess is whether there's ever a situation where it would be a
 good idea to use this. I'm not 100% sure.
 
 I can think of *lots*.   Primarily, simple web applications, where 
 queries are never supposed to take more than 50ms.  If a query turns up 
 with an estimated cost of 100, then you know something's wrong; 
 in the statistics if not in the query.  In either case, that query has a 
 good chance of dragging down the whole system.
 
 In such a production application, it is better to have false positives 
 and reject otherwise-OK queries because their costing is wrong, than to 
 let a single cartesian join bog down an application serving 5000 
 simultaneous users.  Further, with a SQL error, this would allow the 
 query rejection to be handled in a user-friendly way from the UI 
 (Search too complex.  Try changing search terms.) rather than timing 
 out, which is very difficult to handle well.
 
 The usefulness of this feature for interactive sessions is 
 limited-to-nonexistant.  It's for production applications.

Ok, that is a different use case where an error seems very useful. What
about slightly extending the proposal to make the severity of exceeding
the limit configurable too? Something like:

   costestimate_limit = 10 # default 0 to ignore limit
   costestimate_limit_severity = error # debug, notice, warning, error
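
To make that concrete, a session might then look something like this
(hypothetical, of course, since neither GUC exists yet; the table names
and the message wording are made up for illustration):

   SET costestimate_limit = 1000000;
   SET costestimate_limit_severity = warning;
   SELECT * FROM orders, customers;   -- oops, missing join condition
   WARNING:  estimated statement cost exceeds costestimate_limit

Dev and QA boxes could then run with warning while production ran with
error, or the other way around, depending which camp you are in.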

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Kevin Grittner
 Hannu Krosing [EMAIL PROTECTED] wrote:
 
 I was once told about a company that claimed to have produced a
 positively fool-proof lawn-mower, only to find out that a university
 professor had tried to use it to trim a hedge and cut off his toes.
 
Odd.  Seriously, about 45 years ago I lived next door to a university
botany professor who cut off his fingertips that way.  I wonder if
professors are more prone to this or whether the story got mangled
over time.
 
Perhaps software should have special protection for professors
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Simon Riggs

On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:
 On Monday 04 August 2008 03:50:40 daveg wrote:

 And you'll note, I specifically said that a crude tool is better than
 nothing. But you're completely ignoring that a crude tool can often
 end up as a foot-gun once released into the wild. 

The proposal is for an option with no consequences when turned off. We
respect your right not to use it. What is the danger exactly? 

If we cancel stupid queries before people run them, everybody is a
winner. Even the person who submitted the stupid query, since they find
out faster.

Sure, it's an estimate, but it's got to be based upon an estimate if it
acts *before* it runs. And surely there is no better estimate of the
cost than the plan cost?

It doesn't stop anyone from putting in resource limits, later.

We'll have to do something with enable_seqscan, BTW, chaps.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Josh Berkus
Greg,

 For such an application this would be a major foot-gun which would give
 a false sense of security simultaneously causing random outages and not
 providing even the protection you're counting on.

Hmmm.  That sounds like a call for some testing.  While our cost estimation 
has some issues, I don't think it's as unreliable as all that.  And it's easy 
enough to document the limitations.  If it's 80% accurate, then it's 
fixing more problems than it causes.  If it's 30% accurate, then obviously 
it's a bad idea.

Personally, I don't see much use for this interactively, because an 
experienced DBA can easily enough run an EXPLAIN before running the query.  
I usually do, on production systems.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Automatic Client Failover

2008-08-04 Thread Simon Riggs
When primary server fails, it would be good if the clients connected to
the primary knew to reconnect to the standby servers automatically.

We might want to specify that centrally and then send the redirection
address to the client when it connects. Sounds like lots of work though.

Seems fairly straightforward to specify a standby connection service at
client level: .pgreconnect, or pgreconnect.conf
No config, then option not used.

Would work with various forms of replication.

Implementation would be to make PQreset() try the secondary connection if
the primary one fails to reset. Of course you can program this manually,
but the feature is that you wouldn't need to, nor would you need to
request changes to 27 different interfaces either.
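
For comparison, the manual version that applications hand-roll today
looks roughly like this with plain libpq (a minimal sketch: the conninfo
strings are made-up examples, and a real client would also want backoff,
logging, and some check that the standby really has been promoted):

#include <stdio.h>
#include "libpq-fe.h"

/* Try each conninfo in order until one connects. */
static PGconn *
connect_with_failover(const char **conninfos, int n)
{
    int     i;

    for (i = 0; i < n; i++)
    {
        PGconn *conn = PQconnectdb(conninfos[i]);

        if (PQstatus(conn) == CONNECTION_OK)
            return conn;
        fprintf(stderr, "connection to \"%s\" failed: %s",
                conninfos[i], PQerrorMessage(conn));
        PQfinish(conn);
    }
    return NULL;
}

int
main(void)
{
    const char *targets[] = {
        "host=db-primary dbname=app",   /* made-up addresses */
        "host=db-standby dbname=app"
    };
    PGconn *conn = connect_with_failover(targets, 2);

    if (conn == NULL)
        return 1;
    /* ... normal work; if PQstatus(conn) later reports CONNECTION_BAD,
     * PQfinish() and retry the whole list instead of just PQreset() ... */
    PQfinish(conn);
    return 0;
}

The feature would move that retry loop below the API, so no application
would have to carry it.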

Good? Bad? Ugly? 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Robert Treat
On Monday 04 August 2008 16:49:43 Simon Riggs wrote:
 On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote:
  On Monday 04 August 2008 03:50:40 daveg wrote:
 
  And you'll note, I specifically said that a crude tool is better than
  nothing. But you're completely ignoring that a crude tool can often
  end up as a foot-gun once released into the wild.

 The proposal is for an option with no consequences when turned off. We
 respect your right not to use it. What is the danger exactly?


All of the proposals I have seen for adding query hints would also have no 
consequence if not used, but no one seems to care about that argument. 

:-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Jonah H. Harris
On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs [EMAIL PROTECTED] wrote:
 When primary server fails, it would be good if the clients connected to
 the primary knew to reconnect to the standby servers automatically.

This would be a nice feature which many people I've talked to have
asked for.  In Oracle-land, it's called Transparent Application
Failover (TAF) and it gives you a lot of options, including the
ability to write your own callbacks when a failover is detected.

+1

-- 
Jonah H. Harris, Senior DBA
myYearbook.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread Robert Treat
On Monday 04 August 2008 15:56:25 daveg wrote:
 On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote:
  On Monday 04 August 2008 03:50:40 daveg wrote:
 
  That's great for you, but I am talking in the scope of a general solution.
  (Note I'd also bet that even given the same hardware, different
  production loads can produce different relative mappings of cost vs.
  performance, but whatever)

 Even on different hardware it would still likely warn of mistakes like
 cartesian products due to missing join conditions, etc.


See, this is what we ended up talking about before. Someone will say "I'd like 
to prevent my devs from accidentally doing queries with cartesian products" 
and they will use this to do it... but that will only work in some cases, so 
it becomes a poor tool to solve a different problem. 

BTW, what I really love about statement costs, is that they aren't even 
reliable on the same machine with the same data. I have seen query plans 
which run on the same data on the same machine where the resultant query 
runtime can vary from 2 hours to 5 hours, depending on how much other 
concurrent traffic is on the machine. Awesome eh? 

I still think it is worth revisiting what problems people are trying
to solve, and see if there are better tools they can be given to
solve them. Barring that, I suppose a crude solution is better than
nothing, though I fear people might point at the crude solution as a
good enough solution to justify not working on better solutions.
  
   Alerting developers and QA to potentially costly queries would help
   solve some of the problems we are trying to solve. Better tools are
   welcome, an argument that the good is the enemy of the best so we
   should be content with nothing is not.
 
  And you'll note, I specifically said that a crude tool is better than
  nothing.

 I realized somewhat after I sent the above that it might have sounded a bit
 snippy. I hope I have not offended.

  But your completely ignoring that a crude tool can often end-up as a
  foot-gun once relased into the wild.

 I'm suggesting a warning, or even just a notice into the logs; I don't see
 the footgun. What am I missing?


The footgun in my mind is that people will think this solves a number of 
problems even though it doesn't solve them well.  However, the footgun for you 
might be that the current proposal will actually abort the query, not emit a 
warning (not sure if that changes your opinion of it).  

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Josh Berkus
On Monday 04 August 2008 14:08, Simon Riggs wrote:
 When primary server fails, it would be good if the clients connected to
 the primary knew to reconnect to the standby servers automatically.

 We might want to specify that centrally and then send the redirection
 address to the client when it connects. Sounds like lots of work though.

 Seems fairly straightforward to specify a standby connection service at
 client level: .pgreconnect, or pgreconnect.conf
 No config, then option not used.

Well, it's less simple, but you can already do this with pgPool on the 
client machine.


-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Jonah H. Harris
On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus [EMAIL PROTECTED] wrote:
 Well, it's less simple, but you can already do this with pgPool on the
 client machine.

Yeah, but if you have tens or hundreds of clients, you wouldn't want
to be installing/managing a pgpool on each.  Similarly, I think an
application should have the option of being notified of a connection
change; I know that wasn't in Simon's proposal, but I've found it
necessary in several applications which rely on things such as
temporary tables.  You don't want the app just blowing up because a
table doesn't exist; you want to be able to handle it gracefully.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-08-04 Thread Kevin Grittner
 On Mon, Oct 22, 2007 at  1:30 PM, Simon Riggs wrote: 
 On Mon, 2007-10-22 at 09:31 -0500, Kevin Grittner wrote:
 I've requested this before without response, but I'm asking again
 because it just caused me pain again: could we get a TODO added to
 have the planner recognize equivalent IN and EXISTS constructs and
 have them compete on cost estimates?  I know it's not a trivial
 improvement, but if it's on the list maybe someone will pick it up,
 and I see it as the single biggest weakness in PostgreSQL
 performance.
 
 I'll pick it up as a default unless someone requests they have it from me.
 
Since Simon's focus has shifted to other issues, I'm hoping this can
go onto the TODO list.
 
I'm adding some NOT EXISTS examples to the thread for completeness of
what someone might want to address while working on it.  For two
queries which can easily be shown (to a human viewer, anyway) to
return identical results, I see performance differences of over five
orders of magnitude.  (Six if you compare to the LEFT JOIN ... WHERE
not_null_right_column IS NULL trick.)
 
Below are the cost estimates of the three techniques for a
medium-sized table joining to a large table, and for a large table
joining to a small table.  The IN behavior has the worst worst-case
behavior, at least in queries that I've run, although many people
report that it is usually faster.  The technique of doing an existence
test with a LEFT JOIN and then checking whether a NOT NULL column from
the right-hand table is null is often faster than either technique,
and seldom much worse than the best technique for any given test.
 
Queries and plans attached.  Summary of costs below, in millions of
cost units. (Fractions of a million discarded.)
 
NOT IN (independent_subquery)
19745843, 5
 
WHERE NOT EXISTS
74, 318
 
LEFT JOIN WHERE not_null_right_column IS NULL
10, 17
 
These cost estimates tend to come out in pretty consistent ratio to
the actual run times.
 
-Kevin
cir= explain
cir- SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, 
A.jdgmtHistSeqNo
cir-   FROM Jdgmt A
cir-   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir- AND (A.countyNo, A.caseNo, A.jdgmtHistSeqNo) NOT IN 
(SELECT B.countyNo, B.caseNo, B.histSeqNo FROM CaseHist B)
cir-   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir- ;
QUERY PLAN
--
 Index Scan using Jdgmt_pkey on Jdgmt A  
(cost=4003123.55..19745843977029.96 rows=6896890 width=24)
   Filter: ((jdgmtHistSeqNo IS NOT NULL) AND (NOT (subplan)))
   SubPlan
 -  Materialize  (cost=4003123.55..6253176.10 rows=167689505 width=20)
   -  Seq Scan on CaseHist B  (cost=0.00..3262276.55 
rows=167689505 width=20)
(5 rows)

cir=
cir= explain
cir- SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, 
A.jdgmtHistSeqNo
cir-   FROM Jdgmt A
cir-   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir- AND NOT EXISTS
cir- (
cir(   SELECT * FROM CaseHist B
cir( WHERE B.countyNo = A.countyNo
cir(   AND B.caseNo = A.caseNo
cir(   AND B.histSeqNo = A.jdgmtHistSeqNo
cir( )
cir-   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir- ;
  QUERY PLAN
--
 Index Scan using Jdgmt_pkey on Jdgmt A  (cost=0.00..74966880.92 
rows=6896890 width=24)
   Filter: ((jdgmtHistSeqNo IS NOT NULL) AND (NOT (subplan)))
   SubPlan
 -  Index Scan using CaseHist_pkey on CaseHist B  (cost=0.00..10.07 
rows=2 width=323)
   Index Cond: (((countyNo)::smallint = ($0)::smallint) AND 
((caseNo)::text = ($1)::text) AND ((histSeqNo)::smallint = ($2)::smallint))
(5 rows)

cir=
cir= explain
cir- SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, 
A.jdgmtHistSeqNo
cir-   FROM Jdgmt A
cir-   LEFT JOIN CaseHist B
cir- ON ( B.countyNo = A.countyNo
cir(  AND B.caseNo = A.caseNo
cir(  AND B.histSeqNo = A.jdgmtHistSeqNo
cir()
cir-   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir- AND B.countyNo IS NULL
cir-   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir- ;

QUERY PLAN  

---
 Sort  (cost=10616941.83..10645732.70 rows=11516349 width=24)
   Sort Key: A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo
   -  Merge Right Join  (cost=2020174.37..9175701.57 rows=11516349 width=24)
 Merge Cond: (((B.countyNo)::smallint = inner.?column6?) AND 
((B.caseNo)::text = inner.?column7?) AND 

Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes:
 On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus [EMAIL PROTECTED] wrote:
 Well, it's less simple, but you can already do this with pgPool on the
 client machine.

 Yeah, but if you have tens or hundreds of clients, you wouldn't want
 to be installing/managing a pgpool on each.

Huh?  The pgpool is on the server, not on the client side.

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail over
for reasons other than a primary server failure.  (Think network
partition.)  You really want any such behavior to be managed centrally,
IMHO.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Hannu Krosing
On Mon, 2008-08-04 at 22:08 +0100, Simon Riggs wrote:
 When primary server fails, it would be good if the clients connected to
 the primary knew to reconnect to the standby servers automatically.
 
 We might want to specify that centrally and then send the redirection
 address to the client when it connects. Sounds like lots of work though.

One way to do it is _outside_ of the client, by having a separately managed
subnet for logical DB addresses. So when a failover occurs, you move that
logical DB address to the new host, flush ARP caches, and just reconnect.

This also solves the case of inadvertent failover in case of unrelated
network failure.

 Seems fairly straightforward to specify a standby connection service at
 client level: .pgreconnect, or pgreconnect.conf
 No config, then option not used.
 
 Would work with various forms of replication.
 
 Implementation would be to make PQreset() try secondary connection if
 the primary one fails to reset. Of course you can program this manually,
 but the feature is that you wouldn't need to, nor would you need to
 request changes to 27 different interfaces either.
 
 Good? Bad? Ugly? 
 
 -- 
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support
 
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Dimitri Fontaine


Hi,

On 5 Aug 08, at 01:13, Tom Lane wrote:

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail over
for reasons other than a primary server failure.  (Think network
partition.)  You really want any such behavior to be managed centrally,
IMHO.



Then, what about having pgbouncer capability in -core? This would
probably mean, AFAIUI, that the listen()ing process would no longer be
the postmaster but a specialized one, with the portable
poll()/select()/... loop, that is now known as pgbouncer.


Existing pgbouncer would have to be expanded to:
 - provide a backward-compatible mode
   (session pooling, release the server session at client closing time)
 - allow configuring several backend servers and trying the next one
   under certain conditions
 - add hooks for clients to know when some events happen
   (failure of the current master, automatic switchover, etc.)

Existing pgbouncer hooks and future ones could be managed with catalog
tables, as we have a special options table for autovacuum: e.g. a
pg_connection_pool catalog, which could contain arbitrary SQL for new
backend fork, backend closing, failover, switchover, etc.; and maybe the
client hooks would be NOTIFY messages sent from the backend at its own
initiative.


Would we then have the centrally managed behavior Tom is mentioning?
I'm understanding this in 2 ways:
 - this extension would be able to distinguish failure cases where an
   automatic failover is possible from hard crashes (impacting the
   listener)
 - when we have read-only slave(s), pgbouncer would be able to redirect
   read-only statements to them


Maybe it would even be useful to look at Markus' work in Postgres-R and
its inter-backend communication system, which allows the executor to
require more than one backend working on a single query. The
pgbouncer-inherited system would then be a pre-forked backend pooling
manager too...


Once more, I hope that giving (not so) random ideas here as a (not  
yet) pgsql hacker is helping the project more than it's disturbing  
real work...


Regards,
--
dim


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-08-04 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 I'm adding some NOT EXISTS examples to the thread for completeness of
 what someone might want to address while working on it.  For two
 queries which can easily be shown (to a human viewer, anyway) to
 return identical results, I see performance differences of over five
 orders of magnitude.

Could we see EXPLAIN ANALYZE not just EXPLAIN for these?  When people
are complaining of bad planner behavior, I don't find bare EXPLAIN
output to be very convincing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Tom Lane
Dimitri Fontaine [EMAIL PROTECTED] writes:
 On 5 Aug 08, at 01:13, Tom Lane wrote:
 There is one really bad consequence of the oversimplified failover
 design that Simon proposes, which is that clients might try to fail
 over for reasons other than a primary server failure.  (Think network
 partition.)  You really want any such behavior to be managed
 centrally, IMHO.

 Then, what about having pgbouncer capability in -core? This would
 probably mean, AFAIUI, that the listen()ing process would no longer
 be the postmaster but a specialized one,

Huh?  The problem case is that the primary server goes down, which would
certainly mean that a pgbouncer instance on the same machine goes with
it.  So it seems to me that integrating pgbouncer is 100% backwards.

Failover that actually works is not something we can provide with
trivial changes to Postgres.  It's really a major project in its
own right: you need heartbeat detection, STONITH capability,
IP address redirection, etc.  I think we should be recommending
external failover-management project(s) instead of offering a
half-baked home-grown solution.  Searching freshmeat for failover
finds plenty of potential candidates, but not having used any of
them I'm not sure which are worth closer investigation.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Josh Berkus
Tom,

 Failover that actually works is not something we can provide with
 trivial changes to Postgres. 

I think the proposal was for an extremely simple "works 75% of the time" 
failover solution.  While I can see the attraction of that, the 
consequences of having failover *not* work are pretty severe.

On the other hand, we will need to deal with this for the built-in 
replication project.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Jens-Wolfhard Schicke

Tom Lane wrote:
 ERROR: database "%s" is being accessed by other users
 DETAIL: There are %d session(s) and %d prepared transaction(s) using the 
 database.
 
 I'm aware that this phrasing might not translate very nicely ... anyone
 have a suggestion for better wording?
I can only estimate translation effort into German, but how about:

DETAIL: Active users of the database: %d session(s), %d prepared transaction(s)

Jens

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 05:17:59PM -0400, Jonah H. Harris wrote:
 On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs [EMAIL PROTECTED] wrote:
  When primary server fails, it would be good if the clients connected to
  the primary knew to reconnect to the standby servers automatically.
 
 This would be a nice feature which many people I've talked to have
 asked for.  In Oracle-land, it's called Transparent Application
 Failover (TAF) and it gives you a lot of options, including the
 ability to write your own callbacks when a failover is detected.

This might be better done as part of a proxy server, e.g. pgbouncer or
pgpool, than as part of postgresql or libpq. I like the concept, but the logic to
determine when a failover has occurred is complex and a client will often
not have access to enough information to make this determination accurately.

postgresql could have hooks to support this though, i.e. to determine when a
standby thinks it has become the master.

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mini improvement: statement_cost_limit

2008-08-04 Thread daveg
On Mon, Aug 04, 2008 at 05:19:50PM -0400, Robert Treat wrote:
 See, this is what we ended up talking about before. Someone will say I'd 
 like 
 to prevent my devs from accidentally doing queries with cartesian products 
 and they will use this to do it... but that will only work in some cases, so 
 it becomes a poor tool to solve a different problem. 
 
 BTW, what I really love about statement costs, is that they aren't even 
 reliable on the same machine with the same data. I have seen query plans 
 which run on the same data on the same machine where the resultant query 
 runtime can vary from 2 hours to 5 hours, depending on how much other 
 concurrent traffic is on the machine. Awesome eh? 

Sure, I don't think anyone believes that costs are precise. But the case that
is interesting is 2 hours versus years and years.

 The footgun in my mind is that people will think this solves a number of 
 problems even though it doesn't solve them well.  However, the footgun for you 
 might be that the current proposal will actually abort the query, not emit a 
 warning.

I suspect that a good solution to this problem is impossible as it is more
or less the halting problem. So I'm willing to accept a poor solution based
on costs and then hope we improve the cost model.

-dg

-- 
David Gould   [EMAIL PROTECTED]  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest July Over

2008-08-04 Thread Robert Treat
On Monday 04 August 2008 15:38:35 Josh Berkus wrote:
 Hackers,

 Well, after a month the July CommitFest is officially closed.  At this
 point, we're operating with the de facto rule that commitfests shouldn't
 last more than a month.

 Because some patches are still being discussed, they've been moved over
 automatically to the September commitfest.  A much larger number of
 patches are now in "returned with feedback"; if your patch is in there,
 -hackers is probably waiting for some kind of response from you.


People should understand they don't have to wait for a commitfest to continue 
development, right? (I.e., if your patch got rejected, start getting it in 
shape now, and ask questions now.)

 Lots of stuff was committed, too.  8.4 is looking very exciting.


+1

 Post-mortem things we've learned about the commitfest are:

 1) It's hard to get anything done in June-July.


True... vacations and conferences abound. September should be better in this 
regard I would think. 

 2) The number of patches is going to keep increasing with each
 commitfest.  As such, the patch list is going to get harder to deal
 with.  We now urgently need to start working on CF management software.

 3) Round Robin Reviewers didn't really work this time, aside from
 champion new reviewer Abhjit.  For the most part, RRR who were assigned
 patches did not review them for 2 weeks.  Two areas where this concept
 needs to be improved:
   a) we need to assign RRR to patches two days after the start of
 commitfest, not a week later;

This seems tricky, since ideally you want people to volunteer to review 
patches; will two days be enough? Should people interested in reviewing be 
signing up ahead of time? Looking at the next commitfest, it is going to 
start on a Monday... maybe auto-assigning reviewers on Wednesday is OK. 

   b) there needs to be the expectation that RRR will start reviewing or
 reject the assignment immediately.


I wonder if too much time was spent on patches like the WITH patch, which it 
seemed pretty early on was not ready for commit... thoughts? 

 4) We need to work better to train up new reviewers.  Some major
 committer(s) should have worked with Abhjit, Thomas and Martin
 particularly on getting them to effectively review patches; instead,
 committers just handled stuff *for* them for the most part, which isn't
 growing our pool of reviewers.

 5) Patch submitters need to understand that patch submission isn't
 fire-and-forget.  They need to check back, and respond to queries from
 reviewers.  Of course, a patch-tracker which automatically notified the
 submitter would help.


Reviewers should be responding to the email on -hackers that is pointed to by 
the wiki, so patch submitters should be getting notified... right?

 6) Overall, I took a low-nag-factor approach this first time as
 commitfest manager.  This does not seem to have been the best way; I'd
 suggest for September that the manager make more frequent nags.


Is there something in particular you want the manager to nag people about? 

 Finally: who wants to be CF Manager for September?  I'm willing to do it
 again, but maybe someone else should get a turn.


Why stop now when you've got the momentum? :-) 

Seriously though, I thought we were supposed to have 2 people working as CF 
Managers for each CF... is that not the case? 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic Client Failover

2008-08-04 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 I think the proposal was for an extremely simple "works 75% of the time" 
 failover solution.  While I can see the attraction of that, the 
 consequences of having failover *not* work are pretty severe.

Exactly.  The point of failover (or any other HA feature) is to get
several nines' worth of reliability.  "It usually works" is simply
not playing in the right league.

 On the other hand, we will need to deal with this for the built-in 
 replication project.

Nope, that's orthogonal.  A failover solution depends on having a master
and a slave database, but it has nothing directly to do with how those
DBs are synchronized.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP DATABASE always seeing database in use

2008-08-04 Thread Tom Lane
Jens-Wolfhard Schicke [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 ERROR: database "%s" is being accessed by other users
 DETAIL: There are %d session(s) and %d prepared transaction(s) using the 
 database.
 
 I'm aware that this phrasing might not translate very nicely ... anyone
 have a suggestion for better wording?

 I can only estimate translation effort into German, but how about:

 DETAIL: Active users of the database: %d session(s), %d prepared 
 transaction(s)

Hmmm ... what I ended up committing was code that special-cased the
common cases where you only have one or the other, ie

/*
 * We don't worry about singular versus plural here, since the English
 * rules for that don't translate very well.  But we can at least avoid
 * the case of zero items.
 */
if (notherbackends > 0 && npreparedxacts > 0)
    errdetail("There are %d other session(s) and %d prepared transaction(s) using the database.",
              notherbackends, npreparedxacts);
else if (notherbackends > 0)
    errdetail("There are %d other session(s) using the database.",
              notherbackends);
else
    errdetail("There are %d prepared transaction(s) using the database.",
              npreparedxacts);
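
For concreteness, the three branches produce DETAIL lines like:

    DETAIL:  There are 3 other session(s) and 1 prepared transaction(s) using the database.
    DETAIL:  There are 3 other session(s) using the database.
    DETAIL:  There are 1 prepared transaction(s) using the database.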

Your proposal seems fine for the first case but a bit stilted for the
other two.  Or maybe that's just me.

Of course, we don't *have* to do it as above at all, if "0 prepared
transactions" doesn't bother people.

Ideas anybody?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers