Re: [HACKERS] Mini improvement: statement_cost_limit
On Sun, 2008-08-03 at 22:57 -0400, Robert Treat wrote:

> I still think it is worth revisiting what problems people are trying to
> solve, and see if there are better tools they can be given to solve them.
> Barring that, I suppose a crude solution is better than nothing, though I
> fear people might point at the crude solution as a good enough solution to
> justify not working on better solutions.

I advocate solutions to the problems of users I've worked with. My preference is to help people in the next release, then improve from there. We need to work with what we have.

In this case, an existing solution has been found.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Initial Unsigned Integer data type performance test results.
Hello All,

I wanted to pass some performance data on to the group regarding the unsigned integer data types I am working on.

I tested on two systems running Ubuntu Hardy. The first system is an 8 x 2.66GHz x86-64 processor system. The second system is a 2 x 533MHz Celeron i386 system.

For this test I disabled autovacuum and manually ran vacuum during times that were not timed. I also added some sleep and sync commands to try to stabilize the test results. The integer and unsigned integer test runs were also interleaved to compensate for other system activity. Finally, I have not done a lot of benchmarking, so feedback and improvement ideas are welcome!

I benchmarked four tests:

32-bit int2 vs uint2 -- My expectation is these results would be approximately
                        equal. The uint2 casts up to an int4, which is still 32 bits.
32-bit int4 vs uint4 -- My expectation is the operators should be approximately
                        equal in cost, since I explicitly added int4/uint4 operators.
64-bit int2 vs uint2 -- My expectation is these results would be approximately
                        equal. The uint2 casts up to an int4, which is less than
                        the 64-bit processor bus width.
64-bit int4 vs uint4 -- My expectation is these results would be approximately
                        equal. The uint4 casts up to an int8, which is the processor
                        bus width.

Here is the definition of each column (times are displayed in seconds):

Rows   -- The number of rows added to the table.
Test   -- The test number (I ran each benchmark 3 times).
S Load -- Time to load the table with signed integers.
U Load -- Time to load the table with unsigned integers.
S OP   -- Time to perform an operator on each row in the table (signed integers).
U OP   -- Time to perform an operator on each row in the table (unsigned integers).
S OP   -- Time to perform an AND operator on each row in the table (signed integers).
U OP   -- Time to perform an AND operator on each row in the table (unsigned integers).

I still have some more investigating to do into the results of this data (i.e. why is the signed OP cheaper for int2 than uint2 types, etc.), but I wanted to share this performance data with the community. Rough calculations show about a 4% performance hit for 32-bit processors to use the int8 type, and a 6% performance hit for 64-bit processors. This seems like a reasonable trade-off for the reduced storage space I am after, and for the reduction in operators Greg and Tom were concerned about.

If there are no objections from the community, I will plan to complete my implementation of Tom's proposal at: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01204.php

Thanks!
- Ryan

P.S. The code I have so far is not ready for review. It was implemented quickly to get performance numbers to determine whether Tom's proposal would be acceptable to the community. I can send it out if people are interested, but I am planning on submitting it to a future commit fest once I have it cleaned up and better tested.

Benchmark Data:
====================

32-bit int2 vs. uint2

Rows  Test  S Load      U Load      S OP        U OP        S OP        U OP
----------------------------------------------------------------------------
100   1     000.002888  000.002151  000.010881  000.014691  000.011124  000.011000
100   2     000.002780  000.002127  000.011729  000.011611  000.012014  000.011925
100   3     000.002747  000.002085  000.010193  000.010318  000.010588  000.010576
1000  1     000.003201  000.003870  000.037837  000.037360  000.032064  000.032478
1000  2     000.003259  000.003912  000.033495  000.036281  000.032502  000.035195
1000  3     000.003201  000.003913  000.039156  000.035592  000.032405  000.040543
1     1     000.024683  000.021306  000.255958  000.329045  000.255887  000.283782
1     2     000.020214  000.021224  000.260252  000.290933  000.281468  000.255171
1     3     000.020371  000.020940  000.276401  000.264791  000.257598  000.257258
10    1     001.669571  001.687523  002.591442  002.682428  003.410724  003.490362
10    2     001.682251  001.702598  003.379377  002.855622  002.549476  002.583431
10    3     001.693429  001.684732  002.546024  002.641240  002.540556  003.366534
50    1     010.138317  011.014532  015.707597  015.61      015.394598  015.502639
50    2     010.042176  010.179163  015.290994  015.407479  015.332925  016.321578
50    3     010.047930  010.206489  015.016276  015.430527  015.201759  015.411601
100   1     020.762680  022.145950  030.338606
Re: [HACKERS] unnecessary code in_bt_split
On Sun, 2008-08-03 at 19:44 -0400, Tom Lane wrote:

> Zdenek Kotala [EMAIL PROTECTED] writes:
> > I found that _bt_split function calls PageGetTempPage, but next call is
> > _bt_page_init which clears all contents anyway. Is there any reason to
> > call PageGetTempPage instead of palloc?
>
> Not violating a perfectly good abstraction? I agree that PageGetTempPage
> isn't amazingly efficient, but internal refactoring would halve its cost;
> and if you have some evidence that there's a real performance issue then
> we could think about adjusting the temp-page API to allow _bt_pageinit to
> be combined with it. But I have a real problem with hacking up _bt_split
> so that it will call PageRestoreTempPage on something it didn't get from
> PageGetTempPage. Considering the WAL and regular I/O that will be induced
> by a split, I kinda doubt this is even worth worrying about anyway...

Improving this should help, since the existing page is write-locked during _bt_split. The I/O won't happen at the point that these blocks are critical contention points.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] Mini improvement: statement_cost_limit
On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote:

> > > > > ISTR that what ended up killing the enthusiasm for this was that
> > > > > most people realized that this GUC was just a poor tool to take a
> > > > > stab at solving other problems (ie. rate limiting cpu for queries).
> > > >
> > > > I'm not concerned with that, I want developers to have feedback on
> > > > costs in a way that is obvious.
> > >
> > > I think a variation on this could be very useful in development and
> > > test environments. Suppose it raised a warning or notice if the cost
> > > was over the limit. Then one could set a limit of a few million on the
> > > development and test servers and developers would at least have a clue
> > > that they needed to look at explain for that query. As it is now, one
> > > can exhort them to run explain, but it has no effect. Instead we later
> > > see queries killed by a 24 hour timeout with estimated costs ranging
> > > from "until they unplug the machine and dump it" to "until the sun
> > > turns into a red giant".
> >
> > Great argument. So that's 4 in favour at least.
>
> Not such a great argument. Cost models on development servers can and
> often are quite different from those on production, so you might be
> putting an artificial limit on top of your developers.

We load the production dumps into our dev environment, which is the same hardware spec, so the costs should be identical.

> I still think it is worth revisiting what problems people are trying to
> solve, and see if there are better tools they can be given to solve them.
> Barring that, I suppose a crude solution is better than nothing, though I
> fear people might point at the crude solution as a good enough solution to
> justify not working on better solutions.

Alerting developers and QA to potentially costly queries would help solve some of the problems we are trying to solve. Better tools are welcome; an argument that the good is the enemy of the best, so we should be content with nothing, is not.
-dg

-- 
David Gould   [EMAIL PROTECTED]   510 536 1443   510 282 0869
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Mini improvement: statement_cost_limit
Josh Berkus [EMAIL PROTECTED] writes:

> Tom,
>
> > Wasn't this exact proposal discussed and rejected awhile back?
>
> We rejected Greenplum's much more invasive resource manager, because it
> created a large performance penalty on small queries whether or not it was
> turned on. However, I don't remember any rejection of an idea as simple as
> a cost limit rejection.

The idea's certainly come up before. It probably received the usual non-committal cold shoulder rather than an outright rejection.

> This would, IMHO, be very useful for production instances of PostgreSQL.
> The penalty for mis-rejection of a poorly costed query is much lower than
> the penalty for having a bad query eat all your CPU.

Well, that's going to depend on the application. But I suppose there's nothing wrong with having options which aren't always a good idea to use. The real question I guess is whether there's ever a situation where it would be a good idea to use this. I'm not 100% sure.

What I would probably use myself is an option to print a warning before starting the query. That would be handy for interactive sessions, so you would be able to hit C-c instead of waiting for several minutes and then wondering whether you got the query wrong.

I wonder if it would be useful to have a flag on some GUC options to make them not globally settable. That is, for example, you could set enable_seqscan in an individual session but not in postgresql.conf. Or perhaps again just print a warning that it's not recommended as a global configuration.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
Re: [HACKERS] unnecessary code in_bt_split
Tom Lane napsal(a):

> Zdenek Kotala [EMAIL PROTECTED] writes:
> > I found that _bt_split function calls PageGetTempPage, but next call is
> > _bt_page_init which clears all contents anyway. Is there any reason to
> > call PageGetTempPage instead of palloc?
>
> Not violating a perfectly good abstraction?

OK. Abstraction is nice, but what I see in PageGetTempPage is code that does everything, yet its usability is close to zero. It is used in only two places, and in both it is used for a different purpose: _bt_split() only needs to allocate an empty temp page, while gistplacetopage() also wants the special section copied.

In my opinion it would be better to have three functions:

PageCreateTempPage  - only allocate memory and call pageinit
PageCloneSpecial    - copy special section from source page
PageRestoreTempPage - no change

		Zdenek
Re: [HACKERS] Location for pgstat.stat
Tom Lane wrote:

> Magnus Hagander [EMAIL PROTECTED] writes:
> > Tom Lane wrote:
> > > It doesn't seem to me that it'd be hard to support two locations for
> > > the stats file --- it'd just take another parameter to the read and
> > > write routines. pgstat.c already knows the difference between a normal
> > > write and a shutdown write ...
> >
> > Right. Should it be removed from the permanent location when the server
> > starts?
>
> Yes, I would say so. There are two possible exit paths: normal shutdown
> (where we'd write a new file) and crash. In a crash we'd wish to delete
> the file anyway for fear that it's corrupted.
>
> Startup: read permanent file, then delete it.
> Post-crash: remove any permanent file (same as now)
> Shutdown: write permanent file.
> Normal stats collector write: write temp file.
> Backend stats fetch: read temp file.

Attached is a patch that implements this. I went with the option of just storing it in a temporary directory that can be symlinked, and not bothering with a GUC for it.

Comments?

(documentation updates are also needed, but I'll wait with those until I hear patch comments :-P)

//Magnus

Index: backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.176
diff -c -r1.176 pgstat.c
*** backend/postmaster/pgstat.c	30 Jun 2008 10:58:47 -0000	1.176
--- backend/postmaster/pgstat.c	4 Aug 2008 09:39:23 -0000
***************
*** 67,74 ****
   *	Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_FILENAME	"global/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE		"global/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
--- 67,76 ----
   *	Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME	"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE	"global/pgstat.tmp"
! #define PGSTAT_STAT_FILENAME	"pgstat_tmp/pgstat.stat"
! #define PGSTAT_STAT_TMPFILE		"pgstat_tmp/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
***************
*** 218,225 ****
  static void pgstat_beshutdown_hook(int code, Datum arg);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(void);
! static HTAB *pgstat_read_statsfile(Oid onlydb);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
--- 220,227 ----
  static void pgstat_beshutdown_hook(int code, Datum arg);
  
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
***************
*** 509,514 ****
--- 511,517 ----
  pgstat_reset_all(void)
  {
  	unlink(PGSTAT_STAT_FILENAME);
+ 	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
  
  #ifdef EXEC_BACKEND
***************
*** 2595,2601 ****
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid);
  
  	/*
  	 * Setup the descriptor set for select(2).	Since only one bit in the set
--- 2598,2604 ----
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
  
  	/*
  	 * Setup the descriptor set for select(2).	Since only one bit in the set
***************
*** 2635,2641 ****
  			if (!PostmasterIsAlive(true))
  				break;
  
! 			pgstat_write_statsfile();
  			need_statwrite = false;
  			need_timer = true;
  		}
--- 2638,2644 ----
  			if (!PostmasterIsAlive(true))
  				break;
  
! 			pgstat_write_statsfile(false);
  			need_statwrite = false;
  			need_timer = true;
  		}
***************
*** 2803,2809 ****
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile();
  
  	exit(0);
  }
--- 2806,2812 ----
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile(true);
  
  	exit(0);
  }
***************
*** 2891,2897 ****
   * ----------
   */
  static void
! pgstat_write_statsfile(void)
  {
  	HASH_SEQ_STATUS hstat;
  	HASH_SEQ_STATUS tstat;
--- 2894,2900 ----
   * ----------
   */
  static void
! pgstat_write_statsfile(bool permanent)
  {
  	HASH_SEQ_STATUS hstat;
  	HASH_SEQ_STATUS tstat;
***************
*** 2901,2917 ****
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
  
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
! 	fpout = fopen(PGSTAT_STAT_TMPFILE, PG_BINARY_W);
  	if (fpout == NULL)
  	{
  		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not open temporary statistics file \"%s\": %m",
! 						PGSTAT_STAT_TMPFILE)));
  		return;
  	}
--- 2904,2922 ----
  	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
+ 	const char
[HACKERS] DROP DATABASE always seeing database in use
It seems there's something wrong with CheckOtherDBBackends() but I haven't exactly figured out what. There are no other sessions, but drop database keeps saying "regression" is being accessed by other users.

I do see autovacuum touching tables in regression, but CheckOtherDBBackends() is supposed to send it a sigkill if it finds it, and it doesn't seem to be doing so.

I've been hacking on unrelated stuff in this database and have caused multiple core dumps, and autovacuum is finding orphaned temp tables. It's possible some state is corrupted in some way here, but I don't see what.

postgres=# select * from pg_stat_activity;
 datid | datname  | procpid | usesysid | usename |          current_query          | waiting |          xact_start           |          query_start          |         backend_start         | client_addr | client_port
-------+----------+---------+----------+---------+---------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+-------------+-------------
 11505 | postgres |    5616 |       10 | stark   | select * from pg_stat_activity; | f       | 2008-08-04 11:46:05.438479+01 | 2008-08-04 11:46:05.438956+01 | 2008-08-04 11:45:19.827702+01 |             |          -1
(1 row)

postgres=# commit;
COMMIT
postgres=# drop database regression;
ERROR:  55006: database "regression" is being accessed by other users
LOCATION:  dropdb, dbcommands.c:678
postgres=# select * from pg_stat_activity;
 datid | datname  | procpid | usesysid | usename |          current_query          | waiting |          xact_start           |          query_start          |         backend_start         | client_addr | client_port
-------+----------+---------+----------+---------+---------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+-------------+-------------
 11505 | postgres |    5616 |       10 | stark   | select * from pg_stat_activity; | f       | 2008-08-04 11:46:45.619642+01 | 2008-08-04 11:46:45.620115+01 | 2008-08-04 11:45:19.827702+01 |             |          -1
(1 row)

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] DROP DATABASE always seeing database in use
Gregory Stark wrote:

> It seems there's something wrong with CheckOtherDBBackends() but I haven't
> exactly figured out what. There are no other sessions but drop database
> keeps saying regression is being accessed by other users.
>
> I do see Autovacuum touching tables in regression but
> CheckOtherDBBackends() is supposed to send it a sigkill if it finds it and
> it doesn't seem to be doing so.
>
> I've been hacking on unrelated stuff in this database and have caused
> multiple core dumps and autovacuum is finding orphaned temp tables. It's
> possible some state is corrupted in some way here but I don't see what.

The buildfarm would be all red if this wasn't something local to your installation, I think.

Maybe you should get gdb on the backend and set a breakpoint on errfinish, or maybe step into CheckOtherDBBackends to see why it isn't working.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] pg_regress inputdir
Alvaro Herrera wrote:

> In my opinion, the need for running tests outside the test dir is not very
> strong (or we would have heard complaints before), and thus the solution
> is to remove --inputdir and --outputdir.

Attached is a patch that removes --inputdir and --outputdir. I still prefer the first patch (that fixed my problem), but removing them is probably better than having them when they don't work.

Tested with psql "make check" on Solaris x86.

-J

-- 
Jørgen Austvik, Software Engineering - QA
Sun Microsystems Database Group

Index: src/test/regress/pg_regress_main.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/pg_regress_main.c,v
retrieving revision 1.3
diff -u -r1.3 pg_regress_main.c
--- src/test/regress/pg_regress_main.c	1 Jan 2008 19:46:00 -0000	1.3
+++ src/test/regress/pg_regress_main.c	4 Aug 2008 11:19:04 -0000
@@ -34,12 +34,9 @@
 	char		expectfile[MAXPGPATH];
 	char		psql_cmd[MAXPGPATH * 3];
 
-	snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
-			 inputdir, testname);
-	snprintf(outfile, sizeof(outfile), "%s/results/%s.out",
-			 outputdir, testname);
-	snprintf(expectfile, sizeof(expectfile), "%s/expected/%s.out",
-			 inputdir, testname);
+	snprintf(infile, sizeof(infile), "sql/%s.sql", testname);
+	snprintf(outfile, sizeof(outfile), "results/%s.out", testname);
+	snprintf(expectfile, sizeof(expectfile), "expected/%s.out", testname);
 
 	add_stringlist_item(&resultfiles, outfile);
 	add_stringlist_item(&expectfiles, expectfile);
Index: src/test/regress/pg_regress.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/pg_regress.c,v
retrieving revision 1.46
diff -u -r1.46 pg_regress.c
--- src/test/regress/pg_regress.c	3 Aug 2008 05:12:38 -0000	1.46
+++ src/test/regress/pg_regress.c	4 Aug 2008 11:19:04 -0000
@@ -32,6 +32,9 @@
 #include "getopt_long.h"
 #include "pg_config_paths.h"
 
+#define LOG_DIRECTORY "log"
+#define RESULTS_DIRECTORY "results"
+
 /* for resultmap we need a list of pairs of strings */
 typedef struct _resultmap
 {
@@ -68,8 +71,6 @@
 /* options settable from command line */
 _stringlist *dblist = NULL;
 bool		debug = false;
-char	   *inputdir = ".";
-char	   *outputdir = ".";
 char	   *psqldir = NULL;
 static _stringlist *loadlanguage = NULL;
 static int	max_connections = 0;
@@ -560,8 +561,7 @@
 	FILE	   *f;
 
 	/* scan the file ... */
-	snprintf(buf, sizeof(buf), "%s/resultmap", inputdir);
-	f = fopen(buf, "r");
+	f = fopen("resultmap", "r");
 	if (!f)
 	{
 		/* OK if it doesn't exist, else complain */
@@ -1702,8 +1702,7 @@
 	FILE	   *difffile;
 
 	/* create the log file (copy of running status output) */
-	snprintf(file, sizeof(file), "%s/regression.out", outputdir);
-	logfilename = strdup(file);
+	logfilename = "regression.out";
 	logfile = fopen(logfilename, "w");
 	if (!logfile)
 	{
@@ -1713,8 +1712,7 @@
 	}
 
 	/* create the diffs file as empty */
-	snprintf(file, sizeof(file), "%s/regression.diffs", outputdir);
-	difffilename = strdup(file);
+	difffilename = "regression.diffs";
 	difffile = fopen(difffilename, "w");
 	if (!difffile)
 	{
@@ -1726,9 +1724,8 @@
 	fclose(difffile);
 
 	/* also create the output directory if not present */
-	snprintf(file, sizeof(file), "%s/results", outputdir);
-	if (!directory_exists(file))
-		make_directory(file);
+	if (!directory_exists(RESULTS_DIRECTORY))
+		make_directory(RESULTS_DIRECTORY);
 }
 
 static void
@@ -1799,14 +1796,12 @@
 	printf(_("Options:\n"));
 	printf(_("  --dbname=DB               use database DB (default \"regression\")\n"));
 	printf(_("  --debug                   turn on debug mode in programs that are run\n"));
-	printf(_("  --inputdir=DIR            take input files from DIR (default \".\")\n"));
 	printf(_("  --load-language=lang      load the named language before running the\n"));
 	printf(_("                            tests; can appear multiple times\n"));
 	printf(_("  --create-role=ROLE        create the specified role before testing\n"));
 	printf(_("  --max-connections=N       maximum number of concurrent connections\n"));
 	printf(_("                            (default is 0 meaning unlimited)\n"));
 	printf(_("  --multibyte=ENCODING      use ENCODING as the multibyte encoding\n"));
-	printf(_("  --outputdir=DIR           place output files in DIR (default \".\")\n"));
 	printf(_("  --schedule=FILE           use test ordering schedule from FILE\n"));
 	printf(_("                            (can be used multiple times to concatenate)\n"));
 	printf(_("  --srcdir=DIR              absolute path to source directory (for VPATH builds)\n"));
@@ -1844,11 +1839,9 @@
 		{"version", no_argument, NULL, 'V'},
 		{"dbname", required_argument, NULL, 1},
 		{"debug", no_argument, NULL, 2},
-		{"inputdir", required_argument, NULL, 3},
 		{"load-language", required_argument, NULL, 4},
 		{"max-connections", required_argument, NULL, 5},
 		{"multibyte", required_argument, NULL, 6},
-		{"outputdir", required_argument, NULL, 7},
 		{"schedule", required_argument, NULL, 8},
Re: [HACKERS] DROP DATABASE always seeing database in use
On Mon, Aug 04, 2008 at 11:51:35AM +0100, Gregory Stark wrote:

> It seems there's something wrong with CheckOtherDBBackends() but I haven't
> exactly figured out what. There are no other sessions but drop database
> keeps saying regression is being accessed by other users.

Are any prepared transactions still open?

select * from pg_prepared_xacts;

-- 
Michael Fuhr
Re: [HACKERS] unnecessary code in_bt_split
Zdenek Kotala [EMAIL PROTECTED] writes:

> Tom Lane napsal(a):
> > Not violating a perfectly good abstraction?
>
> By my opinion It would be better to have three functions:
> PageCreateTempPage - only allocate memory and call pageinit
> PageCloneSpecial - copy special section from source page
> PageRestoreTempPage - no change.

That naming still breaks the association of TempPage functions. If we're going to have multiple temp-page-creation functions, I think their names should follow a pattern like PageGetTempPageXXX.

After looking around a bit, I'm not entirely convinced that there's *any* call for the existing definition of PageGetTempPage :-(. There are only two callers: _bt_split(), which certainly doesn't need it to work the way it does, and gistplacetopage(), which might or might not be just as happy initializing all of the page special space for itself. Oleg, Teodor, could you comment on whether it's really needed to copy the old page's special space there?

Also, to the extent that PageGetTempPage copies the source page's header instead of setting it up from scratch, I think it's outright *wrong*. This will result in copying the source's pd_flags and pd_prune_xid, neither of which seems like correct behavior given that we're clearing the page contents.

I'm thinking we should split PageGetTempPage into two versions:

PageGetTempPage: get a temp page the same size as the given page, but don't initialize its contents at all (so, just a thin wrapper for palloc). This could be used by _bt_split, as well as GinPageGetCopyPage and GistPageGetCopyPage.

PageGetTempPageCopySpecial: get a temp page, PageInit it, and copy the special space from the given page. The only customer for this is gistplacetopage(), so maybe we don't even want it, rather than just doing the work right in gistplacetopage()?

You could also make an argument for PageGetTempPageCopy() which'd just copy the source page verbatim, thus replacing GinPageGetCopyPage and GistPageGetCopyPage.

			regards, tom lane
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Joshua D. Drake [EMAIL PROTECTED] writes:

> Alvaro Herrera wrote:
> > (I think it's better to reuse the same postmaster executable, because
> > that way it's easier to have the same parsing routines.)
>
> Change that to pg_ctl and you have a deal :)

Did you not understand Alvaro's point? Putting this functionality into pg_ctl will result in huge code bloat, because it will have to duplicate a lot of code that already exists inside the postgres executable.

I don't object to having a pg_ctl option to call postgres --check, but I do object to maintaining two copies of the same code.

			regards, tom lane
Re: [HACKERS] DROP DATABASE always seeing database in use
Michael Fuhr [EMAIL PROTECTED] writes:

> On Mon, Aug 04, 2008 at 11:51:35AM +0100, Gregory Stark wrote:
> > It seems there's something wrong with CheckOtherDBBackends() but I
> > haven't exactly figured out what. There are no other sessions but drop
> > database keeps saying regression is being accessed by other users.
>
> Are any prepared transactions still open?

Uh, yes, I did notice that but didn't put two and two together. That does make sense now that you mention it.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning
Re: [HACKERS] DROP DATABASE always seeing database in use
Alvaro Herrera [EMAIL PROTECTED] writes:

> The buildfarm would be all red if this wasn't something local to your
> installation, I think. Maybe you should get gdb on the backend and set a
> breakpoint on errfinish, or maybe step into CheckOtherDBBackends to see
> why it isn't working.

Michael Fuhr solved it so this is academic but, the buildfarm runs make installcheck? I thought it just ran "make check".

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Magnus Hagander [EMAIL PROTECTED] writes:

> Tom Lane wrote:
> > Seems a lot better to me to just train people to run the check-config
> > code by hand before pulling the trigger to load the settings for real.
>
> I think it'd be reasonable to refuse starting if the config is *known
> broken* (such as containing lines that are unparseable, or that contain
> completely invalid tokens), whereas you'd start if they just contain
> things that are probably wrong. But picking from your previous examples of
> more advanced checks, there are lots of cases where things like
> overlapping CIDR address ranges are perfectly valid, so I don't think we
> could even throw a warning for that - unless there's a separate flag to
> enable/disable warnings for such a thing.

There are cases that are sane, and there are cases that are not. You've got three possibilities:

* two lines referencing the exact same address range (and other selectors such as user/database). Definitely a mistake, because the second one is unreachable.

* two lines where the second's address range is a subset of the first's (and other stuff is the same). Likewise a mistake.

* two lines where the first's address range is a subset of the second's. This one is the only sane one.

(The nature of CIDR notation is that there are no partial overlaps, so it must be one of these three cases.)

We have in fact seen complaints from people who apparently missed the fact that pg_hba.conf entries are order-sensitive, so I think a test like this would be worth making. But it shouldn't be done by the postmaster.

			regards, tom lane
Re: [HACKERS] DROP DATABASE always seeing database in use
Gregory Stark wrote:

> Alvaro Herrera [EMAIL PROTECTED] writes:
> > The buildfarm would be all red if this wasn't something local to your
> > installation, I think. Maybe you should get gdb on the backend and set a
> > breakpoint on errfinish, or maybe step into CheckOtherDBBackends to see
> > why it isn't working.
>
> Michael Fuhr solved it so this is academic but, the buildfarm runs make
> installcheck? I thought it just ran make check

Hmm, I kinda assumed that it ran "drop database regression" at some point, but maybe you are right that it doesn't ... I do run make installcheck all the time though, so at least I would have noticed ;-)

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] DROP DATABASE always seeing database in use
Gregory Stark wrote:

> Alvaro Herrera [EMAIL PROTECTED] writes:
> > The buildfarm would be all red if this wasn't something local to your
> > installation, I think. Maybe you should get gdb on the backend and set a
> > breakpoint on errfinish, or maybe step into CheckOtherDBBackends to see
> > why it isn't working.
>
> Michael Fuhr solved it so this is academic but, the buildfarm runs make
> installcheck? I thought it just ran make check

It runs both.

cheers

andrew
Re: [HACKERS] Location for pgstat.stat
Magnus Hagander [EMAIL PROTECTED] writes:

> Attached is a patch that implements this. I went with the option of just
> storing it in a temporary directory that can be symlinked, and not
> bothering with a GUC for it.
>
> Comments? (documentation updates are also needed, but I'll wait with those
> until I hear patch comments :-P)

Looks alright in a fast once-over (I didn't test it). Two comments:

Treating the directory as something to create in initdb means you'll need to bump catversion when you apply it.

I'm not sure where you are planning to document this, but there should at least be a mention in the database physical layout chapter, since that's supposed to enumerate all the subdirectories of $PGDATA.

			regards, tom lane
Re: [HACKERS] DROP DATABASE always seeing database in use
Andrew Dunstan [EMAIL PROTECTED] writes: Gregory Stark wrote: Michael Fuhr solved it so this is academic but, the buildfarm runs make installcheck? I thought it just ran make check It runs both. It also runs contrib installcheck, which will most definitely exercise DROP DATABASE. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DROP DATABASE always seeing database in use
Gregory Stark [EMAIL PROTECTED] writes: Michael Fuhr [EMAIL PROTECTED] writes: Are any prepared transactions still open? Uh, yes, I did notice that but didn't put two and two together. That does make sense now that you mention it. I've been bit by that too, and so have other people. Maybe it'd be worth the trouble to improve the message so that it explicitly tells you when there are prepared transactions blocking the DROP. Another possibility is to let the DROP automatically roll back the conflicting prepared xacts, but that seems a bit dangerous. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DROP DATABASE always seeing database in use
Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Michael Fuhr [EMAIL PROTECTED] writes: Are any prepared transactions still open? Uh, yes, I did notice that but didn't put two and two together. That does make sense now that you mention it. I've been bit by that too, and so have other people. Maybe it'd be worth the trouble to improve the message so that it explicitly tells you when there are prepared transactions blocking the DROP. Yes, that should be easy enough. Another possibility is to let the DROP automatically roll back the conflicting prepared xacts, but that seems a bit dangerous. Yeah, the prepared xact might have modified shared catalogs, for example. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DROP DATABASE always seeing database in use
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: I've been bit by that too, and so have other people. Maybe it'd be worth the trouble to improve the message so that it explicitly tells you when there are prepared transactions blocking the DROP. Yes, that should be easy enough. I looked at this quickly and decided that we can do it with some small changes to CheckOtherDBBackends(). I propose counting the number of conflicting PGPROCs and adding a DETAIL line to the existing error message: ERROR: database %s is being accessed by other users DETAIL: There are %d session(s) and %d prepared transaction(s) using the database. I'm aware that this phrasing might not translate very nicely ... anyone have a suggestion for better wording? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PL/PythonU
Hi All: This is an off-shoot of the Do we really want to migrate plproxy and citext into PG core distribution? thread. On the way home from PyOhio, I had a conversation with a few people that use Zope a lot. I happened to mention that Postgres doesn't have an untrusted version of pl/python and they were curious as to why. They directed me to Zope's Restricted Python implementation [1][2]. In doing some research, I found the Pl/Python -- current maintainer? [3] thread from 2006. I also found this [4] thread on the python-dev mailing list. Hannu: You had mentioned bringing pl/python up to the level of some of the other pl's. Have you thought any more about pl/pythonu? David Blewett -- A few quotes from the python-dev thread (links at bottom): Here is some context for Python-Dev. RestrictedPython is a custom Python compiler that, when combined with a restricted environment, provides a sandbox safe enough to allow partly-trusted people to write and execute scripts on a Zope server. It has been used in Zope 2 for a long time and will have a future in Zope 3. The sandbox is more extensive than what the rexec module provides. The safety of RestrictedPython has been validated in a somewhat formal process with Python 2.4. Ranjith is working to validate it with Python 2.5. -- Shane Hathaway [5] No, he means the restricted Python compiler and capability-proxy system used by Zope. You know, the one I always bring up whenever anybody says they want to implement capabilities in Python? ;-) Zope's restricted Python is basically a combination of a special compiler, __builtin__ replacements, and a proxy type. Instead of using LOAD_ATTR opcodes, the compiler generates code that calls a special getattr() function instead, and most objects other than relatively-safe builtin types are wrapped in proxies that control what attributes can be accessed and what operations can be performed. 
The restricted Python framework itself doesn't impose any particular security policy; proxies delegate checks to checker objects that are essentially capabilities. Mostly, it focuses on creating a safe sandbox that can be expanded. -- Phillip J. Eby [6] 1. http://svn.zope.org/RestrictedPython/trunk/ 2. http://svn.zope.org/zope.security/trunk/src/zope/security/untrustedpython/ 3. http://archives.postgresql.org/pgsql-hackers/2006-02/msg01036.php 4. http://mail.python.org/pipermail/python-dev/2008-July/081340.html 5. http://mail.python.org/pipermail/python-dev/2008-July/081412.html 6. http://mail.python.org/pipermail/python-dev/2008-July/081347.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
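Eby's description of the mechanism can be illustrated in a few lines of plain Python. This is a hypothetical sketch of the guarded-getattr idea, not Zope's actual RestrictedPython code; the compiler step (rewriting LOAD_ATTR into calls to a special getattr) is omitted, and the `SAFE_ATTRIBUTES` policy and `Proxy` class are invented for the demo:

```python
# Hypothetical sketch of guarded attribute access as described above --
# NOT Zope's actual RestrictedPython implementation.  Attribute access
# goes through a checking function, and a proxy makes sure there is no
# way around it.

SAFE_ATTRIBUTES = {"upper", "lower", "split"}  # invented policy for the demo

def guarded_getattr(obj, name):
    """Stand-in for the special getattr() the restricted compiler emits."""
    if name.startswith("_") or name not in SAFE_ATTRIBUTES:
        raise AttributeError("access to %r is not allowed" % name)
    return getattr(obj, name)

class Proxy:
    """Wraps an object so every attribute access goes through the guard."""
    def __init__(self, obj):
        object.__setattr__(self, "_obj", obj)

    def __getattribute__(self, name):
        target = object.__getattribute__(self, "_obj")
        return guarded_getattr(target, name)

p = Proxy("hello world")
print(p.upper())  # allowed by the policy: HELLO WORLD
try:
    p.__class__  # dunder access is blocked by the guard
except AttributeError as exc:
    print("blocked:", exc)
```

The real system delegates the allow/deny decision to per-object checker objects rather than a global set, which is what makes the policy capability-like.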
[HACKERS] Strawberry Perl?
Howdy, I noticed this in the weekly news: Magnus Hagander committed: - In pgsql/doc/src/sgml/install-win32.sgml, document which versions of ActivePerl and ActiveTcl are required for building on MSVC, and that the free distribution is enough (no need for the enterprise version). Per gripe from Martin Zaun. I was just wondering whether or not it also worked with Strawberry Perl, the Perl community's own binary distribution of Perl on Windows? http://strawberryperl.com/ Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] \ef function in psql
On Jul 31, 2008, at 00:07, Abhijit Menon-Sen wrote: I have attached two patches: - funcdef.diff implements pg_get_functiondef() - edit.diff implements \ef function in psql based on (1). Comments appreciated. +1 I like! The ability to easily edit a function on the fly in psql will be very welcome to DBAs I know. And I like the pg_get_functiondef() function, too, as that will simplify editing existing functions in other admin apps, like pgAdmin. I'm starting to get really excited for 8.4. I can haz cheezburger? Oops, I mean, when does it ship? ;-P Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Type Categories for User-Defined Types
On Jul 31, 2008, at 10:42, David E. Wheeler wrote: Good point --- so new members of STRING category aren't going to be that common, except for domains which apparently aren't bothering people anyway. I'll go ahead and make the change. (I think it's just a trivial change in find_coercion_pathway, and everything else should Just Work. If it turns out not to be trivial maybe we should reconsider.) Wow. Really nice, Tom. Thanks! The attached patch has all the tests I added to my svn version against 8.3, and for which I had to write 60 additional cast functions. Ping! Just wanted to make sure this wasn't lost in the shuffle… Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Type Categories for User-Defined Types
David E. Wheeler wrote: On Jul 31, 2008, at 10:42, David E. Wheeler wrote: Wow. Really nice, Tom. Thanks! The attached patch has all the tests I added to my svn version against 8.3, and for which I had to write 60 additional cast functions. Ping! Just wanted to make sure this wasn't lost in the shuffle… Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09 -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/PythonU
On Mon, 2008-08-04 at 13:08 -0400, David Blewett wrote: Hi All: This is an off-shoot of the Do we really want to migrate plproxy and citext into PG core distribution? thread. On the way home from PyOhio, I had a conversation with a few people that use Zope a lot. I happened to mention that Postgres doesn't have an untrusted version of pl/python and they were curious as to why. They directed me to Zope's Restricted Python implementation [1][2]. In doing some research, I found the Pl/Python -- current maintainer? [3] thread from 2006. I also found this [4] thread on the python-dev mailing list. Hannu: You had mentioned bringing pl/python up to the level of some of the other pl's. Have you thought any more about pl/pythonu? My recollection of old times (about python v. 1.6) was that the restricted sandboxes had some fatal flaws. I have not followed zope's RestrictedPython enough to have an opinion on its safety. David Blewett -- A few quotes from the python-dev thread (links at bottom): Here is some context for Python-Dev. RestrictedPython is a custom Python compiler that, when combined with a restricted environment, provides a sandbox safe enough to allow partly-trusted people to write and execute scripts on a Zope server. It has been used in Zope 2 for a long time and will have a future in Zope 3. The sandbox is more extensive than what the rexec module provides. The safety of RestrictedPython has been validated in a somewhat formal process with Python 2.4. Ranjith is working to validate it with Python 2.5. -- Shane Hathaway [5] Will definitely look at it. Thanks. No, he means the restricted Python compiler and capability-proxy system used by Zope. You know, the one I always bring up whenever anybody says they want to implement capabilities in Python? ;-) Zope's restricted Python is basically a combination of a special compiler, __builtin__ replacements, and a proxy type. 
Instead of using LOAD_ATTR opcodes, the compiler generates code that calls a special getattr() function instead, and most objects other than relatively-safe builtin types are wrapped in proxies that control what attributes can be accessed and what operations can be performed. The restricted Python framework itself doesn't impose any particular security policy; proxies delegate checks to checker objects that are essentially capabilities. Mostly, it focuses on creating a safe sandbox that can be expanded. -- Phillip J. Eby [6] 1. http://svn.zope.org/RestrictedPython/trunk/ 2. http://svn.zope.org/zope.security/trunk/src/zope/security/untrustedpython/ 3. http://archives.postgresql.org/pgsql-hackers/2006-02/msg01036.php 4. http://mail.python.org/pipermail/python-dev/2008-July/081340.html 5. http://mail.python.org/pipermail/python-dev/2008-July/081412.html 6. http://mail.python.org/pipermail/python-dev/2008-July/081347.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Type Categories for User-Defined Types
On Aug 4, 2008, at 11:02, Alvaro Herrera wrote: Ping! Just wanted to make sure this wasn't lost in the shuffle… Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09 Sure, although it's a simple refinement (read: tests, mainly) of an accepted July patch, submitted before the end of July. I can put it in for Sept, but thought it might be best to deal with it while the original patch is still fresh in (Tom's) mind and since it's something Tom asked me for. :-) Anyway, added to the 09 commitfest page. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Type Categories for User-Defined Types
David E. Wheeler wrote: On Aug 4, 2008, at 11:02, Alvaro Herrera wrote: Ping! Just wanted to make sure this wasn't lost in the shuffle… Please add it here: http://wiki.postgresql.org/wiki/CommitFest:2008-09 Sure, although it's a simple refinement (read: tests, mainly) of an accepted July patch, submitted before the end of July. I can put it in for Sept, but thought it might be best to deal with it while the original patch is still fresh in (Tom's) mind and since it's something Tom asked me for. :-) Anyway, added to the 09 commitfest page. There is nothing to prevent such things from being committed ahead of that commitfest. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Monday 04 August 2008 03:50:40 daveg wrote: On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote: ISTR that what ended up killing the enthusiasm for this was that most people realized that this GUC was just a poor tool to take a stab at solving other problems (ie. rate limiting cpu for queries). I'm not concerned with that, I want developers to have feedback on costs in a way that is obvious. That was one of the other use cases that was pushed forward in the past. I think a variation on this could be very useful in development and test environments. Suppose it raised a warning or notice if the cost was over the limit. Then one could set a limit of a few million on the development and test servers and developers would at least have a clue that they needed to look at explain for that query. As it is now, one can exhort them to run explain, but it has no effect. Instead we later see queries killed by a 24 hour timeout with estimated costs ranging from "until they unplug the machine and dump it" to "until the sun turns into a red giant". Great argument. So that's 4 in favour at least. Not such a great argument. Cost models on development servers can and often are quite different from those on production, so you might be putting an artificial limit on top of your developers. We load the production dumps into our dev environment, which are the same hardware spec, so the costs should be identical. That's great for you, I am talking in the scope of a general solution. (Note I'd also bet that even given the same hardware, different production loads can produce different relative mappings of cost vs. performance, but whatever) I still think it is worth revisiting what problems people are trying to solve, and see if there are better tools they can be given to solve them. Barring that, I suppose a crude solution is better than nothing, though I fear people might point at the crude solution as a good enough solution to justify not working on better solutions. 
Alerting developers and QA to potentially costly queries would help solve some of the problems we are trying to solve. Better tools are welcome, an argument that the good is the enemy of the best so we should be content with nothing is not. And you'll note, I specifically said that a crude tool is better than nothing. But you're completely ignoring that a crude tool can often end up as a foot-gun once released into the wild. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
Greg, Well that's going to depend on the application. But I suppose there's nothing wrong with having options which aren't always a good idea to use. The real question I guess is whether there's ever a situation where it would be a good idea to use this. I'm not 100% sure. I can think of *lots*. Primarily, simple web applications, where queries are never supposed to take more than 50ms. If a query turns up with an estimated cost of 100, then you know something's wrong; in the statistics if not in the query. In either case, that query has a good chance of dragging down the whole system. In such a production application, it is better to have false positives and reject otherwise-OK queries because their costing is wrong, than to let a single cartesian join bog down an application serving 5000 simultaneous users. Further, with a SQL error, this would allow the query rejection to be handled in a user-friendly way from the UI (Search too complex. Try changing search terms.) rather than timing out, which is very difficult to handle well. The usefulness of this feature for interactive sessions is limited-to-nonexistent. It's for production applications. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
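Since no such server-side limit exists, the rejection scheme Josh describes can be approximated client-side today: run EXPLAIN first, parse the planner's total cost estimate from the top line, and refuse to run the query if it is over a threshold. A minimal sketch, with hypothetical helper names and a canned EXPLAIN line standing in for a real database round-trip (the cost numbers are the planner's arbitrary units, not milliseconds):

```python
import re

# Matches the planner's "cost=startup..total" annotation and captures
# the total (upper) cost estimate.
COST_RE = re.compile(r"cost=\d+\.\d+\.\.(\d+\.\d+)")

def total_cost(explain_top_line):
    """Extract the total cost estimate from EXPLAIN's first line."""
    m = COST_RE.search(explain_top_line)
    if m is None:
        raise ValueError("no cost estimate found")
    return float(m.group(1))

def guard(explain_top_line, limit):
    """Reject the query up front if its estimated cost exceeds the limit."""
    cost = total_cost(explain_top_line)
    if cost > limit:
        # The user-friendly rejection Josh describes:
        raise RuntimeError("Search too complex. Try changing search terms.")
    return cost

plan = "Nested Loop  (cost=0.00..125000.00 rows=1000000 width=8)"
print(total_cost(plan))  # 125000.0
```

In a real application the plan text would come from running `EXPLAIN <query>` over the same connection, and the limit would be tuned per workload; the false-positive trade-off Josh mentions applies here exactly as it would to a server-side GUC.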
Re: [HACKERS] PL/Python
On Mon, Aug 4, 2008 at 1:56 PM, Hannu Krosing [EMAIL PROTECTED] wrote: Hannu: You had mentioned bringing pl/python up to the level of some of the other pl's. Have you thought any more about pl/pythonu? Obviously, I meant pl/python. Subject line fixed too. Sorry for the noise. David Blewett -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Mon, 4 Aug 2008, daveg wrote: On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote: Not such a great argument. Cost models on development servers can and often are quite different from those on production, so you might be putting an artificial limit on top of your developers. We load the production dumps into our dev environment, which are the same hardware spec, so the costs should be identical. Not identical, just close. ANALYZE samples data from your table randomly. The statistics used to compute the costs will therefore be slightly different on the two servers even if the data is the same. The problem of discovering one plan on production and another on development is not quite that easy to remove. Ultimately, if your developers aren't thorough enough to do things like look at EXPLAIN plans enough to discover things that are just bad, I just chuckle at your thinking that putting a single limiter on their bad behavior will somehow magically make that better. Anyway, if your production server is small enough that you can afford to have another one just like it for the developers to work on, that's great. Robert's point is that many installs don't work like that. The development teams in lots of places only get a subset of the production data because it's too large to deploy on anything but a big server, which often is hard to cost justify buying just for development purposes. I like the concept of a cost limit, but I'm a bit horrified by the thought of it being exposed simply through the internal cost numbers because they are so arbitrary. One of the endless projects I think about but never start coding is to write something that measures the things the planner cost constants estimate on a particular machine, so that all those numbers actually can be tied to some real-world time measure. 
If you did that, you'd actually have a shot at accomplishing the real goal here, making statement_cost_limit cut off statements expected to take longer than statement_timeout before they even get started. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
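Greg's calibration idea can be sketched numerically. The sample points below are invented for illustration; in practice they would come from timing real queries whose EXPLAIN cost estimates are known on the target machine:

```python
# Hypothetical sketch of mapping planner cost units to wall-clock time.
# The (cost, seconds) samples are made up; a real calibration tool would
# measure them by running known queries on the machine in question.

samples = [
    (1000.0, 0.05),     # (planner cost units, measured seconds)
    (20000.0, 1.1),
    (100000.0, 5.2),
]

def seconds_per_cost_unit(samples):
    """Least-squares fit through the origin: scale = sum(c*s) / sum(c*c)."""
    num = sum(c * s for c, s in samples)
    den = sum(c * c for c, _ in samples)
    return num / den

def cost_limit_for_timeout(timeout_seconds, samples):
    """Cost above which a statement is expected to exceed the timeout."""
    return timeout_seconds / seconds_per_cost_unit(samples)

# With these made-up inputs, a 60s statement_timeout maps to a cost
# limit of roughly 1.15 million planner units.
print(round(cost_limit_for_timeout(60.0, samples)))
```

A real tool would also have to cope with the fact that the cost model is only piecewise linear (startup vs. run cost, cache effects), so a single scale factor is a first approximation at best.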
Re: [HACKERS] PL/Python
On Mon, 2008-08-04 at 15:02 -0400, David Blewett wrote: On Mon, Aug 4, 2008 at 1:56 PM, Hannu Krosing [EMAIL PROTECTED] wrote: Hannu: You had mentioned bringing pl/python up to the level of some of the other pl's. Have you thought any more about pl/pythonu? Obviously, I meant pl/python. Subject line fixed to. Sorry for the noise. sure, it's just the other side of the coin ;) David Blewett -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CommitFest July Over
Hackers, Well, after a month the July CommitFest is officially closed. At this point, we're operating with the de facto rule that commitfests shouldn't last more than a month. Because some patches are still being discussed, they've been moved over automatically to the September commitfest. A much larger number of patches are now in returned with feedback; if your patch is in there, probably hackers is waiting for some kind of response from you. Lots of stuff was committed, too. 8.4 is looking very exciting. Post-mortem things we've learned about the commitfest are: 1) It's hard to get anything done in June-July. 2) The number of patches is going to keep increasing with each commitfest. As such, the patch list is going to get harder to deal with. We now urgently need to start working on CF management software. 3) Round Robin Reviewers didn't really work this time, aside from champion new reviewer Abhijit. For the most part, RRR who were assigned patches did not review them for 2 weeks. Two areas where this concept needs to be improved: a) we need to assign RRR to patches two days after the start of commitfest, not a week later; b) there needs to be the expectation that RRR will start reviewing or reject the assignment immediately. 4) We need to work better to train up new reviewers. Some major committer(s) should have worked with Abhijit, Thomas and Martin particularly on getting them to effectively review patches; instead, committers just handled stuff *for* them for the most part, which isn't growing our pool of reviewers. 5) Patch submitters need to understand that patch submission isn't fire-and-forget. They need to check back, and respond to queries from reviewers. Of course, a patch-tracker which automatically notified the submitter would help. 6) Overall, I took a low-nag-factor approach to the first time as commitfest manager. This does not seem to have been the best way; I'd suggest for September that the manager make more frequent nags. 
Finally: who wants to be CF Manager for September? I'm willing to do it again, but maybe someone else should get a turn. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Mon, Aug 04, 2008 at 03:09:34PM -0400, Greg Smith wrote: On Mon, 4 Aug 2008, daveg wrote: We load the production dumps into our dev environment, which are the same hardware spec, so the costs should be identical. Not identical, just close. ANALYZE samples data from your table randomly. The statistics used to compute the costs will therefore be slightly different on the two servers even if the data is the same. The problem of discovering one plan on production and another on development is not quite that easy to remove. Ultimately, if your developers aren't thorough enough to do thinks like look at EXPLAIN plans enough to discover things that are just bad, I just chuckle at your thinking that putting a single limiter on their bad behavior will somehow magically make that better. Not all developers can be persuaded to run explain on every change. However, many will investigate a new message. I'm only hoping to try to focus their attention toward possible problem queries. Anyway, if your production server is small enough that you can afford to have another one just like it for the developers to work on, that's great. Robert's point is that many installs don't work like that. The development teams in lots of places only get a subset of the production data because it's too large to deploy on anything but a big server, which often is hard to cost justify buying just for development purposes. Not to get into a size war ;-), but the production environment I'd like this feature for is over 40 32GB 16 scsi drive quadcore boxes. These are dedicated to postgresql and run one or just a few databases. There are also a bunch of client boxes that we will not speak of. The staging and test environments are similar hardware but have only a subset of the databases copied to them. There are probably than a dozen DB hosts for that. 
I like the concept of a cost limit, but I'm a bit horrified by the thought of it being exposed simply through the internal cost numbers because they are so arbitrary. One of the endless projects I think about but never start coding is to write something that measures the things the planner cost constants estimate on a particular machine, so that all those numbers actually can be tied to some real-world time measure. If you did that, you'd actually have a shot at accomplishing the real goal here, making statement_cost_limit cut off statements expected to take longer than statement_timeout before they even get started. That is a nice idea. Possibly it could be a utility like the fsync tester. But planner estimates are never going to be all that accurate even with solid cost numbers because for some classes of queries, particularly those with many joins, the stats can be good at each level but the error accumulates exponentially. Which is why I think a warning is appropriate instead of an error. Even a notice in the logs would be useful. -dg -- David Gould [EMAIL PROTECTED] 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
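The compounding daveg describes is easy to see with toy numbers (invented purely for illustration):

```python
# Toy illustration of multiplicative estimation error across joins: if
# each join's selectivity estimate is off by a factor f, the row-count
# error after n joins grows like f**n.  Numbers are made up.

def rowcount_error(per_join_factor, n_joins):
    """Cumulative row-count error after n_joins compounding misestimates."""
    return per_join_factor ** n_joins

# A modest 2x per-join misestimate becomes a 64x error after 6 joins:
print(rowcount_error(2.0, 6))
```

Since the cost of the plan above a misestimated join scales with its row count, a per-query cost limit calibrated on shallow queries can be wildly wrong for deep join trees, which is the argument for a warning rather than a hard error.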
Re: [HACKERS] Mini improvement: statement_cost_limit
On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote: On Monday 04 August 2008 03:50:40 daveg wrote: That's great for you, I am talking in the scope of a general solution. (Note I'd also bet that even given the same hardware, different production loads can produce different relative mappings of cost vs. performance, but whatever) Even on different hardware it would still likely warn of mistakes like cartesian products due to missing join conditions etc. I still think it is worth revisiting what problems people are trying to solve, and see if there are better tools they can be given to solve them. Barring that, I suppose a crude solution is better than nothing, though I fear people might point at the crude solution as a good enough solution to justify not working on better solutions. Alerting developers and QA to potentially costly queries would help solve some of the problems we are trying to solve. Better tools are welcome, an argument that the good is the enemy of the best so we should be content with nothing is not. And you'll note, I specifically said that a crude tool is better than nothing. I realized somewhat after I sent the above that it might have sounded a bit snippy. I hope I have not offended. But you're completely ignoring that a crude tool can often end up as a foot-gun once released into the wild. I'm suggesting a warning, or even just a notice into the logs, I don't see the footgun. What am I missing? Regards -dg -- David Gould [EMAIL PROTECTED] 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote: On Monday 04 August 2008 03:50:40 daveg wrote: On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote: ... I still think it is worth revisiting what problems people are trying to solve, and see if there are better tools they can be given to solve them. Barring that, I suppose a crude solution is better than nothing, though I fear people might point at the crude solution as a good enough solution to justify not working on better solutions. Alerting developers and QA to potentially costly queries would help solve some of the problems we are trying to solve. Better tools are welcome, an argument that the good is the enemy of the best so we should be content with nothing is not. And you'll note, I specifically said that a crude tool is better than nothing. But you're completely ignoring that a crude tool can often end up as a foot-gun once released into the wild. On the other other hand, _anything_ can end up as a foot-gun in the hands of ingenious users. I was once told about a company that claimed to have produced a positively fool-proof lawn-mower, only to find out that a university professor had tried to use it to trim a hedge and cut off his toes. - Hannu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
Josh Berkus [EMAIL PROTECTED] writes: In such a production application, it is better to have false positives and reject otherwise-OK queries because their costing is wrong, than to let a single cartesian join bog down an application serving 5000 simultaneous users. Further, with a SQL error, this would allow the query rejection to be handled in a user-friendly way from the UI (Search too complex. Try changing search terms.) rather than timing out, which is very difficult to handle well. The usefulness of this feature for interactive sessions is limited-to-nonexistent. It's for production applications. Wow. I couldn't disagree more. For such an application this would be a major foot-gun which would give a false sense of security while simultaneously causing random outages and not providing even the protection you're counting on. It would be quite likely to miss some cartesian joins and allow problematic queries through randomly and block other perfectly legitimate queries. It's no substitute for writing your search engine query generator to actually check that it has enough constraints to avoid any disallowed cartesian joins. That people might think it's reliable enough to use for such applications is my major concern and, if my guess is right, Tom's as well. I suspect you may have just sunk any chance of getting him on-side. Where I see it useful is a) during development when it might help catch erroneous queries as a kind of sql-lint, and b) when running ad-hoc DBA queries where it might let the DBA catch the error before letting it run for a while. I'm sure I'm not the only DBA who let a query run for 5 minutes before wondering if it should really be taking that long. I would be much more comfortable if it produced a warning, not an error. And much more so if we implemented my previous thought of having some settings which generate warnings if they're set at startup, saying that's not recommended. 
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Mon, Aug 04, 2008 at 11:59:03AM -0700, Josh Berkus wrote: Greg, Well that's going to depend on the application. But I suppose there's nothing wrong with having options which aren't always a good idea to use. The real question I guess is whether there's ever a situation where it would be a good idea to use this. I'm not 100% sure. I can think of *lots*. Primarily, simple web applications, where queries are never supposed to take more than 50ms. If a query turns up with an estimated cost of 100, then you know something's wrong; in the statistics if not in the query. In either case, that query has a good chance of dragging down the whole system. In such a production application, it is better to have false positives and reject otherwise-OK queries because their costing is wrong, than to let a single cartesian join bog down an application serving 5000 simultaneous users. Further, with a SQL error, this would allow the query rejection to be handled in a user-friendly way from the UI ("Search too complex. Try changing search terms.") rather than timing out, which is very difficult to handle well. The usefulness of this feature for interactive sessions is limited to nonexistent. It's for production applications. Ok, that is a different use case where an error seems very useful. What about slightly extending the proposal to have the severity of exceeding the limit configurable too? Something like:

costestimate_limit = 10                  # default 0 to ignore limit
costestimate_limit_severity = error      # debug, notice, warning, error

-dg -- David Gould [EMAIL PROTECTED] 510 536 1443 / 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
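The two proposed settings above could behave roughly as follows. This is an illustrative sketch only, not PostgreSQL source: the GUC names (costestimate_limit, costestimate_limit_severity) come from the proposal, and everything else here is a hypothetical model of the check that would run after planning, before execution.

```python
class StatementCostError(Exception):
    """Raised when severity is 'error' and the cost limit is exceeded."""


def check_cost_limit(estimated_cost, limit=0, severity="error", log=print):
    """Decide whether a planned statement may run.

    limit <= 0 disables the check, matching the proposed default of 0.
    For 'debug'/'notice'/'warning' severities the statement still runs
    but a message is emitted; 'error' aborts it before execution.
    """
    if limit <= 0 or estimated_cost <= limit:
        return True
    message = ("estimated cost %.0f exceeds costestimate_limit %.0f"
               % (estimated_cost, limit))
    if severity == "error":
        raise StatementCostError(message)
    log("%s: %s" % (severity.upper(), message))
    return True
```

The point of the severity knob is visible here: the same comparison can be deployed first as a warning (to measure false-positive rates against real traffic) and only later, if at all, switched to an error.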
Re: [HACKERS] Mini improvement: statement_cost_limit
Hannu Krosing [EMAIL PROTECTED] wrote: I was once told about a company who claimed to have produced a positively fool-proof lawn-mower, only to find out that a university professor had tried to use it to trim a hedge and cut off his toes. Odd. Seriously, about 45 years ago I lived next door to a university botany professor who cut off his fingertips that way. I wonder if professors are more prone to this, or whether the story got mangled over time. Perhaps software should have special protection for professors. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote: On Monday 04 August 2008 03:50:40 daveg wrote: And you'll note, I specifically said that a crude tool is better than nothing. But you're completely ignoring that a crude tool can often end up as a foot-gun once released into the wild. The proposal is for an option with no consequences when turned off. We respect your right not to use it. What is the danger exactly? If we cancel stupid queries before people run them, everybody is a winner. Even the person who submitted the stupid query, since they find out faster. Sure, it's an estimate, but it's got to be based upon an estimate if it acts *before* it runs. And surely there is no better estimate of the cost than the plan cost? It doesn't stop anyone from putting in resource limits later. We'll have to do something with enable_seqscan, BTW, chaps. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
Greg, For such an application this would be a major foot-gun which would give a false sense of security while simultaneously causing random outages and not providing even the protection you're counting on. Hmmm. That sounds like a call for some testing. While our cost estimation has some issues, I don't think it's as unreliable as all that. And it's easy enough to document the limitations. If it's 80% accurate, then it's fixing more problems than it causes. If it's 30% accurate, then obviously it's a bad idea. Personally, I don't see much use for this interactively, because an experienced DBA can easily enough run an EXPLAIN before running the query. I usually do, on production systems. -- --Josh Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Automatic Client Failover
When primary server fails, it would be good if the clients connected to the primary knew to reconnect to the standby servers automatically. We might want to specify that centrally and then send the redirection address to the client when it connects. Sounds like lots of work though. Seems fairly straightforward to specify a standby connection service at client level: .pgreconnect, or pgreconnect.conf No config, then option not used. Would work with various forms of replication. Implementation would be to make PQreset() try secondary connection if the primary one fails to reset. Of course you can program this manually, but the feature is that you wouldn't need to, nor would you need to request changes to 27 different interfaces either. Good? Bad? Ugly? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
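The client-side behavior proposed above (PQreset trying a secondary connection when the primary fails) can be sketched driver-agnostically. This is a hypothetical illustration, not the proposed libpq implementation: the connect callable is injected so the sketch does not assume any particular driver, and the fallback list stands in for the suggested pgreconnect.conf contents.

```python
def connect_with_failover(conninfos, connect):
    """Try each connection string in order; return the first connection
    that succeeds, or raise if every server is unreachable.

    conninfos: primary conninfo first, then standbys (the order a
    pgreconnect.conf-style file would supply).
    connect:   a callable taking a conninfo string, e.g. psycopg2.connect.
    """
    last_error = None
    for conninfo in conninfos:
        try:
            return connect(conninfo)
        except Exception as exc:      # a real driver raises specific errors
            last_error = exc
    raise RuntimeError("all servers unreachable") from last_error
```

Note this sketch also exhibits the weakness raised later in the thread: the client cannot distinguish a dead primary from a network partition, so it may "fail over" when it shouldn't.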
Re: [HACKERS] Mini improvement: statement_cost_limit
On Monday 04 August 2008 16:49:43 Simon Riggs wrote: On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote: On Monday 04 August 2008 03:50:40 daveg wrote: And you'll note, I specifically said that a crude tool is better than nothing. But you're completely ignoring that a crude tool can often end up as a foot-gun once released into the wild. The proposal is for an option with no consequences when turned off. We respect your right not to use it. What is the danger exactly? All of the proposals I have seen for adding query hints would also have no consequence if not used, but no one seems to care about that argument. :-) -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs [EMAIL PROTECTED] wrote: When primary server fails, it would be good if the clients connected to the primary knew to reconnect to the standby servers automatically. This would be a nice feature which many people I've talked to have asked for. In Oracle-land, it's called Transparent Application Failover (TAF) and it gives you a lot of options, including the ability to write your own callbacks when a failover is detected. +1 -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Monday 04 August 2008 15:56:25 daveg wrote: On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote: On Monday 04 August 2008 03:50:40 daveg wrote: That's great for you, I am talking in the scope of a general solution. (Note I'd also bet that even given the same hardware, different production loads can produce different relative mappings of cost vs. performance, but whatever.) Even on different hardware it would still likely warn of mistakes like products due to missing join conditions, etc. See, this is what we ended up talking about before. Someone will say "I'd like to prevent my devs from accidentally doing queries with cartesian products" and they will use this to do it... but that will only work in some cases, so it becomes a poor tool to solve a different problem. BTW, what I really love about statement costs is that they aren't even reliable on the same machine with the same data. I have seen query plans which run on the same data on the same machine where the resultant query runtime can vary from 2 hours to 5 hours, depending on how much other concurrent traffic is on the machine. Awesome, eh? I still think it is worth revisiting what problems people are trying to solve, and see if there are better tools they can be given to solve them. Barring that, I suppose a crude solution is better than nothing, though I fear people might point at the crude solution as a good-enough solution to justify not working on better solutions. Alerting developers and QA to potentially costly queries would help solve some of the problems we are trying to solve. Better tools are welcome; an argument that the good is the enemy of the best, so we should be content with nothing, is not. And you'll note, I specifically said that a crude tool is better than nothing. I realised somewhat after I sent the above that it might have sounded a bit snippy. I hope I have not offended. But you're completely ignoring that a crude tool can often end up as a foot-gun once released into the wild. 
I'm suggesting a warning, or even just a notice into the logs; I don't see the footgun. What am I missing? The footgun in my mind is that people will think this solves a number of problems even though it doesn't solve them well. However, the footgun for you might be that the current proposal will actually abort the query, not emit a warning (not sure if that changes your opinion of it). -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
On Monday 04 August 2008 14:08, Simon Riggs wrote: When primary server fails, it would be good if the clients connected to the primary knew to reconnect to the standby servers automatically. We might want to specify that centrally and then send the redirection address to the client when it connects. Sounds like lots of work though. Seems fairly straightforward to specify a standby connection service at client level: .pgreconnect, or pgreconnect.conf No config, then option not used. Well, it's less simple, but you can already do this with pgPool on the client machine. -- --Josh Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus [EMAIL PROTECTED] wrote: Well, it's less simple, but you can already do this with pgPool on the client machine. Yeah, but if you have tens or hundreds of clients, you wouldn't want to be installing/managing a pgpool on each. Similarly, I think an application should have the option of being notified of a connection change; I know that wasn't in Simon's proposal, but I've found it necessary in several applications which rely on things such as temporary tables. You don't want the app just blowing up because a table doesn't exist; you want to be able to handle it gracefully. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IN vs EXISTS equivalence
On Mon, Oct 22, 2007 at 1:30 PM, Simon Riggs wrote: On Mon, 2007-10-22 at 09:31 -0500, Kevin Grittner wrote: I've requested this before without response, but I'm asking again because it just caused me pain again: could we get a TODO added to have the planner recognize equivalent IN and EXISTS constructs and have them compete on cost estimates? I know it's not a trivial improvement, but if it's on the list maybe someone will pick it up, and I see it as the single biggest weakness in PostgreSQL performance. I'll pick it up as a default unless someone requests they have it from me. Since Simon's focus has shifted to other issues, I'm hoping this can go onto the TODO list. I'm adding some NOT EXISTS examples to the thread for completeness of what someone might want to address while working on it. For two queries which can easily be shown (to a human viewer, anyway) to return identical results, I see performance differences of over five orders of magnitude. (Six if you compare to the LEFT JOIN ... WHERE not_null_right_column IS NULL trick.) Below are the cost estimates of the three techniques for a medium-sized table joining to a large table, and for a large table joining to a small table. The NOT IN technique has the worst worst-case behavior, at least in queries that I've run, although many people report that it is usually faster. The technique of doing an existence test with a LEFT JOIN and then checking whether a NOT NULL column from the right-hand table is null is often faster than either technique, and seldom much worse than the best technique for any given test. Queries and plans attached. Summary of costs below, in millions of cost units (fractions of a million discarded):

NOT IN (independent_subquery)                    19745843,   5
WHERE NOT EXISTS                                       74, 318
LEFT JOIN WHERE not_null_right_column IS NULL          10,  17

These cost estimates tend to come out in pretty consistent ratio to the actual run times. 
-Kevin

cir=> explain
cir-> SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, A.jdgmtHistSeqNo
cir->   FROM Jdgmt A
cir->   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir->     AND (A.countyNo, A.caseNo, A.jdgmtHistSeqNo) NOT IN (SELECT B.countyNo, B.caseNo, B.histSeqNo FROM CaseHist B)
cir->   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir-> ;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using Jdgmt_pkey on Jdgmt A  (cost=4003123.55..19745843977029.96 rows=6896890 width=24)
   Filter: ((jdgmtHistSeqNo IS NOT NULL) AND (NOT (subplan)))
   SubPlan
     ->  Materialize  (cost=4003123.55..6253176.10 rows=167689505 width=20)
           ->  Seq Scan on CaseHist B  (cost=0.00..3262276.55 rows=167689505 width=20)
(5 rows)

cir=>
cir=> explain
cir-> SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, A.jdgmtHistSeqNo
cir->   FROM Jdgmt A
cir->   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir->     AND NOT EXISTS
cir->     (
cir(>       SELECT * FROM CaseHist B
cir(>         WHERE B.countyNo = A.countyNo
cir(>           AND B.caseNo = A.caseNo
cir(>           AND B.histSeqNo = A.jdgmtHistSeqNo
cir(>     )
cir->   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir-> ;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using Jdgmt_pkey on Jdgmt A  (cost=0.00..74966880.92 rows=6896890 width=24)
   Filter: ((jdgmtHistSeqNo IS NOT NULL) AND (NOT (subplan)))
   SubPlan
     ->  Index Scan using CaseHist_pkey on CaseHist B  (cost=0.00..10.07 rows=2 width=323)
           Index Cond: (((countyNo)::smallint = ($0)::smallint) AND ((caseNo)::text = ($1)::text) AND ((histSeqNo)::smallint = ($2)::smallint))
(5 rows)

cir=>
cir=> explain
cir-> SELECT A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo, A.jdgmtHistSeqNo
cir->   FROM Jdgmt A
cir->   LEFT JOIN CaseHist B
cir->     ON ( B.countyNo = A.countyNo
cir(>      AND B.caseNo = A.caseNo
cir(>      AND B.histSeqNo = A.jdgmtHistSeqNo
cir(>        )
cir->   WHERE A.jdgmtHistSeqNo IS NOT NULL
cir->     AND B.countyNo IS NULL
cir->   ORDER BY countyNo, caseNo, chargeSeqNo, jdgmtSeqNo
cir-> ;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Sort  (cost=10616941.83..10645732.70 rows=11516349 width=24)
   Sort Key: A.countyNo, A.caseNo, A.chargeSeqNo, A.jdgmtSeqNo
   ->  Merge Right Join  (cost=2020174.37..9175701.57 rows=11516349 width=24)
         Merge Cond: (((B.countyNo)::smallint = inner.?column6?) AND ((B.caseNo)::text = inner.?column7?) AND
Re: [HACKERS] Automatic Client Failover
Jonah H. Harris [EMAIL PROTECTED] writes: On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus [EMAIL PROTECTED] wrote: Well, it's less simple, but you can already do this with pgPool on the client machine. Yeah, but if you have tens or hundreds of clients, you wouldn't want to be installing/managing a pgpool on each. Huh? The pgpool is on the server, not on the client side. There is one really bad consequence of the oversimplified failover design that Simon proposes, which is that clients might try to fail over for reasons other than a primary server failure. (Think network partition.) You really want any such behavior to be managed centrally, IMHO. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
On Mon, 2008-08-04 at 22:08 +0100, Simon Riggs wrote: When primary server fails, it would be good if the clients connected to the primary knew to reconnect to the standby servers automatically. We might want to specify that centrally and then send the redirection address to the client when it connects. Sounds like lots of work though. One way to do it is _outside_ of client, by having a separately managed subnet for logical DB addresses. So when a failover occurs, then you move that logical DB address to the new host, flush ARP caches and just reconnect. This also solves the case of inadvertent failover in case of unrelated network failure. Seems fairly straightforward to specify a standby connection service at client level: .pgreconnect, or pgreconnect.conf No config, then option not used. Would work with various forms of replication. Implementation would be to make PQreset() try secondary connection if the primary one fails to reset. Of course you can program this manually, but the feature is that you wouldn't need to, nor would you need to request changes to 27 different interfaces either. Good? Bad? Ugly? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
Hi,

On 5 Aug 08, at 01:13, Tom Lane wrote: There is one really bad consequence of the oversimplified failover design that Simon proposes, which is that clients might try to fail over for reasons other than a primary server failure. (Think network partition.) You really want any such behavior to be managed centrally, IMHO.

Then, what about having pgbouncer capability in -core? This would probably mean, AFAIUI, that the listen()ing process would no longer be the postmaster but a specialized one, with the portable poll()/select()/... loop, that is now known as pgbouncer. Existing pgbouncer would have to be expanded to:
- provide a backward compatible mode (session pooling, release server session at client closing time)
- allow configuring several backend servers and trying the next on certain conditions
- add hooks for clients to know when some events happen (failure of current master, automatic switchover, etc.)

Existing pgbouncer hooks and new ones could be managed with catalog tables, as we have a special options table for autovacuum; e.g., pg_connection_pool could contain arbitrary SQL for new backend fork, backend closing, failover, switchover, etc., and maybe the client hooks would be NOTIFY messages sent from the backend at its initiative. Would we then have the centrally managed behavior Tom is mentioning? I'm understanding this in 2 ways:
- this extension would be able to distinguish failure cases where we are able to do an automatic failover from hard crashes (impacting the listener)
- when we have read-only slave(s), pgbouncer will be able to redirect read-only statements to them

Maybe it would even be useful to look at Markus' work in Postgres-R and its inter-backend communication system allowing the executor to require more than one backend working on a single query. The pgbouncer-inherited system would then be a pre-forked backend pooling manager too... 
Once more, I hope that giving (not so) random ideas here as a (not yet) pgsql hacker is helping the project more than it's disturbing real work... Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IN vs EXISTS equivalence
Kevin Grittner [EMAIL PROTECTED] writes: I'm adding some NOT EXISTS examples to the thread for completeness of what someone might want to address while working on it. For two queries which can easily be shown (to a human viewer, anyway) to return identical results, I see performance differences of over five orders of magnitude. Could we see EXPLAIN ANALYZE not just EXPLAIN for these? When people are complaining of bad planner behavior, I don't find bare EXPLAIN output to be very convincing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
Dimitri Fontaine [EMAIL PROTECTED] writes: On 5 Aug 08, at 01:13, Tom Lane wrote: There is one really bad consequence of the oversimplified failover design that Simon proposes, which is that clients might try to fail over for reasons other than a primary server failure. (Think network partition.) You really want any such behavior to be managed centrally, IMHO. Then, what about having pgbouncer capability in -core? This would probably mean, AFAIUI, that the listen()ing process would no longer be the postmaster but a specialized one, Huh? The problem case is that the primary server goes down, which would certainly mean that a pgbouncer instance on the same machine goes with it. So it seems to me that integrating pgbouncer is 100% backwards. Failover that actually works is not something we can provide with trivial changes to Postgres. It's really a major project in its own right: you need heartbeat detection, STONITH capability, IP address redirection, etc. I think we should be recommending external failover-management project(s) instead of offering a half-baked home-grown solution. Searching freshmeat for "failover" finds plenty of potential candidates, but not having used any of them I'm not sure which are worth closer investigation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
Tom, Failover that actually works is not something we can provide with trivial changes to Postgres. I think the proposal was for an extremely simple "works 75% of the time" failover solution. While I can see the attraction of that, the consequences of having failover *not* work are pretty severe. On the other hand, we will need to deal with this for the built-in replication project. -- --Josh Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DROP DATABASE always seeing database in use
Tom Lane wrote: ERROR: database %s is being accessed by other users DETAIL: There are %d session(s) and %d prepared transaction(s) using the database. I'm aware that this phrasing might not translate very nicely ... anyone have a suggestion for better wording? I can only estimate translation effort into German, but how about: DETAIL: Active users of the database: %d session(s), %d prepared transaction(s) Jens -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
On Mon, Aug 04, 2008 at 05:17:59PM -0400, Jonah H. Harris wrote: On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs [EMAIL PROTECTED] wrote: When primary server fails, it would be good if the clients connected to the primary knew to reconnect to the standby servers automatically. This would be a nice feature which many people I've talked to have asked for. In Oracle-land, it's called Transparent Application Failover (TAF) and it gives you a lot of options, including the ability to write your own callbacks when a failover is detected. This might be better done as part of a proxy server, eg pgbouncer, pgpool, than as part of postgresql or libpq. I like the concept, but the logic to determine when a failover has occurred is complex and a client will often not have access to enough information to make this determination accurately. postgresql could have hooks to support this though, ie to determine when a standby thinks it has become the master. -dg -- David Gould [EMAIL PROTECTED] 510 536 1443 / 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mini improvement: statement_cost_limit
On Mon, Aug 04, 2008 at 05:19:50PM -0400, Robert Treat wrote: See, this is what we ended up talking about before. Someone will say "I'd like to prevent my devs from accidentally doing queries with cartesian products" and they will use this to do it... but that will only work in some cases, so it becomes a poor tool to solve a different problem. BTW, what I really love about statement costs is that they aren't even reliable on the same machine with the same data. I have seen query plans which run on the same data on the same machine where the resultant query runtime can vary from 2 hours to 5 hours, depending on how much other concurrent traffic is on the machine. Awesome, eh? Sure, I don't think anyone believes that costs are precise. But the case that is interesting is 2 hours versus years and years. The footgun in my mind is that people will think this solves a number of problems even though it doesn't solve them well. However, the footgun for you I suspect that a good solution to this problem is impossible, as it is more or less the halting problem. So I'm willing to accept a poor solution based on costs and then hope we improve the cost model. -dg -- David Gould [EMAIL PROTECTED] 510 536 1443 / 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest July Over
On Monday 04 August 2008 15:38:35 Josh Berkus wrote: Hackers, Well, after a month the July CommitFest is officially closed. At this point, we're operating with the de facto rule that commitfests shouldn't last more than a month. Because some patches are still being discussed, they've been moved over automatically to the September commitfest. A much larger number of patches are now in returned with feedback; if your patch is in there, probably hackers is waiting for some kind of response from you. People should understand they don't have to wait for a commitfest to continue development, right? (I.e., if your patch got rejected, start getting it in shape now, and ask questions now.) Lots of stuff was committed, too. 8.4 is looking very exciting. +1 Post-mortem things we've learned about the commitfest are: 1) It's hard to get anything done in June-July. True... vacations and conferences abound. September should be better in this regard, I would think. 2) The number of patches is going to keep increasing with each commitfest. As such, the patch list is going to get harder to deal with. We now urgently need to start working on CF management software. 3) Round Robin Reviewers didn't really work this time, aside from champion new reviewer Abhjit. For the most part, RRR who were assigned patches did not review them for 2 weeks. Two areas where this concept needs to be improved: a) we need to assign RRR to patches two days after the start of commitfest, not a week later; This seems tricky, since you want people to volunteer to review patches ideally; will two days be enough? Should people interested in reviewing be signing up ahead of time? Looking at the next commitfest, it is going to start on a Monday... maybe auto-assigning reviewers on Wednesday is OK. b) there needs to be the expectation that RRR will start reviewing or reject the assignment immediately. 
I wonder if too much time was spent on patches like the WITH patch, which it seemed pretty early on was not ready for commit... thoughts? 4) We need to work better to train up new reviewers. Some major committer(s) should have worked with Abhjit, Thomas and Martin particularly on getting them to effectively review patches; instead, committers just handled stuff *for* them for the most part, which isn't growing our pool of reviewers. 5) Patch submitters need to understand that patch submission isn't fire-and-forget. They need to check back, and respond to queries from reviewers. Of course, a patch-tracker which automatically notified the submitter would help. Reviewers should be responding to the email on -hackers that is pointed to by the wiki, so patch submitters should be getting notified... right? 6) Overall, I took a low-nag-factor approach to my first time as commitfest manager. This does not seem to have been the best way; I'd suggest for September that the manager make more frequent nags. Is there something you want people to be nagged about? Finally: who wants to be CF Manager for September? I'm willing to do it again, but maybe someone else should get a turn. Why stop now when you've got the momentum? :-) Seriously though, I thought we were supposed to have 2 people working as CF Managers for each CF... is that not the case? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic Client Failover
Josh Berkus [EMAIL PROTECTED] writes: I think the proposal was for an extremely simple "works 75% of the time" failover solution. While I can see the attraction of that, the consequences of having failover *not* work are pretty severe. Exactly. The point of failover (or any other HA feature) is to get several nines worth of reliability. "It usually works" is simply not playing in the right league. On the other hand, we will need to deal with this for the built-in replication project. Nope, that's orthogonal. A failover solution depends on having a master and a slave database, but it has nothing directly to do with how those DBs are synchronized. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DROP DATABASE always seeing database in use
Jens-Wolfhard Schicke [EMAIL PROTECTED] writes: Tom Lane wrote: ERROR: database %s is being accessed by other users DETAIL: There are %d session(s) and %d prepared transaction(s) using the database. I'm aware that this phrasing might not translate very nicely ... anyone have a suggestion for better wording? I can only estimate translation effort into German, but how about: DETAIL: Active users of the database: %d session(s), %d prepared transaction(s) Hmmm ... what I ended up committing was code that special-cased the common cases where you only have one or the other, ie

    /*
     * We don't worry about singular versus plural here, since the English
     * rules for that don't translate very well.  But we can at least avoid
     * the case of zero items.
     */
    if (notherbackends > 0 && npreparedxacts > 0)
        errdetail("There are %d other session(s) and %d prepared transaction(s) using the database.",
                  notherbackends, npreparedxacts);
    else if (notherbackends > 0)
        errdetail("There are %d other session(s) using the database.",
                  notherbackends);
    else
        errdetail("There are %d prepared transaction(s) using the database.",
                  npreparedxacts);

Your proposal seems fine for the first case but a bit stilted for the other two. Or maybe that's just me. Of course, we don't *have* to do it as above at all, if "0 prepared transactions" doesn't bother people. Ideas anybody? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
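For comparing candidate wordings, the three-branch errdetail logic quoted above can be mirrored outside the server. This is only an illustrative sketch (the message strings are the ones from the quoted code; the function name is made up):

```python
def drop_db_detail(notherbackends, npreparedxacts):
    """Mirror of the committed three-branch errdetail logic quoted above.

    Avoids mentioning zero items, as the C code does; singular vs. plural
    is not distinguished, matching the '(s)' style of the messages.
    """
    if notherbackends > 0 and npreparedxacts > 0:
        return ("There are %d other session(s) and %d prepared "
                "transaction(s) using the database."
                % (notherbackends, npreparedxacts))
    elif notherbackends > 0:
        return ("There are %d other session(s) using the database."
                % notherbackends)
    else:
        return ("There are %d prepared transaction(s) using the database."
                % npreparedxacts)
```

A harness like this makes it easy to eyeball each branch's output when proposing alternative phrasings, without rebuilding the server.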