[GENERAL] unique indices without pg_constraint rows
I've been looking at unique indices in a PostgreSQL 8.3.x cluster. Some unique indices clearly have a corresponding row in pg_constraint, while others appear to have no corresponding row in pg_constraint at all. Why is this? What determines whether a unique index will also have a row in pg_constraint? Thanks in advance. Ed
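[For the archives: the usual explanation is that a pg_constraint row exists only for unique indexes created through a constraint (a UNIQUE or PRIMARY KEY clause in CREATE TABLE, or ALTER TABLE ... ADD CONSTRAINT); an index made directly with CREATE UNIQUE INDEX enforces uniqueness on its own and gets no pg_constraint entry. A catalog query along these lines — an untested sketch; note that on 8.3 the index-to-constraint link goes through pg_depend, since pg_constraint.conindid only appears in later releases — should separate the two cases:]

```sql
-- List unique indexes with the constraint backing them, if any.
-- On 8.3 a constraint-backed index has an internal ('i') pg_depend
-- entry pointing from the index to its pg_constraint row.
SELECT c.relname  AS index_name,
       con.conname AS constraint_name   -- NULL => plain CREATE UNIQUE INDEX
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
LEFT JOIN pg_depend d
       ON d.classid = 'pg_class'::regclass
      AND d.objid = i.indexrelid
      AND d.refclassid = 'pg_constraint'::regclass
      AND d.deptype = 'i'
LEFT JOIN pg_constraint con ON con.oid = d.refobjid
WHERE i.indisunique;
```

[Rows with a NULL constraint_name should be the indexes that were created with CREATE UNIQUE INDEX directly.]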
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 15:41, Ed Loehr (LoehrTech.com) wrote: Ed L. pg...@bluepolka.net writes: (gdb) bt #0 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 #1 0x00346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x00346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x00346f8689cb in getc () from /lib64/libc.so.6 #4 0x00531ee8 in next_token (fp=0x5b90f20, buf=0x7fff59bef330 , bufsz=4096) at hba.c:128 #5 0x00532233 in tokenize_file (filename=0x5b8f3f0 global, file=0x5b90f20, lines=0x7fff59bef5c8, line_nums=0x7fff59bef5c0) at hba.c:232 #6 0x005322e9 in tokenize_file (filename=0x5b8f3d0 global/pg_auth, file=0x5b90ce0, lines=0x98b168, line_nums=0x98b170) at hba.c:358 #7 0x005327ff in load_role () at hba.c:959 #8 0x0057f300 in reaper (postgres_signal_arg=value optimized out) at postmaster.c:2145 #9 signal handler called #10 0x00346f8cb323 in __select_nocancel () from /lib64/libc.so.6 #11 0x0057cc33 in ServerLoop () at postmaster.c:1236 #12 0x0057dfdf in PostmasterMain (argc=6, argv=0x5b73fe0) at postmaster.c:1031 #13 0x005373de in main (argc=6, argv=value optimized out) at main.c:188 The postmaster seems to be stuck trying to read $PGDATA/global/pg_auth (which would be an expected thing for it to do at this point in the startup sequence). Does that file exist? Is it an ordinary file? Do its contents look sane (a list of your userids and their passwords and group memberships)? This just happened again ~24 hours after full reload from backup. Arrrgh. Backtrace looks the same again, same file, same __read_nocancel(). $PGDATA/global/pg_auth looks fine to me, permissions are 600, entries are 3 or more double-quoted items per line each separated by a space, items 3 and beyond being groups. Any clues? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 15:46, Ed L. wrote: On Monday 01 March 2010 @ 15:41, Ed Loehr (LoehrTech.com) wrote: Ed L. pg...@bluepolka.net writes: (gdb) bt #0 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 #1 0x00346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x00346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x00346f8689cb in getc () from /lib64/libc.so.6 #4 0x00531ee8 in next_token (fp=0x5b90f20, buf=0x7fff59bef330 , bufsz=4096) at hba.c:128 #5 0x00532233 in tokenize_file (filename=0x5b8f3f0 global, file=0x5b90f20, lines=0x7fff59bef5c8, line_nums=0x7fff59bef5c0) at hba.c:232 #6 0x005322e9 in tokenize_file (filename=0x5b8f3d0 global/pg_auth, file=0x5b90ce0, lines=0x98b168, line_nums=0x98b170) at hba.c:358 #7 0x005327ff in load_role () at hba.c:959 #8 0x0057f300 in reaper (postgres_signal_arg=value optimized out) at postmaster.c:2145 #9 signal handler called #10 0x00346f8cb323 in __select_nocancel () from /lib64/libc.so.6 #11 0x0057cc33 in ServerLoop () at postmaster.c:1236 #12 0x0057dfdf in PostmasterMain (argc=6, argv=0x5b73fe0) at postmaster.c:1031 #13 0x005373de in main (argc=6, argv=value optimized out) at main.c:188 The postmaster seems to be stuck trying to read $PGDATA/global/pg_auth (which would be an expected thing for it to do at this point in the startup sequence). Does that file exist? Is it an ordinary file? Do its contents look sane (a list of your userids and their passwords and group memberships)? This just happened again ~24 hours after full reload from backup. Arrrgh. Backtrace looks the same again, same file, same __read_nocancel(). $PGDATA/global/pg_auth looks fine to me, permissions are 600, entries are 3 or more double-quoted items per line each separated by a space, items 3 and beyond being groups. Any clues? Watching the server logs, the system is continuing to process data on existing connections. Just can't get any new ones. 
Here's a backtrace for a hung psql -c select version(): $ gdb `which psql` GNU gdb Fedora (6.8-37.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. This GDB was configured as x86_64-redhat-linux-gnu... (gdb) attach 9631 Attaching to program: /opt/pgsql/installs/postgresql-8.3.9/bin/psql, process 9631 Reading symbols from /opt/pgsql/installs/postgresql-8.3.9/lib/libpq.so.5...done. Loaded symbols for /opt/pgsql/installs/postgresql-8.3.9/lib/libpq.so.5 Reading symbols from /usr/lib64/libz.so.1...done. Loaded symbols for /usr/lib64/libz.so.1 Reading symbols from /usr/lib64/libreadline.so.5...done. Loaded symbols for /usr/lib64/libreadline.so.5 Reading symbols from /lib64/libtermcap.so.2...done. Loaded symbols for /lib64/libtermcap.so.2 Reading symbols from /lib64/libcrypt.so.1...done. Loaded symbols for /lib64/libcrypt.so.1 Reading symbols from /lib64/libdl.so.2...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libnss_files.so.2...done. 
Loaded symbols for /lib64/libnss_files.so.2 0x00346f8c92af in poll () from /lib64/libc.so.6 (gdb) bt #0 0x00346f8c92af in poll () from /lib64/libc.so.6 #1 0x2b03826e5e6f in pqSocketCheck (conn=0x655eef0, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1046 #2 0x2b03826e5f10 in pqWaitTimed (forRead=1, forWrite=-1, conn=0x655eef0, finish_time=-1) at fe-misc.c:920 #3 0x2b03826e1752 in connectDBComplete (conn=0x655eef0) at fe-connect.c:930 #4 0x2b03826e2c60 in PQsetdbLogin (pghost=0x0, pgport=0x0, pgoptions=0x0, pgtty=0x0, dbName=0x0, login=0x0, pwd=0x0) at fe-connect.c:678 #5 0x0040e319 in main (argc=value optimized out, argv=0x7fff283ce6e8) at startup.c:195
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 15:59, Ed L. wrote: This just happened again ~24 hours after full reload from backup. Arrrgh. Backtrace looks the same again, same file, same __read_nocancel(). $PGDATA/global/pg_auth looks fine to me, permissions are 600, entries are 3 or more double-quoted items per line each separated by a space, items 3 and beyond being groups. Any clues? Also seeing lots of postmaster zombies (190 and growing)... Ed
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 16:03, Ed L. wrote: On Monday 01 March 2010 @ 15:59, Ed L. wrote: This just happened again ~24 hours after full reload from backup. Arrrgh. Backtrace looks the same again, same file, same __read_nocancel(). $PGDATA/global/pg_auth looks fine to me, permissions are 600, entries are 3 or more double-quoted items per line each separated by a space, items 3 and beyond being groups. Any clues? Also seeing lots of postmaster zombies (190 and growing)... While new connections are hanging, top shows postmaster using 100% of cpu. SIGTERM/SIGQUIT do nothing. Here's a backtrace of this busy postmaster: (gdb) bt #0 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 #1 0x00346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x00346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x00346f8689cb in getc () from /lib64/libc.so.6 #4 0x00531ee8 in next_token (fp=0x10377ae0, buf=0x7fff32230e60 , bufsz=4096) at hba.c:128 #5 0x00532233 in tokenize_file (filename=0x10359b70 global, file=0x10377ae0, lines=0x7fff322310f8, line_nums=0x7fff322310f0) at hba.c:232 #6 0x005322e9 in tokenize_file (filename=0x2b1c8cbf5800 global/pg_auth, file=0x103767a0, lines=0x98b168, line_nums=0x98b170) at hba.c:358 #7 0x005327ff in load_role () at hba.c:959 #8 0x0057f878 in sigusr1_handler (postgres_signal_arg=value optimized out) at postmaster.c:3830 #9 signal handler called #10 0x00346f8cb323 in __select_nocancel () from /lib64/libc.so.6 #11 0x0057cc33 in ServerLoop () at postmaster.c:1236 #12 0x0057dfdf in PostmasterMain (argc=6, argv=0x1033f000) at postmaster.c:1031 #13 0x005373de in main (argc=6, argv=value optimized out) at main.c:188 ...and more from the server logs, fwiw: 2010-03-01 17:30:24.213 CST [32238]WARNING: worker took too long to start; cancelled 2010-03-01 17:30:31.250 CST [32236]DEBUG: transaction log switch forced (archive_timeout=300) 2010-03-01 17:31:24.216 CST [32238]WARNING: worker took too long to start; cancelled 
2010-03-01 17:32:24.219 CST [32238]WARNING: worker took too long to start; cancelled 2010-03-01 17:33:24.222 CST [32238]WARNING: worker took too long to start; cancelled 2010-03-01 17:34:24.225 CST [32238]WARNING: worker took too long to start; cancelled 2010-03-01 17:35:19.061 CST [32236]LOG: checkpoint starting: time 2010-03-01 17:35:19.185 CST [32236]DEBUG: recycled transaction log file 0001001C0071 2010-03-01 17:35:19.185 CST [32236]LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=0.028 s, sync=0.000 s, total=0.124 s 2010-03-01 17:35:24.328 CST [32238]WARNING: worker took too long to start; cancelled 2010-03-01 17:35:31.224 CST [32236]DEBUG: transaction log switch forced (archive_timeout=300) 2010-03-01 17:36:44.332 CST [32238]WARNING: worker took too long to start; cancelled 2010-03-01 17:37:44.434 CST [32238]WARNING: worker took too long to start; cancelled 2010-03-01 17:37:47.378 CST [3692] dba 10(42816) dba LOG: could not receive data from client: Connection timed out 2010-03-01 17:37:47.378 CST [3692] dba 10(42816) dba LOG: unexpected EOF on client connection 2010-03-01 17:37:47.380 CST [3692] dba 10(42816) dba LOG: disconnection: session time: 2:11:15.303 user=dba database=dba host=... port=428
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 16:57, Tom Lane wrote: Now that I look more closely at those line numbers, it looks like the thing thinks it is processing an include file. Are there any @ signs in your global/pg_auth file? Yes, indeed, there are many. My user names are u...@host form, and have been for years. Would that be a problem now? Ed
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:15, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: On Monday 01 March 2010 @ 16:57, Tom Lane wrote: Now that I look more closely at those line numbers, it looks like the thing thinks it is processing an include file. Are there any @ signs in your global/pg_auth file? Yes, indeed, there are many. My user names are u...@host form, and have been for years. Would that be a problem now? u...@host shouldn't be a problem, but if there were an @ by itself or starting a token, it might possibly cause something like this. There is one, looks like a typo got in. How do I fix it?
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 16:49, Tom Lane wrote: Oh, for some reason I thought it was sitting idle. That sounds more like an infinite loop. Try reattaching to the postmaster, confirm the stack trace, and then see how many times you can do fin before it doesn't return control. That will tell us which level of subroutine is looping. Yeah, it took me a bit to notice it was at 100% cpu. Here's gdb output with the 'fin's you suggested: $ gdb `which postgres` GNU gdb Fedora (6.8-37.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. This GDB was configured as x86_64-redhat-linux-gnu... (gdb) attach 21670 Attaching to program: /opt/pgsql/installs/postgresql-8.3.9/bin/postgres, process 21670 Reading symbols from /lib64/libcrypt.so.1...done. Loaded symbols for /lib64/libcrypt.so.1 Reading symbols from /lib64/libdl.so.2...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libnss_files.so.2...done. 
Loaded symbols for /lib64/libnss_files.so.2 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 (gdb) bt #0 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 #1 0x00346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x00346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x00346f8689cb in getc () from /lib64/libc.so.6 #4 0x00531ee8 in next_token (fp=0x11da7f30, buf=0x7fff5cb9e300 , bufsz=4096) at hba.c:128 #5 0x00532233 in tokenize_file (filename=0x11da5940 global, file=0x11da7f30, lines=0x7fff5cb9e598, line_nums=0x7fff5cb9e590) at hba.c:232 #6 0x005322e9 in tokenize_file (filename=0x11da5920 global/pg_auth, file=0x11da7cf0, lines=0x98b168, line_nums=0x98b170) at hba.c:358 #7 0x005327ff in load_role () at hba.c:959 #8 0x0057f300 in reaper (postgres_signal_arg=value optimized out) at postmaster.c:2145 #9 signal handler called #10 0x00346f8cb323 in __select_nocancel () from /lib64/libc.so.6 #11 0x0057cc33 in ServerLoop () at postmaster.c:1236 #12 0x0057dfdf in PostmasterMain (argc=6, argv=0x11d8afb0) at postmaster.c:1031 #13 0x005373de in main (argc=6, argv=value optimized out) at main.c:188 (gdb) fin Run till exit from #0 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 0x00346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6 (gdb) fin Run till exit from #0 0x00346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6 0x00346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6 (gdb) fin Run till exit from #0 0x00346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6 0x00346f8689cb in getc () from /lib64/libc.so.6 (gdb) fin Run till exit from #0 0x00346f8689cb in getc () from /lib64/libc.so.6 0x00531ee8 in next_token (fp=0x11da7f30, buf=0x7fff5cb9e300 , bufsz=value optimized out) at hba.c:128 128 while ((c = getc(fp)) != EOF && (pg_isblank(c) || c == ',')) (gdb) fin Run till exit from #0 0x00531ee8 in next_token (fp=0x11da7f30, buf=0x7fff5cb9e300 , bufsz=value optimized out) at 
hba.c:128 0x00532233 in tokenize_file (filename=0x11da5940 global, file=0x11da7f30, lines=0x7fff5cb9e598, line_nums=0x7fff5cb9e590) at hba.c:232 232 if (!next_token(file, buf, sizeof(buf))) Value returned is $1 = 0 '\0' (gdb) fin Run till exit from #0 0x00532233 in tokenize_file (filename=0x11da5940 global, file=0x11da7f30, lines=0x7fff5cb9e598, line_nums=0x7fff5cb9e590) at hba.c:232
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:18, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: On Monday 01 March 2010 @ 17:15, Tom Lane wrote: u...@host shouldn't be a problem, but if there were an @ by itself or starting a token, it might possibly cause something like this. There is one, looks like a typo got in. How do I fix it? Hmm, a user named @, or what? Yes, a bogus user: @
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:23, Ed L. wrote: On Monday 01 March 2010 @ 17:18, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: On Monday 01 March 2010 @ 17:15, Tom Lane wrote: u...@host shouldn't be a problem, but if there were an @ by itself or starting a token, it might possibly cause something like this. There is one, looks like a typo got in. How do I fix it? Hmm, a user named @, or what? Yes, a bogus user: @ Correction. Here's the line: @ agent_group
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:25, Ed L. wrote: On Monday 01 March 2010 @ 17:23, Ed L. wrote: On Monday 01 March 2010 @ 17:18, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: On Monday 01 March 2010 @ 17:15, Tom Lane wrote: u...@host shouldn't be a problem, but if there were an @ by itself or starting a token, it might possibly cause something like this. There is one, looks like a typo got in. How do I fix it? Hmm, a user named @, or what? Yes, a bogus user: @ Correction. Here's the line: @ agent_group It is the first line in the pg_auth file.
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:26, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: On Monday 01 March 2010 @ 17:18, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: There is one, looks like a typo got in. How do I fix it? Hmm, a user named @, or what? Yes, a bogus user: @ Mph. We really ought to fix things so that a quoted @ doesn't get taken as an include file reference. Odd that it's never come up before. Anyway, if you still have any open superuser sessions, the best thing would be an ALTER USER RENAME. If you don't, you'll have to resort to manually editing the pg_auth file, and then rename the user as soon as you can get in. Killed the stuck postmaster with SIGKILL, edited the file, restarted postmaster, and it re-wrote the file with the bogus entry. I don't have any superuser sessions open. Is there another route? Ed
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:36, Ed L. wrote: On Monday 01 March 2010 @ 17:26, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: On Monday 01 March 2010 @ 17:18, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: There is one, looks like a typo got in. How do I fix it? Hmm, a user named @, or what? Yes, a bogus user: @ Mph. We really ought to fix things so that a quoted @ doesn't get taken as an include file reference. Odd that it's never come up before. Anyway, if you still have any open superuser sessions, the best thing would be an ALTER USER RENAME. If you don't, you'll have to resort to manually editing the pg_auth file, and then rename the user as soon as you can get in. Killed the stuck postmaster with SIGKILL, edited the file, restarted postmaster, and it re-wrote the file with the bogus entry. I don't have any superuser sessions open. Is there another route? I have source code and can patch and rebuild/reinstall if I can get the right patch.
[SOLVED] Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:58, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: Killed the stuck postmaster with sigkill, edited the file, restarted postmaster, and it re-wrote the file with the bogus entry. I don't have any superuser sessions open. Is there another route? What you're going to need to do is stop the postmaster, start a standalone backend (see the postgres man page entry if you never did that before) and issue the ALTER USER in the standalone backend. Then you can restart normal operations. That did the trick. Thank you very much, Sensei. Ed
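[For the archives, the fix boils down to something like the following sketch; the data directory path and the new role name here are placeholders. With the postmaster stopped, start a standalone backend with `postgres --single -D /path/to/data postgres`, then issue:]

```sql
-- Run inside the standalone (single-user) backend.
-- "agent" is a placeholder; use whatever the role was meant to be called.
ALTER ROLE "@" RENAME TO agent;
```

[When the standalone backend exits (Ctrl-D), the flat pg_auth file should be rewritten without the bare @ token, after which the postmaster can be started normally.]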
Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 17:57, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: Correction. Here's the line: @ agent_group It is the first line in the pg_auth file. BTW, there seems to be some other contributing factor here besides the weird username, because I don't see any looping when I try CREATE USER @. What's your platform exactly, and what type of filesystem is $PGDATA on? This is CentOS 5.2, Linux 2.6.18-92.1.22.el5 #1 SMP x86_64 GNU/Linux. I think the PGDATA filesystem is LVM running on a RAID 5 setup, but not 100% sure. # df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/VolGroup00-LogVol00 267422192 132412984 121205688 53% / ...
Re: [SOLVED] Re: [GENERAL] Hung postmaster (8.3.9)
On Monday 01 March 2010 @ 18:29, Tom Lane wrote: Ed L. pg...@bluepolka.net writes: That did the trick. Thank you very much, Sensei. I'd still like to know about platform etc. I see that we shouldn't be allowing a username to trigger @-file expansion, but even with that it's not clear how it turned into an infinite loop; and as I said, I can't reproduce that here. The original problem occurred on that CentOS box, but I just reproduced it on a new cluster on Ubuntu running LVM but no RAID. Here's the transcript: e...@duke:~/test$ uname -a Linux duke 2.6.31-15-generic #50-Ubuntu SMP Tue Nov 10 14:53:52 UTC 2009 x86_64 GNU/Linux e...@duke:~/test$ export PGPORT=9000 e...@duke:~/test$ w^C e...@duke:~/test$ svi e...@duke:~/test$ export PGDATA=`pwd`/data e...@duke:~/test$ initdb --locale=C -D $PGDATA The files belonging to this database system will be owned by user ed. This user must also own the server process. The database cluster will be initialized with locale C. The default database encoding has accordingly been set to SQL_ASCII. The default text search configuration will be set to english. creating directory /home/ed/test/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers/max_fsm_pages ... 32MB/204800 creating configuration files ... ok creating template1 database in /home/ed/test/data/base/1 ... ok initializing pg_authid ... ok initializing dependencies ... ok creating system views ... ok loading system objects' descriptions ... ok creating conversions ... ok creating dictionaries ... ok setting privileges on built-in objects ... ok creating information schema ... ok vacuuming database template1 ... ok copying template1 to template0 ... ok copying template1 to postgres ... ok WARNING: enabling trust authentication for local connections You can change this by editing pg_hba.conf or using the -A option the next time you run initdb. Success. 
You can now start the database server using: postgres -D /home/ed/test/data or pg_ctl -D /home/ed/test/data -l logfile start e...@duke:~/test$ pg_ctl -D /home/ed/test/data -l logfile start server starting e...@duke:~/test$ psql -c select version() psql: FATAL: database ed does not exist e...@duke:~/test$ create^C e...@duke:~/test$ createdb e...@duke:~/test$ psql -c select version() version - PostgreSQL 8.3.9 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.4.1-4ubuntu8) 4.4.1 (1 row) e...@duke:~/test$ createuser @ with login createuser: too many command-line arguments (first is with) Try createuser --help for more information. e...@duke:~/test$ psql -c create user \@\ with login CREATE ROLE e...@duke:~/test$ cat data/global/ 1136 1214 1260 2396 2672 2694 2698 2844 2847 pg_database 1137 1232 1261 2397 2676 2695 2842 2845 pg_auth pgstat.stat 1213 1233 1262 2671 2677 2697 2843 2846 pg_control e...@duke:~/test$ cat data/global/pg_auth @ ed e...@duke:~/test$ pg_ctl -D $PGDATA stop waiting for server to shut down...^C e...@duke:~/test$ ^C e...@duke:~/test$ ^C e...@duke:~/test$ ^C e...@duke:~/test$ pg_ctl -D $PGDATA -m fast stop waiting for server to shut down.^C e...@duke:~/test$ ^C e...@duke:~/test$ ps augxwwwf | grep postma ed8419 0.0 0.0 7336 868 pts/4S+ 18:36 0:00 \_ grep postma ohsdba6115 0.0 0.3 503692 28028 pts/0S18:03 0:00 postmaster -D /users/ohsdba/dbclusters/ohs/postgresql-8.3.x/data -i -p 9001 e...@duke:~/test$ gdb `which postgre^C e...@duke:~/test$ pg_ctl -D /home/ed/test/data -l logfile start pg_ctl: another server might be running; trying to start server anyway pg_ctl: could not start server Examine the log output. 
e...@duke:~/test$ cat logfile LOG: database system was shut down at 2010-03-01 18:34:25 MST LOG: autovacuum launcher started LOG: database system is ready to accept connections FATAL: database ed does not exist WARNING: worker took too long to start; cancelled FATAL: lock file postmaster.pid already exists HINT: Is another postmaster (PID 8203) running in data directory /home/ed/test/data? e...@duke:~/test$ ps augxwwwf | grep 8203 ed8465 0.0 0.0 7336 868 pts/4S+ 18:37 0:00 \_ grep 8203 ed8203 73.7 0.0 63080 4248 pts/4R18:34 2:07 /opt/pgsql/installs/postgresql-8.3.9/bin/postgres -D /home/ed/test/data e...@duke:~/test$ gdb `which postgres` GNU gdb (GDB) 7.0-ubuntu Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
[GENERAL] Hung postmaster (8.3.9)
Need some help. My PostgreSQL 8.3.6 and now 8.3.9 postmaster is hanging. No idea why, been running like a top for a year. Can't do select version(), even hung after system reboot. SIGINT/QUIT/TERM have no effect, only SIGKILL can stop it. This is Linux 2.6.18-92.1.22.el5 SMP x86_64 Here's the log after restarting from SIGKILL: 2010-02-27 20:11:10.426 CST [23134]LOG: database system was interrupted; last known up at 2010-02-27 20:08:46 CST 2010-02-27 20:11:10.426 CST [23134]DEBUG: checkpoint record is at 8BC/63F5FC0 2010-02-27 20:11:10.426 CST [23134]DEBUG: redo record is at 8BC/63F5FC0; shutdown TRUE 2010-02-27 20:11:10.426 CST [23134]DEBUG: next transaction ID: 0/488483782; next OID: 1619873999 2010-02-27 20:11:10.426 CST [23134]DEBUG: next MultiXactId: 130401682; next MultiXactOffset: 830996786 2010-02-27 20:11:10.426 CST [23134]LOG: database system was not properly shut down; automatic recovery in progress 2010-02-27 20:11:10.497 CST [23134]LOG: record with zero length at 8BC/63F6020 2010-02-27 20:11:10.497 CST [23134]LOG: redo is not required 2010-02-27 20:11:10.503 CST [23134]LOG: checkpoint starting: shutdown immediate 2010-02-27 20:11:10.581 CST [23134]LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.045 s, sync=0.000 s, total=0.083 s 2010-02-27 20:11:10.587 CST [23134]DEBUG: transaction ID wrap limit is 2435542396, limited by database ohsdba 2010-02-27 20:11:10.589 CST [23134]DEBUG: proc_exit(0) 2010-02-27 20:11:10.589 CST [23134]DEBUG: shmem_exit(0) 2010-02-27 20:11:10.592 CST [23134]DEBUG: exit(0) Here's the backtrace: $ gdb `which postgres` GNU gdb Fedora (6.8-37.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. 
This GDB was configured as x86_64-redhat-linux-gnu... (gdb) attach 3776 Attaching to program: /opt/pgsql/installs/postgresql-8.3.9/bin/postgres, process 3776 Reading symbols from /lib64/libcrypt.so.1...done. Loaded symbols for /lib64/libcrypt.so.1 Reading symbols from /lib64/libdl.so.2...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libm.so.6...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /lib64/libnss_files.so.2...done. Loaded symbols for /lib64/libnss_files.so.2 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 (gdb) p debug_query_string $1 = 0x0 (gdb) bt #0 0x00346f8c43a0 in __read_nocancel () from /lib64/libc.so.6 #1 0x00346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6 #2 0x00346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6 #3 0x00346f8689cb in getc () from /lib64/libc.so.6 #4 0x00531ee8 in next_token (fp=0x5b90f20, buf=0x7fff59bef330 , bufsz=4096) at hba.c:128 #5 0x00532233 in tokenize_file (filename=0x5b8f3f0 global, file=0x5b90f20, lines=0x7fff59bef5c8, line_nums=0x7fff59bef5c0) at hba.c:232 #6 0x005322e9 in tokenize_file (filename=0x5b8f3d0 global/pg_auth, file=0x5b90ce0, lines=0x98b168, line_nums=0x98b170) at hba.c:358 #7 0x005327ff in load_role () at hba.c:959 #8 0x0057f300 in reaper (postgres_signal_arg=value optimized out) at postmaster.c:2145 #9 signal handler called #10 0x00346f8cb323 in __select_nocancel () from /lib64/libc.so.6 #11 0x0057cc33 in ServerLoop () at postmaster.c:1236 #12 0x0057dfdf in PostmasterMain (argc=6, argv=0x5b73fe0) at postmaster.c:1031 #13 0x005373de in main (argc=6, argv=value optimized out) at main.c:188 (gdb) quit The program is running. Quit anyway (and detach it)? 
(y or n) y Detaching from program: /opt/pgsql/installs/postgresql-8.3.9/bin/postgres, process 3776 Thanks in advance for any help. Ed
Re: [GENERAL] Hung postmaster (8.3.9)
On Saturday 27 February 2010 @ 22:18, Greg Smith wrote: Ed L. wrote: 2010-02-27 20:11:10.426 CST [23134]LOG: database system was not properly shut down; automatic recovery in progress 2010-02-27 20:11:10.497 CST [23134]LOG: record with zero length at 8BC/63F6020 2010-02-27 20:11:10.497 CST [23134]LOG: redo is not required If the server is hung at this point, it was doing something: recovery from a crash, and during that period it will not respond to statements until it's finished, as you noted. That can take some time, could run into the minutes. And if you kill it, next time the process starts all over again having made no progress. I'd try starting the server again, confirm it's only after this point that it pauses, watch whether it's doing something with top -c, and grab some backtraces of what it's doing then. In the sample you gave, the shutdown was too fast for the server to have done much recovery before it was killed off. It may just be that you need to let it sit there for a while to finish recovery and then it will be fine again; can't tell from the data you've provided so far. Good point about my example, but it hung like that for 8 hours, and seems to be ready to do so again.
[GENERAL] commit performance anomaly
I've been tracking the performance of our DB query statements across a number of fairly high-volume pg clusters for several years (combined 2700 tps, ~1.3TB). Last year, we started migrating off HP-UX IA64 servers running pg 8.1.x onto Linux quadcore x86_64 Blade servers running pg 8.3.x while running on a high-grade SAN. Our average, overall query performance has improved by a very pleasant ~75%. But I'm curious why 'commit' statements (as well as certain update statements) seem to have actually degraded (1ms vs 5ms on avg, 2ms vs 14ms in the 95th percentile, etc). Any ideas? TIA. Ed
[GENERAL] Floating-point software assist fault?
We're seeing gobs of these via dmesg in PostgreSQL 8.3.3 on ia64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8), kernel 2.6.9-55.EL:

postmaster(13144): floating-point assist fault at ip 403a9382, isr 0408

It appears to be an Itanium-specific issue with floating-point normalization; here is a document describing the issue: http://i-cluster2.inrialpes.fr/doc/misc/fpswa.txt

“The Intel Itanium does not fully support IEEE denormals and requires software assistance to handle them. Without further information, the ia64 GNU/Linux kernel triggers a fault when denormals are computed. This is the floating-point software assist fault (FPSWA) in the kernel messages. It is the user's task to clearly design his program to prevent such cases.”

“To conclude, I'd like to stress the fact that the programmer has to be careful when dealing with floating-point numbers. Even with high precision, it is easy to produce denormals and get strange behaviour.”

Any thoughts? TIA. Ed
Re: [GENERAL] Floating-point software assist fault?
On Thursday 08/07/08 @ 5:43 pm MDT, I received this from Ed L. [EMAIL PROTECTED]: We're seeing gobs of these via dmesg in PostgreSQL 8.3.3 on ia64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8), kernel 2.6.9-55.EL: postmaster(13144): floating-point assist fault at ip 403a9382, isr 0408 These are coming lately exclusively from the writer process... TIA. Ed
Re: [GENERAL] Floating-point software assist fault?
On Thursday 08/07/08 @ 5:46 pm MDT, I received this from Ed L. postmaster(13144): floating-point assist fault at ip 403a9382, isr 0408 These are coming lately exclusively from the writer process... Actually, the machine has been up for 45 days and dmesg doesn't have timestamps, so I'm not sure if those pids have any relation to the ones currently in use. Ed
[GENERAL] large table vacuum issues
We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 autovacuums is launching a DB-wide vacuum on our 270GB database to prevent xid wrap-around, but is getting hung up and/or bogged down for hours on a 40GB table and taking the server performance down with it, apparently due to an IO bottleneck. The autovac child process becomes completely unresponsive to SIGTERM/SIGINT; only a SIGKILL restart, with the autovac daemon disabled, gets us back to adequate performance for now. We are discussing how to partition the table (difficult due to existing foreign keys in other tables), and archiving/clearing data. Are there any other tricks to get it past this large table for the time being and still get the xid wraparound fix? TIA. Ed ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] large table vacuum issues
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: Have you tried adjusting the

#vacuum_cost_delay = 0          # 0-1000 milliseconds
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200        # 0-10000 credits

settings to something so as to make vacuum less intrusive? Might be the easiest fix. Any particular suggested changes for these parameters? Ed
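The cost-based delay only engages when vacuum_cost_delay is nonzero. A hedged sketch of non-default values (my illustration, not from this thread) meant to throttle vacuum's I/O on a busy server:

```
# postgresql.conf sketch -- illustrative values, to be tuned per system
vacuum_cost_delay = 20      # sleep 20 ms each time the cost limit is reached
vacuum_cost_limit = 200     # accumulated page-cost that triggers the sleep
# the page_hit/page_miss/page_dirty weights (1/10/20) can usually stay put
```

With these settings a vacuum proceeds in short bursts instead of saturating the disk, trading a longer vacuum runtime for less interference with foreground queries.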
Re: [GENERAL] large table vacuum issues
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote: On Jan 4, 2008 6:38 PM, Ed L. [EMAIL PROTECTED] wrote: We need some advice on how to handle some large table autovacuum issues. One of our 8.1.2 First of all, update your 8.1 install to 8.1.10. Failing to keep up with bug fixes is negligent. who knows, you might be getting bitten by a bug that was fixed between 8.1.2 and 8.1.10 Could be. But like you said, who knows. In some environments, downtime for upgrading costs money (and more), too, sometimes even enough to make it negligent to take downtime to keep up with bug fixes (and, of course, the new bugs), which may or may not be a factor at hand. While the time required to restart a DB may be negligible, there are often upstream/downstream dependencies that greatly expand the actual downtime for the customer. How much would downtime need to cost before you thought it negligent to upgrade immediately? It's a tradeoff, not well-supported by simple pronouncements, and one the customer and provider are best qualified to make. Ed
[GENERAL] view management
I have a question about view management... I often have need for views that reference views that reference views, and so on. When I need to make a small update to one of the views, I am faced with having to drop and recreate all dependent views even if the driving change just adds another column to the view, for example. I might have to drop and recreate many tens of views just to make a change to a single view. What a PITA. How do others manage this? TIA. Ed
Re: [GENERAL] view management
On Friday 16 November 2007 1:57 pm, Ed L. wrote: I have a question about view management... I often have need for views that reference views that reference views, and so on. When I need to make a small update to one of the views, I am faced with having to drop and recreate all dependent views even if the driving change just adds another column to the view, for example. I might have to drop and recreate many tens of views just to make a change to a single view. What a PITA. How do others manage this? And before you tell me all about DROP VIEW ... CASCADE, please note I'm not talking about that. I'm talking about the difficulties of having to recreate all views in the entire subtree of view dependencies just to change one minor aspect of an independent view. TIA... Ed
Re: [GENERAL] view management
On Friday 16 November 2007 2:09 pm, Merlin Moncure wrote: you have to rig a build system. if you have a lot of views (which is good), and keeping them up to date is a pain, you have to automate their creation. simplest way to do that is to rig a build system around sql scripts. when you create a view the first time, save its creation script in a .sql file and replay that when you need it. if you like to get fancy, you can always work solutions around make, etc on top of this. there are other tricks...for example you could grep objects dropped by the database and replay them. That looks about as ugly as can be. Ugh. What it appears to boil down to is that views become unusable unless you are willing to invest the effort in a complex build system. The DB should handle this issue automatically. Does Oracle? There is a slightly related todo item: --- # Allow VIEW/RULE recompilation when the underlying tables change Another issue is whether underlying table changes should be reflected in the view, e.g. should SELECT * show additional columns if they are added after the view is created. --- Looks like a tricky problem. What I think is needed is some sort of column-level linkage tracking between views, rather than just view-level dependencies. For example:

create table foo (id integer, msg varchar);
create table bar (id integer, msg varchar);
create view fooview as select id from foo;
create view barview as
    select b.*, f.id as fooid
    from bar b join fooview f on b.id = f.id;

When barview is created, f.id would need to be noted as depending on fooview.id rather than just noting that barview depends on fooview. Then, if someone decides to recreate fooview to include foo.msg, it can be discerned that the change will not impact barview at all. That approach looks far too simple to have not been done already.
Re: [GENERAL] view management
On Friday 16 November 2007 2:48 pm, Scott Marlowe wrote: On Nov 16, 2007 3:43 PM, Ed L. [EMAIL PROTECTED] wrote: That looks about as ugly as can be. Ugh. What it appears to boil down to is that views become unusable unless you are willing to invest the effort in a complex build system. The DB should handle this issue automatically. Does Oracle? Really? I find a build system to be a pretty necessary part of enterprise development. I can't imagine trying to keep track of what I've done to my db without using some kind of simple .sql scripts with all my ddl in them. And I use views and user defined functions a lot. The overall schema upgrade management system is not the difficult part. I find the difficulty comes with, for example, 5 levels of view dependencies. The view you want to update requires you to rebuild 15 others, which in turn require you to trace back another 15 views, and so on until you reach the leaves of the tree. You don't know those dependencies when you create the first few views. Maybe you just manually discover all these dependency paths each time you decide to change a view with dependencies. That's the part I'm griping about and for which I was hoping for a better way.
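For the discovery step, the dependency information is in the catalogs. Here is a hedged sketch (mine, not from this thread) of listing the views that directly depend on a given view, by walking pg_depend through each dependent view's rewrite rule; catalog names are per the 8.x era and worth verifying on your version:

```sql
-- List views that directly reference 'fooview' (name illustrative).
-- View dependencies are recorded against the dependent view's ON SELECT
-- rule in pg_rewrite, hence the join through pg_depend.objid.
SELECT DISTINCT dep.relname AS dependent_view
FROM pg_depend d
JOIN pg_rewrite r  ON r.oid = d.objid
JOIN pg_class  dep ON dep.oid = r.ev_class
WHERE d.refclassid = 'pg_class'::regclass
  AND d.refobjid = 'fooview'::regclass
  AND dep.oid <> 'fooview'::regclass;
```

Re-running the query on each result walks the tree one level at a time; recursive CTEs that could do it in one statement only arrive in 8.4.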
Re: [GENERAL] view management
Thanks, Justin. On Friday 16 November 2007 4:38 pm, Justin Pasher wrote: We have a system that has quite a few views to access some of the data (although we purposely tried to avoid views that pulled from other views due to some performance issues), but when we had all of the view interdependencies, we had a simple shell script that ran through a list of SQL files and imported them one after the other. If we ever had to drop a view that cascaded to other dependent views, we would just do the DROP VIEW ... CASCADE, then run the shell script to recreate all of the views. In your situation, the time-consuming part would be the initial creation of the script to get the SQL files in the correct order. After that is done, it's just a matter of proper maintenance to keep it working.
Re: [GENERAL] Converting non-null unique idx to pkey
On Tuesday 21 August 2007 11:40 pm, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: Are there any known or obvious gotchas associated with transforming a unique index on a non null column into a primary key via this sql? update pg_index set indisprimary = 't' where indexrelid = my non-null unique index oid The problem with that is there won't be any pg_constraint entry, nor any pg_depend entries linking to/from the constraint... Thanks. Sounds messy enough, I'll try another route. Ed
[GENERAL] Converting non-null unique idx to pkey
I'm preparing a fairly large 7.4.6 DB for trigger-based replication. I'm looking for ways to minimize my impact on the existing schema data and uptime. This replication solution requires every table to have a primary key. Rather than adding a new key column and index for the pkey, it's appealing to just reuse existing unique indices on non-null columns. Are there any known or obvious gotchas associated with transforming a unique index on a non null column into a primary key via this sql? update pg_index set indisprimary = 't' where indexrelid = my non-null unique index oid TIA. Ed
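For comparison, the catalog-free route would look roughly like this (a sketch with illustrative names; it builds a second unique index rather than reusing the existing one, so it costs an index build and the associated lock):

```sql
-- Supported way to get a pkey plus the pg_constraint/pg_depend entries:
ALTER TABLE foo ALTER COLUMN id SET NOT NULL;  -- no-op if already NOT NULL
ALTER TABLE foo ADD CONSTRAINT foo_pkey PRIMARY KEY (id);
-- The now-redundant unique index can then be dropped:
DROP INDEX foo_id_unique_idx;
```

The appeal of the pg_index hack is precisely avoiding that rebuild on a large table, which is what makes the missing pg_constraint/pg_depend rows the gotcha to weigh.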
Re: [GENERAL] history table
On Tuesday 21 August 2007 1:22 pm, Robin Helgelin wrote: Yes, this is where I'm too new to postgresql, how do I tell the database which user is logged in to the webapp? A session parameter? There will be connection pooling, but if I know how to solve the previous question I don't think it's hard to get it working with the pool. Tablelog looks pretty cool. One way to handle your user ID issue would be to initiate a user session by storing a session record (for example: id, username, starttime), then have your app pass that session ID to your updates for history. Then you could store the user ID in an update_session_id column and tablelog would help keep track of the history. Ed
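A minimal sketch of that session-record idea (all names are illustrative, not from tablelog itself):

```sql
-- One row per webapp login session.
CREATE TABLE app_session (
    id        serial PRIMARY KEY,
    username  varchar NOT NULL,
    starttime timestamp NOT NULL DEFAULT now()
);

-- At login, the webapp inserts a row and remembers the generated id:
INSERT INTO app_session (username) VALUES ('jdoe');

-- Audited tables carry the session id on every write, so the history
-- tool can attribute each change to a web user even through a pool:
ALTER TABLE mytable ADD COLUMN update_session_id integer
    REFERENCES app_session (id);
```

Because the session id travels with the data rather than with the database connection, it survives connection pooling, where one backend serves many web users.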
Re: [GENERAL] history table
On Tuesday 21 August 2007 1:42 pm, Ed L. wrote: Then you could store the user ID in an update_session_id column and tablelog would help track of the history. s/user ID/session ID/g; Ed
Re: [GENERAL] Converting non-null unique idx to pkey
On Tuesday 21 August 2007 1:45 pm, Scott Marlowe wrote: If you have a large db in 7.4.6, you should do two things. 1: Update to 7.4.19 or whatever the latest flavor of 7.4 is, right now. There are a few known data eating bugs in 7.4.6. Sounds like good advice from a strictly technical viewpoint. Unfortunately, in our particular real world, there are also political, financial, and resource constraints and impacts from downtime that at times outweigh the technical merits of upgrading 'right now'. update pg_index set indisprimary = 't' where indexrelid = my non-null unique index oid I wouldn't bet on that working right. Others know the internals of the db better than me, but that looks like a foot gun. I'd still love to hear from any who know the internals well enough to say if this should work or if it's a bad idea. It appears to work in some cursory testing. TIA. Ed
[GENERAL] Using oid as pkey
What are the concerns with using oid as the column for a primary key declaration for use in trigger-based replication? TIA, Ed
[GENERAL] 8.1.2 select for update issue
We're seeing some unexpected behavior in one particular 64-bit Pgsql 8.1.2 running on HP-UX 11.23 and Itanium 2, built with --enable-thread-safety. We think we are seeing concurrent select-for-updates of the same rows by multiple concurrent backends, contrary to our understanding of select-for-update semantics. The rows are selected by each client process as follows:

SELECT * from foo where eventprocessed = 'f' and inprogress = 'f' and eventstructure is not NULL order by key asc for update limit 25;

Once the rows are selected, they are then updated within the same transaction, for example, as follows:

update foo set inprogress = 't' where key in (10169339);

We think the rows should be locked, unselectable for update, and that the update above should remove them from selection in any subsequent select-for-updates like the one above. However, we see one backend selecting and locking a set of rows, and while it presumably has them locked and is chugging through doing updates like the one above, we see another backend select-for-update grabbing some of the same rows and performing updates. We're unable to reproduce this scenario on demand, but it does consistently happen about 1/3 of the time on this busy system whenever we turn on the second process. Any suggestions on where to hunt? TIA, Ed
[GENERAL] more select-for-update questions
I ran a simple select-for-update test on 8.1.2 and was curious as to why the semantics are what they are. Specifically, when you have multiple select-for-update-limit queries on the same rows, why are rows selected by the blocked query before knowing if some of those rows will be removed/eliminated by a preceding select-for-update-limit? Here's an example. I created this table:

create table foo (
    id serial,
    done boolean not null default false,
    msg varchar);

Then I inserted some data:

select * from foo;
 id | done |             msg
----+------+------------------------------
  1 | f    | Mon Aug 6 12:09:11 MDT 2007
  2 | f    | Mon Aug 6 12:09:12 MDT 2007
  3 | f    | Mon Aug 6 12:09:13 MDT 2007
  4 | f    | Mon Aug 6 12:09:14 MDT 2007
  5 | f    | Mon Aug 6 12:09:15 MDT 2007
(5 rows)

Then in transaction A:

begin;
select * from foo where not done for update limit 3;
 id | done |             msg
----+------+------------------------------
  1 | f    | Mon Aug 6 12:09:11 MDT 2007
  2 | f    | Mon Aug 6 12:09:12 MDT 2007
  3 | f    | Mon Aug 6 12:09:13 MDT 2007
(3 rows)

update foo set done = 't' where id < 4;
UPDATE 3
select * from foo;
 id | done |             msg
----+------+------------------------------
  4 | f    | Mon Aug 6 12:09:14 MDT 2007
  5 | f    | Mon Aug 6 12:09:15 MDT 2007
  1 | t    | Mon Aug 6 12:09:11 MDT 2007
  2 | t    | Mon Aug 6 12:09:12 MDT 2007
  3 | t    | Mon Aug 6 12:09:13 MDT 2007
(5 rows)

Then in transaction B, before committing transaction A:

begin;
select * from foo where not done for update limit 3;
(this blocks transaction B awaiting transaction A commit)

Then, just after commit in transaction A, I see the previously-blocked query in transaction B return:

select * from foo where not done for update limit 3;
 id | done | msg
----+------+-----
(0 rows)

It returns zero rows when I expected it to return two (ids 4 and 5). 
If I immediately run the same query again in transaction B, I see what I expected to see in the preceding query:

select * from foo where not done for update limit 3;
 id | done |             msg
----+------+------------------------------
  4 | f    | Mon Aug 6 12:09:14 MDT 2007
  5 | f    | Mon Aug 6 12:09:15 MDT 2007
(2 rows)

So, B is selecting rows for update and applying the limit prior to knowing which rows will be excluded by A's updates. I know that is well-documented behavior. It just seems pretty unintuitive. I'm just wondering if there is some good reason for it. TIA. Ed
Re: [GENERAL] 8.1.2 select for update issue
On Monday 06 August 2007 1:22 pm, you wrote: Ed L. [EMAIL PROTECTED] writes: We're seeing some unexpected behavior in one particular 64-bit Pgsql 8.1.2 running on HP-UX 11.23 and Itanium 2, built with --enable-thread-safety. We think we are seeing concurrent select-for-updates of the same rows by multiple concurrent backends, contrary to our understanding of select-for-update semantics. You really ought to be using something newer than 8.1.2. Perhaps. But we have yet to find a way to make major version upgrades of 100+ GB, 100+ tps databases sufficiently inexpensive and painless in terms of SAN space, performance costs, and customer downtime on heavily loaded systems. So we put them off until there is a clear, directly compelling reason to upgrade. You do have a transaction block established around this whole process? Row locks only last as long as the current transaction ... Of course. This is grasping at straws, but I was wondering if perhaps anyone saw anything in this behavior that might suggest a threadsafe-related anomaly? TIA. Ed
Re: [GENERAL] 8.1.2 select for update issue
On Monday 06 August 2007 2:11 pm, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: On Monday 06 August 2007 1:22 pm, you wrote: You really ought to be using something newer than 8.1.2. Perhaps. But we have yet to find a way to make major version upgrades of 100+ GB, I did not suggest a major version upgrade. My mistaken assumption. We are considering an upgrade to 8.1.9. I see the list of bugfixes between 8.1.2 and 8.1.9 is lengthy. Ed
Re: [GENERAL] query log corrupted-looking entries
On Friday 08 June 2007 10:30 am, George Pavlov wrote: It is very hard to tease these apart because now that I look at it closely it is a total mess; there are multiple interruptions and interruptions inside of interruptions... The interruption can happen anywhere, including the leading log timestamp This describes my experiences as well on our high-volume clusters. Ed
Re: [GENERAL] query log corrupted-looking entries
On Friday 01 June 2007 3:09 pm, George Pavlov wrote: On 5/29/2007 10:19 AM, Ed L. wrote: On Wednesday 23 May 2007 1:04 pm, George Pavlov wrote: FWIW, I've also been seeing this sort of query log corruption for as long as I can remember, 7.1 through 8.2, HPUX (parisc, ia64), Linux on intel, amd... Do you have any tricks for dealing with the problem from a query analyzer perspective? That is, if you use something like pgfouine do you have any quick and easy way to remove those lines (and the affected lines around them)? Or someway to fix the corrupted lines? I'd say that lately ~40% of my daily query logs are suffering from this problem making query analysis very hard. Not really. I filter by perl regular expressions, but I wouldn't say that is reliable. However, our query volumes are high enough that losing a few here and there is no big deal in analysis. I long ago realized that reliably replaying query logs was infeasible due in part to this issue. Regex filtering is problematic and error-prone (for example, how do you reliably tell the difference between log timestamp and DB data?). Perhaps others have better answers. Ed
Re: [GENERAL] query log corrupted-looking entries
On Friday 01 June 2007 3:36 pm, Tom Lane wrote: George Pavlov [EMAIL PROTECTED] writes: On 5/29/2007 10:19 AM, Ed L. wrote: FWIW, I've also been seeing this sort of query log corruption for as long as I can remember, 7.1 through 8.2, HPUX (parisc, ia64), Linux on intel, amd... What *exactly* is the logging setup you guys use, and have you tried alternatives? As best I can tell, the variant with a downstream syslogger process ought to be OK for messages up to PIPEBUF bytes long on most standard Unixen. Of course that's no help if you've got lots of very verbose queries, but the examples posted so far didn't seem to indicate that. On older versions, we use stderr piped to Apache's rotatelogs. On newer versions with built-in logging, we use it. We do have considerably verbose queries (sometimes a page or so). No idea what PIPEBUF equals. Ed
Re: [GENERAL] query log corrupted-looking entries
On Friday 01 June 2007 3:36 pm, Tom Lane wrote: What *exactly* is the logging setup you guys use, and have you tried alternatives?

redirect_stderr = on                        # Enable capturing of stderr into log
log_directory = '/users/.../logs'           # Directory where log files are written
log_filename = 'server_log.%a.%H:00-%H:59'  # Log file name pattern.
log_truncate_on_rotation = on               # If on, any existing log file of the same
log_rotation_age = 60min                    # Automatic rotation of logfiles will
log_min_messages = notice                   # Values, in order of decreasing detail:
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = '%m [%p] %u %r %d '       # Special values:
log_statement = 'all'                       # none, ddl, mod, all

I have not tried alternatives. The status quo is adequate. Ed
Re: [GENERAL] query log corrupted-looking entries
On Wednesday 23 May 2007 1:04 pm, George Pavlov wrote: Hoping to resurrect this thread. I am seeing more and more of this as the database gets more usage and it really messes up query log analysis. A quick summary: when I posted this, I was getting corrupted query log entries. I still am. They look like this (the problem line plus one line before and after):

2007-05-15 14:24:52 PDT [3859]: [968-1] LOG: duration: 0.071 ms
2007-05-15 14:24:52 PDT [3859]: [969-1] LOG: statement2007-05-15 14:24:52 PDT [3885]: [208-1] LOG: duration: 11.800 ms

FWIW, I've also been seeing this sort of query log corruption for as long as I can remember, 7.1 through 8.2, HPUX (parisc, ia64), Linux on intel, amd... Ed
Re: [GENERAL] HP/Pgsql/DBD::Pg issue
On Thursday 26 April 2007 9:42 am, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: After a reboot (and usually after an OS patch) on our HP-UX 11.23 64-bit Itanium DB servers, our libpq/DBD::Pg libraries cease to work. Instead, they give the standard message you get when the DB cluster is not running. Try ktrace'ing the client to see what it's doing at the kernel-call level. (I think HPUX's equivalent is just called trace btw.) Attached is a small tar.gz file containing a short perl DBI connection script that repeatedly demonstrates this problem. There are also two log files containing tusc output (an HP syscall trace utility), one for the 32-bit run (which works) and another for the 64-bit run (which fails). I haven't made much sense of it yet, so any help deciphering is appreciated. TIA. Ed

Attachment: connfail.tar.gz (application/tgz)
Re: [GENERAL] HP/Pgsql/DBD::Pg issue
On Tuesday 01 May 2007 2:23 pm, Tom Lane wrote: Well, it's going wrong here:

socket(AF_INET, SOCK_STREAM, 0) .................... = 4
setsockopt(4, 0x6, TCP_NODELAY, 0x9fffe210, 4) ..... = 0
fcntl(4, F_SETFL, 65536) ........................... = 0
fcntl(4, F_SETFD, 1) ............................... = 0
connect(4, 0x60416ea0, 16) ......................... = 0
getsockopt(4, SOL_SOCKET, SO_ERROR, 0x9fffe32c, 0x9fffe338) = 0
close(4) ........................................... = 0

The close() indicates we're into the failure path, so evidently the getsockopt returned a failure indication (though it's hard to tell what --- strerror() isn't providing anything useful). What strikes me as odd about this is that the connect() really should have returned EINPROGRESS or some other failure code, because we're doing it in nonblock mode. A zero return implies that the connection is already made, which it shouldn't be if you're connecting to some other machine (if this is a local connection then maybe it's sane, but I don't see that here when testing loopback TCP connections). So I wonder if connect() is blowing it here and claiming the connection is ready when it's not quite yet. Another possibility is that getsockopt() is returning bad data, which smells a bit more like the sort of thing that might go wrong in 64 vs 32 bit mode. It is indeed a local connection using PGHOST=`hostname`. That name maps to one of the external NIC IPs, not to the normal 127.0.0.1 loopback address. For context, I've seen this a number of times over the past couple years, from pgsql 7.3.x to 8.1.x, HPUX 11.00 to 11.23, 32-bit-only and 32/64 Itaniums, always via a local connection using `hostname` mapping to an external NIC. What it is about the reboots that triggers this remains a mystery. Ed
Re: [GENERAL] HP/Pgsql/DBD::Pg issue
On Tuesday 01 May 2007 2:46 pm, Ed L. wrote: It is indeed a local connection using PGHOST=`hostname`. That name maps to one of the external NIC IPs, not to the normal 127.0.0.1 loopback address. For context, I've seen this a number of times over the past couple years, from pgsql 7.3.x to 8.1.x, HPUX 11.00 to 11.23, 32-bit-only and 32/64 Itaniums, always via a local connection using `hostname` mapping to an external NIC. What it is about the reboots that triggers this remains a mystery. Not to create a red herring, I should add it also fails for PGHOST=localhost/127... Only relinking/reinstalling with 32-bit perl seems to fix it. I will see if I can tweak fe-connect.c. Ed
[GENERAL] HP/Pgsql/DBD::Pg issue
After a reboot (and usually after an OS patch) on our HP-UX 11.23 64-bit Itanium DB servers, our libpq/DBD::Pg libraries cease to work. Instead, they give the standard message you get when the DB cluster is not running. But we *know* it is running and all access paths are working. We have found a workaround: switch from 64-bit perl to 32-bit perl, build a 32-bit pgsql, and rebuild the perl DBD module using 32-bit perl, linked with the 32-bit pgsql. But the fact we're having to do that is a problem for us. I don't understand this problem and am at a loss as to where to look. Any ideas? TIA. Ed
Re: [GENERAL] HP/Pgsql/DBD::Pg issue
On Thursday 26 April 2007 8:50 am, Ed L. wrote: After a reboot (and usually after an OS patch) on our HP-UX 11.23 64-bit Itanium DB servers, our libpq/DBD::Pg libraries cease to work. Instead, they give the standard message you get when the DB cluster is not running. But we *know* it is running and all access paths are working. We have found a workaround by switching from 64-bit perl to 32-bit perl, build a 32-bit pgsql, and rebuild the perl DBD module using 32-bit perl and linking with the 32-bit pgsql. But the fact we're having to do that is a problem for us. I don't understand this problem and am at a loss as to where to look. Any ideas? I should add that it is only these client apps that are running on the DB server itself. DBD apps connecting remotely don't have any problems. TIA. Ed
[GENERAL] dropping role w/dependent objects
This is pgsql 8.2.3:

% psql -c "drop role mygroup"
ERROR:  role "mygroup" cannot be dropped because some objects depend on it
DETAIL:  227 objects in this database

How do I identify what these dependent objects are? I've removed all of the users from this group, turned up server logging to debug5, and searched docs to no avail. TIA. Ed
Re: [GENERAL] dropping role w/dependent objects
On Wednesday April 4 2007 4:35 pm, Alvaro Herrera wrote: Note that you can give the objects owned by that role to someone else with REASSIGN OWNED, and drop the objects with DROP OWNED (note that they act differently regarding grants; see the docs) Yes, but how do I identify what they are so that I know if I want to DROP OWNED them? TIA. Ed
Re: [GENERAL] dropping role w/dependent objects
On Wednesday April 4 2007 4:39 pm, Ed L. wrote: On Wednesday April 4 2007 4:35 pm, Alvaro Herrera wrote: Note that you can give the objects owned by that role to someone else with REASSIGN OWNED, and drop the objects with DROP OWNED (note that they act differently regarding grants; see the docs) Yes, but how do I identify what they are so that I know if I want to DROP OWNED them? The REASSIGN OWNED appears to be insufficient:

% psql -c "reassign owned by mygroup to mydba"
REASSIGN OWNED
% psql -c "drop group mygroup"
ERROR:  role "mygroup" cannot be dropped because some objects depend on it
DETAIL:  225 objects in this database

Thanks, Ed
Re: [GENERAL] dropping role w/dependent objects
On Wednesday April 4 2007 4:41 pm, Ed L. wrote: On Wednesday April 4 2007 4:39 pm, Ed L. wrote: On Wednesday April 4 2007 4:35 pm, Alvaro Herrera wrote: Note that you can give the objects owned by that role to someone else with REASSIGN OWNED, and drop the objects with DROP OWNED (note that they act differently regarding grants; see the docs) Yes, but how do I identify what they are so that I know if I want to DROP OWNED them? The REASSIGN OWNED appears to be insufficient: % psql -c "reassign owned by mygroup to mydba" REASSIGN OWNED % psql -c "drop group mygroup" ERROR: role "mygroup" cannot be dropped because some objects depend on it DETAIL: 225 objects in this database I did a before and after dump to compare. The objects it is complaining about are GRANTs to that group. Hmm... it seems like you shouldn't have to drop every grant for a group you're dropping. Thanks, Ed
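[Editor's note for later readers: since the leftover dependencies turned out to be GRANTs, the documented way to clear them is DROP OWNED, which (per the 8.2 reference page) removes privileges granted to the role as well as objects it owns. A sketch of the full sequence, using the thread's example role names:]

```sql
-- REASSIGN OWNED moves ownership but leaves the role's grants behind;
-- DROP OWNED then revokes those grants, after which the role can go.
REASSIGN OWNED BY mygroup TO mydba;  -- transfer object ownership
DROP OWNED BY mygroup;               -- revokes privileges granted to mygroup
DROP ROLE mygroup;
```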
Re: [GENERAL] dropping role w/dependent objects
On Wednesday April 4 2007 4:48 pm, Alvaro Herrera wrote: Yes, but how do I identify what they are so that I know if I want to DROP OWNED them? There's no way AFAICT, short of peeking at the catalogs (or information_schema). Try pg_shdepend. I guess if the bug were fixed, it'd be a non-issue. Thanks. Ed
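[Editor's note: following the pg_shdepend pointer, a rough query along these lines can enumerate the dependent entries. Column interpretation is from the 8.2-era catalogs (deptype 'a' = ACL/grant dependency, 'o' = ownership); the role name is the thread's example:]

```sql
-- Sketch: list shared-dependency entries referencing a role.
-- classid identifies which catalog the dependent object lives in;
-- objid is its OID there.  dbid 0 covers shared objects.
SELECT d.deptype, c.relname AS dep_catalog, d.objid
FROM pg_shdepend d
JOIN pg_class c ON c.oid = d.classid
WHERE d.refobjid = (SELECT oid FROM pg_roles WHERE rolname = 'mygroup')
  AND d.dbid IN (0, (SELECT oid FROM pg_database
                     WHERE datname = current_database()));
```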
Re: [GENERAL] dropping role w/dependent objects
On Wednesday April 4 2007 5:02 pm, Alvaro Herrera wrote: I guess if the bug were fixed, it'd be a non-issue. Sure, please submit a patch. It should not be too difficult. Perhaps this could be added to the TODO list? I won't get to it anytime soon. Ed
Re: [GENERAL] dropping role w/dependent objects
On Wednesday April 4 2007 5:37 pm, Bruce Momjian wrote: Perhaps this could be added to the TODO list? I won't get to it anytime soon. Yes. What should the TODO text be? See if the attached patch is acceptable. If not, perhaps the TODO text should be: Enable end user to identify dependent objects when the following error is encountered: ERROR: role "mygroup" cannot be dropped because some objects depend on it DETAIL: 227 objects in this database

Index: ./src/backend/catalog/pg_shdepend.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/catalog/pg_shdepend.c,v
retrieving revision 1.17
diff -C1 -r1.17 pg_shdepend.c
*** ./src/backend/catalog/pg_shdepend.c   3 Mar 2007 19:32:54 -   1.17
--- ./src/backend/catalog/pg_shdepend.c   5 Apr 2007 00:05:56 -
***************
*** 484,488 ****
   * We try to limit the number of reported dependencies to something sane,
!  * both for the user's sake and to avoid blowing out memory.
   */
! #define MAX_REPORTED_DEPS 100
--- 484,497 ----
   * We try to limit the number of reported dependencies to something sane,
!  * both for the user's sake and to avoid blowing out memory.  But since
!  * this is the only way for an end user to easily identify the dependent
!  * objects, make the limit pretty big.  Generously assuming each object
!  * description is 64 chars long, and assuming we add some commentary of
!  * up to 15 chars in storeObjectDescription(), that's ~80 chars per
!  * object.  If we allow 2000, that's 160Kb, which is reasonable.  If the
!  * installer gets wild and uses 128 character names, that's still only
!  * 320Kb.  These sorts of high numbers of dependencies are reached quite
!  * easily when a sizeable schema of hundreds of tables has specific grants
!  * on each relation.
   */
! #define MAX_REPORTED_DEPS 2000
Re: [GENERAL] daylight savings patches needed?
We have a 7.4.6 cluster which has been running on an HP B.11.00 box for quite some time. The IT group applied daylight savings patches to the OS, but the cluster is still showing the incorrect timezone: $ psql -c "select now()" now --- 2007-03-12 13:46:11.369583-05 $ date Mon Mar 12 14:46:18 EDT 2007 It appears that newly initialized 7.4.6 clusters get the proper timezone. I didn't see anything in the FAQ about restarting. Does this mean that we need to restart these clusters in order to get the timezone updates from the OS? Are they cached in the postmaster? TIA. Ed
Re: [GENERAL] daylight savings patches needed?
On Monday March 12 2007 1:07 pm, Ed L. wrote: Does this mean that we need to restart these clusters in order to get the timezone updates from the OS? Are they cached in the postmaster? Nevermind. I just found it via googling. Would I be correct in understanding that every pre-8.0 cluster must be restarted in order for the OS changes to take effect?!? Perhaps it should be a part of the FAQ answer at http://www.postgresql.org/docs/faqs.FAQ.html#item1.14 Ed
Re: [GENERAL] daylight savings patches needed?
On Monday March 12 2007 4:08 pm, Martijn van Oosterhout wrote: On Mon, Mar 12, 2007 at 01:13:42PM -0600, Ed L. wrote: Would I be correct in understanding that every pre-8.0 cluster must be restarted in order for the OS changes to take effect?!? Possibly, I imagine many C libraries would cache the timezone data over a fork and might not notice the changes... That was a nasty little surprise. Thanks, Ed
Re: [GENERAL] vacuum error
On Tuesday March 6 2007 11:52 pm, Peter Eisentraut wrote: Ed L. wrote: Right. I'm asking if the fix for this problem is in the new 8.1.8 software, or in the new DB structure resulting from the initdb, or perhaps both. There is no new DB structure in 8.1.8, which is why you can update without initdb. Consult the release notes for details. Perhaps my question was not clear enough. Let me rephrase: Does the fix for this problem come from a *fresh* DB structure resulting from the initdb, or from a software fix in 8.1.8, or both? The answer makes a big difference with a 200gb database. If it's in the software alone, we can simply restart. If it's in the DB structure, we have to migrate 200gb of data from one PGDATA to another. Ed
Re: [GENERAL] invalid page header in pg_statistic
On Wednesday February 7 2007 9:01 am, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: How do I fix this 7.4.6 issue short of initdb? invalid page header in block 110 of relation "pg_statistic" I looked at the block via pg_filedump (included below), and it does not appear to me to be corrupted, so not sure what I would zero out, if anything. Seems odd to me too ... but pg_statistic doesn't contain any irreplaceable data. See if you can TRUNCATE it. If so, do an ANALYZE to repopulate it. I don't recall if 7.4 has any special hoops you'd have to jump through to truncate a system catalog ... Scheduled downtime finally arrived last night, and I tried unsuccessfully to truncate this table: postgres -D $PGDATA -O -o standalone_log mydb truncate pg_statistic; ^D pg_ctl start psql -d mydb -c "vacuum analyze" The truncate showed no errors. The vacuum analyze showed the same error in block 110 of the pg_statistic table. Any ideas what I'm missing here? Is there another acceptable way to truncate that table, perhaps rm file; touch file? Ed
Re: [GENERAL] vacuum error
On Wednesday March 7 2007 3:13 am, Martijn van Oosterhout wrote: On Wed, Mar 07, 2007 at 02:29:08AM -0700, Ed L. wrote: Perhaps my question was not clear enough. Let me rephrase: Does the fix for this problem comes from a *fresh* DB structure resulting from the initdb, or from a software fix in 8.1.8, or both? The answer makes a big difference with a 200gb database. If it's in the software alone, we can simply restart. If it's in the DB structure, we have to migrate 200gb of data from one PGDATA to another. In your case you could also get running by creating the missing file, but whether that's appropriate depends on the cause. I'd check the release notes to see if any related bugs have been fixed since then. How would I go about correctly creating the missing file? That sounds appealing, as if it were something I could do without taking downtime. Is it? In reviewing the release notes between 8.1.2 and 8.1.8, there are a number of vacuum fixes along with many others. It is unclear if any of them are related. Given this error is occurring while doing a VACUUM FREEZE of the template0 database, I wonder if that creates any options for me? Frankly, I'd be happy to simply drop the template0 DB if I could, since I'm not aware that we ever need it for anything. Ed
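[Editor's note: the usual list recipe for "creating the missing file" is a zero-filled clog segment. This is a hypothetical sketch only, with a demo path standing in for the real location, and it is not risk-free:]

```shell
# Hypothetical sketch: recreate a missing clog segment as zero-filled pages.
# CAUTION: zeroed clog bytes mark the affected transactions as uncommitted,
# so rows written by those transactions will be treated as dead; whether
# that is acceptable depends entirely on how the file went missing.
# The demo path below is illustrative -- the real file lives under
# $PGDATA/pg_clog/ and the server should be stopped before touching it.
CLOG_DIR="${CLOG_DIR:-/tmp/pg_clog_demo}"
mkdir -p "$CLOG_DIR"
# One clog segment is 32 pages x 8 kB = 262144 bytes.
dd if=/dev/zero of="$CLOG_DIR/0C07" bs=8192 count=32 2>/dev/null
```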
[GENERAL] vacuum error
I am seeing the following error in pgsql 8.1.2:

2007-03-05 10:00:51.106 PST [9834] DEBUG: vacuuming "pg_toast.pg_toast_1260"
2007-03-05 10:00:51.106 PST [9834] DEBUG: index "pg_toast_1260_index" now contains 0 row versions in 1 pages
2007-03-05 10:00:51.106 PST [9834] DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec.
2007-03-05 10:00:51.106 PST [9834] DEBUG: "pg_toast_1260": found 0 removable, 0 nonremovable row versions in 0 pages
2007-03-05 10:00:51.106 PST [9834] DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec.
2007-03-05 10:00:51.106 PST [9834] DEBUG: vacuuming "pg_catalog.pg_statistic"
2007-03-05 10:00:51.107 PST [9834] ERROR: could not access status of transaction 3229475082
2007-03-05 10:00:51.107 PST [9834] DETAIL: could not open file "pg_clog/0C07": No such file or directory

What does it mean, and what should I do about it? TIA. Ed
Re: [GENERAL] vacuum error
On Tuesday March 6 2007 12:20 pm, Peter Eisentraut wrote: Ed L. wrote: I am seeing the following error in pgsql 8.1.2: ERROR: could not access status of transaction 3229475082 DETAIL: could not open file "pg_clog/0C07": No such file or directory What does it mean, and what should I do about it? 1. Read this thread: http://archives.postgresql.org/pgsql-general/2007-02/msg00820.php 2. Upgrade to the latest 8.1.* release. 3. If that doesn't help, check your system for faulty hardware, in particular for bad RAM. This is a 200gb DB with ~300 transactions/second serving 5 busy facilities, so downtime comes at a premium. We have some maintenance downtime planned for 2 weeks from now. I'm trying to understand if this can wait that long. It appears the only failure occurs during an autovacuum-initiated VACUUM FREEZE on template0 when it hits the pg_statistic table. However, that abort appears to be causing autovacuum to skip all its other duties as it endlessly restarts and fails again. Do I care if template0 gets a VACUUM FREEZE? Assuming not, is there a simple way to make autovacuum skip over template0 so it can tend to the important data in the other databases? Is restarting with 8.1.8 a known solution for this problem? Or is an initdb required to fix it? If initdb is required, we might as well move to the latest stable 8.2 version. I understand my options to minimize downtime to be limited to async replication. Other ideas? BTW, the RAM looks good. TIA. Ed
Re: [GENERAL] vacuum error
On Tuesday March 6 2007 3:53 pm, Joshua D. Drake wrote: Is restarting with 8.1.8 a known solution for this problem? Or is an initdb required to fix it? You can update to 8.1.8 (if you are running 8.1.x) without an initdb. Right. I'm asking if the fix for this problem is in the new 8.1.8 software, or in the new DB structure resulting from the initdb, or perhaps both. Ed
[GENERAL] db stats vs table stats
I've been periodically collecting the stats stored in pg_statio_all_tables and pg_stat_database for ~30 different clusters, and have noticed a curiosity. I would have thought that for a given period, the change in pg_stat_database.blks_read would be &lt;= the sum of the changes in pg_statio_user_tables.heap_blks_read + pg_statio_user_tables.idx_blks_read + pg_statio_user_tables.toast_blks_read + pg_statio_user_tables.tidx_blks_read. In short, the total would be &lt;= heap + idx + toast + tidx for user tables. It does not appear that way. The table-level IO stats appear to be typically 1-2 orders of magnitude larger than the db-level stats. Can anyone explain that? TIA. Ed
Re: [GENERAL] db stats vs table stats
Oops, typo: I reversed the inequality. I've corrected it below. On Friday February 23 2007 2:02 pm, Ed L. wrote: I've been periodically collecting the stats stored in pg_statio_all_tables and pg_stat_database for ~30 different clusters, and have noticed a curiosity. I would have thought that for a given period, the change in pg_stat_database.blks_read would be &gt;= the sum of the changes in pg_statio_user_tables.heap_blks_read + pg_statio_user_tables.idx_blks_read + pg_statio_user_tables.toast_blks_read + pg_statio_user_tables.tidx_blks_read. In short, the total would be &gt;= heap + idx + toast + tidx for user tables. It does not appear that way. The table-level IO stats appear to be typically 1-2 orders of magnitude larger than the db-level stats. Can anyone explain that? TIA. Ed
Re: [GENERAL] db stats vs table stats
On Friday February 23 2007 3:06 pm, Ed L. wrote: I've been periodically collecting the stats stored in pg_statio_all_tables and pg_stat_database for ~30 different clusters, and have noticed a curiosity... The table-level IO stats appear to be typically 1-2 orders of magnitude larger than the db-level stats. Can anyone explain that? Here's an example of how I'm calculating the deltas. Perhaps someone can spot an error or mistaken assumption. In this case, the deltas are not orders of magnitude out of sync with each other, but they grew from about 3% out of sync to 45% out of sync in ~35 minutes on a DB with 500 transactions/second.

drop table s;
create table s as
  select now(),
         blks_read as db_blks_read,
         sum(case when heap_blks_read ISNULL then 0 else heap_blks_read end
           + case when idx_blks_read ISNULL then 0 else idx_blks_read end
           + case when toast_blks_read ISNULL then 0 else toast_blks_read end
           + case when tidx_blks_read ISNULL then 0 else tidx_blks_read end) as table_blks_read
  from pg_stat_database sd, pg_database d, pg_class c, pg_statio_all_tables st
  where sd.datname = d.datname
    and d.datname = current_database()
    and c.oid = st.relid
  group by blks_read;

create or replace view delta_view as
  select now() - s.now as delta,
         blks_read - s.db_blks_read as db_blks_read_delta,
         sum(case when heap_blks_read ISNULL then 0 else heap_blks_read end
           + case when idx_blks_read ISNULL then 0 else idx_blks_read end
           + case when toast_blks_read ISNULL then 0 else toast_blks_read end
           + case when tidx_blks_read ISNULL then 0 else tidx_blks_read end) - s.table_blks_read as table_blks_read_delta
  from pg_stat_database sd, pg_database d, pg_class c, pg_statio_all_tables st, s
  where sd.datname = d.datname
    and d.datname = current_database()
    and c.oid = st.relid
  group by blks_read, s.now, db_blks_read, table_blks_read;

select * from delta_view;
      delta      | db_blks_read_delta | table_blks_read_delta
-----------------+--------------------+-----------------------
 00:32:51.007703 |             384243 |                556212
(1 row)
[GENERAL] invalid page header in pg_statistic
How do I fix this 7.4.6 issue short of initdb? invalid page header in block 110 of relation "pg_statistic" I looked at the block via pg_filedump (included below), and it does not appear to me to be corrupted, so not sure what I would zero out, if anything. TIA. Ed

***********************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 3.0
* File: 16408.backup-20070207-01:11:58PST
* Options used: -y -f -R 110 110
* Dump created on: Wed Feb 7 01:56:04 2007
***********************************************************

Block 110
Header -----
 Block Offset: 0x000dc000    Offsets: Lower 160 (0x00a0)  Upper 8192 (0x2000)
 Block: Size 8192  Version 1
 LSN: logid 178 recoff 0xf577b4e8    Special 8192 (0x2000)
 Items: 35    Free Space: 8032
 Length (including item array): 164

[raw hex dump of the page omitted; it was garbled in the archive transcript]

Data -----
 Item  1 -- Length: 0  Offset: 7888 (0x1ed0)  Flags: 0x00
 Item  2 -- Length: 0  Offset: 7536 (0x1d70)  Flags: 0x00
 Item  3 -- Length: 0  Offset: 7064 (0x1b98)  Flags: 0x00
 Item  4 -- Length: 0  Offset: 6760 (0x1a68)  Flags: 0x00
 Item  5 -- Length: 0  Offset: 6408 (0x1908)  Flags: 0x00
 Item  6 -- Length: 0  Offset: 5936 (0x1730)  Flags: 0x00
 Item  7 -- Length: 0  Offset: 5632 (0x1600)  Flags: 0x00
 Item  8 -- Length: 0  Offset: 5280 (0x14a0)  Flags: 0x00
 Item  9 -- Length: 0  Offset: 4808 (0x12c8)  Flags: 0x00
 Item 10 -- Length: 0  Offset: 4504 (0x1198)  Flags: 0x00
 Item 11 -- Length: 0  Offset: 4152 (0x1038)  Flags: 0x00
 Item 12 -- Length: 0  Offset: 3680 (0x0e60)  Flags: 0x00
 Item 13 -- Length: 0  Offset: 3376 (0x0d30)  Flags: 0x00
 Item 14 -- Length: 0  Offset: 3024 (0x0bd0)  Flags: 0x00
 Item 15 -- Length: 0  Offset: 2552 (0x09f8)  Flags: 0x00
 Item 16 -- Length: 0  Offset: 2248 (0x08c8)  Flags: 0x00
 Item 17 -- Length: 0  Offset: 1896 (0x0768)  Flags: 0x00
 Item 18 -- Length: 0  Offset: 1424 (0x0590)  Flags: 0x00
 Item 19 -- Length: 0  Offset: 1120 (0x0460)  Flags: 0x00
 Item 20 -- Length: 0  Offset:  768 (0x0300)  Flags: 0x00
 Item 21 -- Length: 0  Offset:  296 (0x0128)  Flags: 0x00
 Item 22 -- Length: 0  Offset: 3552 (0x0de0)  Flags: 0x00
 Item 23 -- Length: 0  Offset: 3296 (0x0ce0)  Flags: 0x00
 Item 24 -- Length: 0  Offset: 3144 (0x0c48)  Flags: 0x00
 Item 25 -- Length: 0  Offset: 2976 (0x0ba0)  Flags: 0x00
 Item 26 -- Length: 0  Offset: 2808 (0x0af8)  Flags: 0x00
 Item 27 -- Length: 0  Offset: 2640 (0x0a50)  Flags: 0x00
 Item 28 -- Length: 0  Offset: 2400 (0x0960)  Flags: 0x00
 Item 29 -- Length: 0  Offset: 7888 (0x1ed0)  Flags: 0x00
 Item 30 -- Length: 0  Offset: 7632 (0x1dd0)  Flags: 0x00
 Item 31 -- Length: 0  Offset: 7480 (0x1d38)  Flags: 0x00
 Item 32 -- Length: 0  Offset: 7312 (0x1c90)  Flags: 0x00
 Item 33 -- Length: 0  Offset: 7144 (0x1be8)  Flags: 0x00
 Item 34 -- Length: 0  Offset: 6976 (0x1b40)  Flags: 0x00
 Item 35 -- Length: 0  Offset: 6736 (0x1a50)  Flags: 0x00

*** End of Requested Range Encountered.
Last Block Read: 110 ***
Re: [GENERAL] invalid page header in pg_statistic
On Wednesday February 7 2007 9:01 am, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: How do I fix this 7.4.6 issue short of initdb? invalid page header in block 110 of relation "pg_statistic" I looked at the block via pg_filedump (included below), and it does not appear to me to be corrupted, so not sure what I would zero out, if anything. Seems odd to me too ... but pg_statistic doesn't contain any irreplaceable data. See if you can TRUNCATE it. If so, do an ANALYZE to repopulate it. I don't recall if 7.4 has any special hoops you'd have to jump through to truncate a system catalog ... $ psql -c "truncate table pg_statistic" ERROR: permission denied: "pg_statistic" is a system catalog I suppose this means single-user mode is required? Any other trick to do this during uptime? TIA. Ed
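[Editor's note: the single-user route would look roughly like the sketch below. This is untested against 7.4 - its standalone backend is invoked directly as `postgres`, with `-O` allowing system-table changes - and the follow-up elsewhere in this thread reports the error persisted even after the truncate succeeded, so treat it as a sketch of the mechanism only:]

```shell
# Sketch (7.4-era): truncate a system catalog in a standalone backend.
pg_ctl stop -D "$PGDATA"
echo "truncate pg_statistic;" | postgres -D "$PGDATA" -O mydb
pg_ctl start -D "$PGDATA"
psql -d mydb -c "analyze"   # repopulate the statistics
```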
[GENERAL] daylight savings patches needed?
From the FAQ: 1.14) Will PostgreSQL handle recent daylight saving time changes in various countries? PostgreSQL versions prior to 8.0 use the operating system's timezone database for daylight saving information. All current versions of PostgreSQL 8.0 and later contain up-to-date timezone information. In this context, assuming we have applied the relevant OS patches (RHEL, Debian, HPUX 11.11, 11.23), does the phrase "contain up-to-date timezone information" mean that all 7.[234].x and 8.x installations are prepared to properly handle the March 1, 2007 DST changes in the US? If not, what is needed? TIA. Ed
Re: [GENERAL] Index bloat of 4x
We have a large number (50+) of pre-8.2 clusters. How can I best/most easily identify those indices most bloated and in need of reindex/rebuilding? Ed
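[Editor's note: a crude first pass is to compare each index's on-disk size against its table's. This sketch uses `pg_relation_size()`, which exists from 8.1 on; on older clusters in the fleet the same idea works with `pg_class.relpages`. The one-half threshold is arbitrary, and real bloat estimation needs per-column width math:]

```sql
-- Rough heuristic: flag indexes whose size looks disproportionate
-- relative to their table (threshold is arbitrary, not a bloat proof).
SELECT ic.relname AS index_name,
       tc.relname AS table_name,
       pg_relation_size(ic.oid)      AS index_bytes,
       pg_relation_size(i.indrelid)  AS table_bytes
FROM pg_index i
JOIN pg_class ic ON ic.oid = i.indexrelid
JOIN pg_class tc ON tc.oid = i.indrelid
WHERE pg_relation_size(ic.oid) > pg_relation_size(i.indrelid) / 2
ORDER BY pg_relation_size(ic.oid) DESC;
```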
Re: [GENERAL] Index bloat of 4x
On Friday January 19 2007 2:11 am, Csaba Nagy wrote: I'm afraid I don't see how any of the answers I saw discussed fit a 24x7 operation. Reindex, drop index, vacuum full, ... they all block production queries of one sort or another for significant periods of time (minutes) on large (multi/tens of GB) tables, and thus are infeasible for true 24x7 operations.[snip] This is not completely true, as of 8.2 there is an online index build, and if that could be used in a background thread to rebuild the index and replace the bloated one once it's finished, that would be a non-blocking operation which could be done in 24x7 situations. Online index creation definitely helps us toward 24x7. But wouldn't we still have to drop the old index, thus blocking production queries? Ed
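[Editor's note on the drop question: yes, the old index still has to be dropped, but unlike the rebuild it is a near-instant catalog operation, so the exclusive lock is held only momentarily. A sketch of the swap on 8.2, with hypothetical index and column names:]

```sql
-- Build a replacement index without blocking writes (8.2+), then swap.
CREATE INDEX CONCURRENTLY foo_col_idx_new ON foo (col);
DROP INDEX foo_col_idx;                         -- brief AccessExclusiveLock
ALTER INDEX foo_col_idx_new RENAME TO foo_col_idx;
```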
Re: [GENERAL] Index bloat of 4x
On Thursday January 18 2007 6:07 am, Bill Moran wrote: Right. It doesn't _look_ that way from the graph, but that's because I only graph total DB size. I expect if I graphed data and index size separately, it would be evident. pg_total_relation_size() might give you what you want there. At this point, I'm going to assume that my question of, Is this 4x bloat strange enough to warrant further investigation is no. It seems like this amount of bloat isn't terribly unusual, and that the people working on improving this sort of thing already have enough examples of it. I'm afraid I don't see how any of the answers I saw discussed fit a 24x7 operation. Reindex, drop index, vacuum full, ... they all block production queries of one sort or another for significant periods of time (minutes) on large (multi/tens of GB) tables, and thus are infeasible for true 24x7 operations. What it seems we really need is something to remove the bloat without blocking production DML queries, while under significant query load, with very large tables. This bloat issue is by far our biggest headache on the DB side. Ed
[GENERAL] lock query
I wonder if anyone might help me generate a SQL query that peers into pg_locks, pg_stat_activity, etc and tells in plain language exactly *who* each backend is blocked *on* while awaiting lock(s). Here's what I'm looking at now: SELECT date_trunc('second', now()) as now, a.client_addr as ip, l.pid, l.locktype, l.mode, l.granted, r.relname, l.page, l.tuple, l.transactionid, a.query_start, a.current_query as sql FROM pg_locks l LEFT OUTER JOIN pg_class r ON r.oid = l.relation LEFT OUTER JOIN pg_stat_activity a ON l.pid = a.procpid ORDER BY a.query_start ASC; For busy systems with hundreds of backends and hundreds of queries per second, I find the output of this query makes it very difficult to quickly see who is holding the key lock(s) on which blocked backends wait. What would be really helpful is a query that generated output along the lines of: Backend pid 123 is blocked awaiting pid 456 lock on 'sessions' relation. Perhaps this function already exists? If not, what is needed to get there? TIA, Ed
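[Editor's note: in the 8.1/8.2-era catalogs (no pg_blocking_pids() yet) the usual approach is a self-join of pg_locks, pairing each ungranted lock with a granted lock on the same target. A rough sketch - matching only on relation or transactionid is approximate, and `procpid` is the pre-9.2 column name:]

```sql
-- Pair each waiting lock with a granted lock on the same target
-- to name the likely blocker.
SELECT w.pid AS waiting_pid,
       h.pid AS holding_pid,
       r.relname,
       wa.current_query AS waiting_sql
FROM pg_locks w
JOIN pg_locks h
  ON h.granted
 AND w.pid <> h.pid
 AND (w.relation = h.relation
      OR w.transactionid = h.transactionid)
LEFT JOIN pg_class r ON r.oid = w.relation
LEFT JOIN pg_stat_activity wa ON wa.procpid = w.pid
WHERE NOT w.granted;
```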
Re: [GENERAL] autovac hung/blocked
On Thursday November 16 2006 3:33 am, Richard Huxton wrote: Ed L. wrote: One idea would be to partition the table some how such that the chunks getting vacuumed are much smaller and thus not such an impact. On the app side, I suppose we could break the table into multiple tables on some dimension (time) to make the vacuum impacts smaller. You're running on bigger datasets than I'm used to, but that would be my approach. Did you notice the constraint partitioning introduced in 8.1? http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html#DDL-PARTITIONING-CONSTRAINT-EXCLUSION Thanks for the tip. We have avoided use of inheritance in order to stay closer to the beaten path, but the partitioning implementation ideas are useful. Ed
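[Editor's note: the 8.1-style partitioning being referenced can be sketched for a time-keyed log table like the one in this thread. Table and column names here are hypothetical; the point is that each child can be vacuumed on its own instead of the whole 15 GB at once:]

```sql
-- Children inherit from a parent; CHECK constraints let the planner
-- (with constraint_exclusion = on) skip irrelevant partitions.
CREATE TABLE log (logged_at timestamptz NOT NULL, msg text);
CREATE TABLE log_2006_11 (
    CHECK (logged_at >= '2006-11-01' AND logged_at < '2006-12-01')
) INHERITS (log);
CREATE TABLE log_2006_12 (
    CHECK (logged_at >= '2006-12-01' AND logged_at < '2007-01-01')
) INHERITS (log);

-- SET constraint_exclusion = on;   -- needed for partition pruning
VACUUM log_2006_11;  -- touches one partition, not the whole table
```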
Re: [GENERAL] autovac hung/blocked
On Wednesday November 15 2006 6:30 am, Alvaro Herrera wrote: The table in question appears to be the pathological case for vacuum: very large with lots of frequent UPDATEs. It's essentially a log table. A big log table where the log entries are being updated? Certainly sounds like a recipe for vacuum headaches. I'm curious to know how others are dealing with this problem, and how pgsql might support this issue. In our case, we have a 1.8GB OS cache, a 30GB DB cache, serving around 200 transactions/second from a 110GB DB, and this problematic table is 15GB on disk. So when it is vacuumed, I suspect it essentially flushes the OS cache and half the DB cache, severely impacting performance in an already cpu-bottlenecked machine. I have attempted to adjust autovac to spread out its I/O impact, but then it takes so long to run that other smaller frequently-updated tables are not vacuumed/analyzed in the meantime and performance starts to suffer. Suppose there simply are no off-hours periods when you can vacuum a very large table with many frequent updates. (There is never a good time to flush the caches.) How do you manage such a table in a 24x7 environment? One idea would be to partition the table some how such that the chunks getting vacuumed are much smaller and thus not such an impact. On the app side, I suppose we could break the table into multiple tables on some dimension (time) to make the vacuum impacts smaller. But a pgsql solution in the future would be nice. I don't know the pgsql code well, but what if the freespace map was divided into smaller sized sectors such that individual sectors could be vacuumed without having to hit the entire table? Or even simply breaking up the 15GB vacuum job into 1GB chunks with some spacing between would help. Of course, it'd be important to keep the smaller tables vacuumed/analyzed as needed in the in-between time. I don't know what the best answer is here, just groping for ideas. TIA. Ed
Re: [GENERAL] Transaction id wraparound problem
On Wednesday November 15 2006 4:18 pm, Morris Goldstein wrote: If I'm vacuuming every day (or two), and not running anywhere near 1 billion transactions a day, why am I running into transaction id wraparound problems? Is this just complaining that template0 and template1 haven't been vacuumed in the over 2 billion transactions encountered by testdb? (I never touch template0 and template1.) If that's what's going on, I take it that I have no risk of data loss? And is there some reason to vacuum these databases, (other than to avoid the scary messages)? I don't believe template0 ever gets vacuumed since it doesn't allow connections. Template1 can take a vacuum command, and I think that might be where your message is coming from. If you vacuum it, do your messages go away? Your testdb looks like it was vacuumed about 10,000 transactions ago. Ed
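[Editor's note: a quick way to see which databases are driving the wraparound warnings is to check per-database XID age. This works on 8.1-era servers; column availability differs on some older releases:]

```sql
-- Databases whose datfrozenxid age approaches ~2 billion need a
-- database-wide VACUUM.  template1 can be vacuumed directly;
-- template0 normally does not accept connections.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```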
[GENERAL] autovac hung/blocked
I have an 8.1.2 autovac which appears to be hanging/blocking every few days or so, but we don't understand what's causing it. I wasn't able to catch a backtrace before we killed it. I do not see autovac locks in the pg_locks view. Will running 8.1.5 buy me anything in terms of being able to see what it has locked and what locks it might be awaiting? Or maybe in terms of bugs that might explain it? TIA. Ed
Re: [GENERAL] autovac hung/blocked
On Tuesday November 14 2006 12:49 pm, Jim C. Nasby wrote: On Tue, Nov 14, 2006 at 11:20:30AM -0700, Ed L. wrote: I have an 8.1.2 autovac which appears to be hanging/blocking every few days or so, but we're don't understand what's causing it. I wasn't able to catch a backtrace before we killed it. I do not see autovac locks in the pg_locks view. Will running 8.1.5 buy me anything in terms of being able to see what it has locked and what locks it might be awaiting? Or maybe in terms of bugs that might explain it? If there's no locks, why do you think it's blocked? I had wondered if all autovac locks were in the pg_locks view, but I now see them. A gdb backtrace a few days ago had autovac waiting on a semaphore, and I assumed that was waiting on a lock. Ed
Re: [GENERAL] autovac hung/blocked
On Tuesday November 14 2006 12:56 pm, Jim C. Nasby wrote: You don't have the vacuum cost delay settings set unreasonably high, do you? I'm not sure. Here's what we're running:

#vacuum_cost_delay = 0                  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 0-10000 credits
autovacuum = on                         # enable autovacuum subprocess?
#autovacuum_naptime = 60                # time between autovacuum runs, in secs
autovacuum_naptime = 600                # changed by CW 9/11/06 to minimize
                                        # interference with application
autovacuum_vacuum_threshold = 1000      # min # of tuple updates before vacuum
autovacuum_analyze_threshold = 500      # min # of tuple updates before analyze
autovacuum_vacuum_scale_factor = 0.4    # fraction of rel size before vacuum
autovacuum_analyze_scale_factor = 0.2   # fraction of rel size before analyze
autovacuum_vacuum_cost_delay = 500      # default vacuum cost delay for
                                        # vacuum_cost_delay
autovacuum_vacuum_cost_limit = 200      # default vacuum cost limit for
                                        # vacuum_cost_limit

The table on which it seems to be struggling is updated many times per second and has 7.2M rows over 15GB total. I'm trying to figure out how often it would be napping on that...? Maybe we're just seeing a very conservative vacuum on a large table...? Ed
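[Editor's note: the 500 ms `autovacuum_vacuum_cost_delay` above is at the extreme end of the allowed range - vacuum sleeps half a second every time it accumulates `autovacuum_vacuum_cost_limit` cost points, which on a 15 GB table stretches the run enormously. The values below are purely illustrative, not a recommendation for this workload:]

```
# Illustrative sketch only -- appropriate numbers depend on I/O headroom.
# Most tuning advice keeps the cost delay in the tens of milliseconds.
autovacuum_vacuum_cost_delay = 20   # ms per cost-limit batch
autovacuum_vacuum_cost_limit = 200
autovacuum_naptime = 60             # check for work every minute
```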
[GENERAL] autovac state persistence
Does autovac maintain its state/counters across restarts as to what needs to be vacuumed/analyzed? Or does killing autovac cause it to reset the counters for the vacuum/analyze thresholds? TIA. Ed
Re: [GENERAL] autovac hung/blocked
On Tuesday November 14 2006 1:02 pm, Ed L. wrote: On Tuesday November 14 2006 12:56 pm, Jim C. Nasby wrote: You don't have the vacuum cost delay settings set unreasonably high, do you? I'm not sure. Here's what we're running: Well, I think we clearly have an HPUX CPU bottleneck (long pri queue, high cpu utilization, high user cpu %, lots of processes blocked on pri). It seems to get worst and slow all queries down across the board when autovac tries to vacuum a 15GB table. I'm guessing this is flushing the OS/DB caches, exacerbating the CPU bottleneck. I'm also not sure what to do about it beyond the customer buying some politically/financially expensive CPUs. The table in question appears to be the pathological case for vacuum: very large with lots of frequent UPDATEs. It's essentially a log table. So the other options seem to be figure out how to partition the table so as to minimize size of the data needing vacuuming, or to redesign if possible to replace UPDATEs with INSERTs. Other ideas? Ed
Re: [GENERAL] autovac state persistence
On Tuesday November 14 2006 11:51 pm, Matthew T. O'Connor wrote: Ed L. wrote: Does autovac maintain its state/counters across restarts as to which tables need to be vacuumed/analyzed? Or does killing autovac cause it to reset the counters for the vacuum/analyze thresholds? Depends on the version. The contrib autovacuum does not maintain state through a restart. The integrated version (8.0 and 8.1) can maintain state as long as you don't have the stat system configured to reset on restart.

I'm talking about 8.1.2, thanks. Ed
Re: [GENERAL] 8.1.2 locking issues
I have a few questions on pgsql locking terms and such... I created the following view to make viewing the locks a little easier:

CREATE OR REPLACE VIEW locksview AS
SELECT l.*, r.*, a.*, now() - a.query_start AS query_age,
       substring(replace(current_query, '\n', ' '), 1, 30) AS sql30,
       CASE WHEN granted THEN 'granted' ELSE 'WAITING' END AS status,
       CASE l.mode WHEN 'AccessShareLock'          THEN 1
                   WHEN 'RowShareLock'             THEN 2
                   WHEN 'RowExclusiveLock'         THEN 3
                   WHEN 'ShareUpdateExclusiveLock' THEN 4
                   WHEN 'ShareLock'                THEN 5
                   WHEN 'ShareRowExclusiveLock'    THEN 6
                   WHEN 'ExclusiveLock'            THEN 7
                   WHEN 'AccessExclusiveLock'      THEN 8
                   ELSE 100 END AS exclusivity
  FROM pg_locks l
       LEFT OUTER JOIN pg_class r ON r.oid = l.relation
       LEFT OUTER JOIN pg_stat_activity a ON l.pid = a.procpid;

I then started two transactions that try to update the same row, and then ran the following query on the view above:

SELECT now(), client_addr, pid, query_age, status, locktype, mode,
       relname, page, tuple AS tup, current_query
  FROM locksview
 WHERE now() - query_start < '3 minute'::INTERVAL
 ORDER BY query_age DESC, exclusivity DESC;

              now              | client_addr | pid  |    query_age    | status  |   locktype    |       mode       | relname | page | tup |              current_query
-------------------------------+-------------+------+-----------------+---------+---------------+------------------+---------+------+-----+--------------------------------------------
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6484 | 00:01:25.830264 | granted | transactionid | ExclusiveLock    |         |      |     | <IDLE> in transaction
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6484 | 00:01:25.830264 | granted | relation      | RowExclusiveLock | foo     |      |     | <IDLE> in transaction
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6484 | 00:01:25.830264 | granted | relation      | AccessShareLock  | foo     |      |     | <IDLE> in transaction
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6508 | 00:01:18.862623 | granted | tuple         | ExclusiveLock    | foo     |    0 |   7 | update foo set msg = 'maybe' where id = 3;
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6508 | 00:01:18.862623 | granted | transactionid | ExclusiveLock    |         |      |     | update foo set msg = 'maybe' where id = 3;
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6508 | 00:01:18.862623 | WAITING | transactionid | ShareLock        |         |      |     | update foo set msg = 'maybe' where id = 3;
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6508 | 00:01:18.862623 | granted | relation      | RowExclusiveLock | foo     |      |     | update foo set msg = 'maybe' where id = 3;
 2006-11-09 20:37:25.680662-08 | 10.0.1.82   | 6508 | 00:01:18.862623 | granted | relation      | AccessShareLock  | foo     |      |     | update foo set msg = 'maybe' where id = 3;
(8 rows)

In this case, pid 6484 updated the row first but did not commit, then 6508 tried to update the same row and naturally blocked. I understand at a very basic level why this would block, so that's not the spirit of my questions. I also understand each transaction holds its acquired locks until the end of its transaction. But I'd like to better understand the pg_locks view and pgsql locking terms.

First, I don't see rhyme or reason in the transactionid locks. Can someone explain why 6508 holds a transactionid ExclusiveLock, but is now waiting on a transactionid ShareLock? That seems unintuitive. It would seem that if you hold a more exclusive lock, getting a less exclusive lock would not be a problem. Is there rhyme or reason for this? From the docs, I'd guess the ExclusiveLock is given to block select for updates that might want to grab that row after 6508 already got in line, and that it is grabbing the ShareLock for... what?

Second, 6508, the *blocked* process, also holds a tuple ExclusiveLock on the page and tuple of foo on which it is actually blocked. Again, is there rhyme or reason I'm missing?

Third, what is it that causes values to show up in the page and tuple columns?

Finally, both processes, the blocked and the blocking, hold relation RowExclusiveLocks, which again hardly seems intuitive. It would seem that the blocked process would not have that lock until it actually had access to update the row in question. The docs do not say RowExclusiveLock conflicts with RowExclusiveLock, but I guess I'm expecting to see that conflict since one process is blocking on the other for updating the same row. TIA. Ed
[GENERAL] killing autovac
Can I kill -SIGINT autovac in 8.1.2 without taking down all the other backends? Thanks, Ed
[GENERAL] 8.1.2 postmaster died
One of our 8.1.2 postmasters on HPUX 11.23 ia64 just received a SIGKILL signal from unknown origins. After reviewing all command history files for the DBA and root, I do not believe anyone manually sent it, and we have no scripts etc. that would do that, at least none that we can find or imagine. The machine had ample RAM available. Any ideas where this SIGKILL could have come from? Thanks, Ed
Re: [GENERAL] Table and Field namestyle best practices?
On Wednesday November 8 2006 11:31 am, novnov wrote: Yes, I've already pretty much decided to use lowercase for all namestyles, I mentioned that in the first post. Using lowercase invokes a set of other issues, which I'm asking for options on...namely, conventions like org_id, and emp_org_id, or simply using org_id as the fk pointer.

Not sure there is a best way, but I prefer org_id due to brevity and no concern for quoting. Your emp_org_id is nice for views because you don't have to disambiguate columns.
[GENERAL] 8.1.2 locking issues
We are having locking issues in 8.1.2 on HP 11.23 ia64. I'm trying to better understand how to conclusively identify who is waiting on whom and why. We have a series of select for updates on our 'sessions' table. One of those queries is stuck waiting for a transactionid locktype ShareLock. How can I tell what it's actually waiting for? TIA. Ed
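One way to pair a waiter with its blocker in 8.1 (a sketch, not from the thread; it relies on the fact that a blocked row updater waits for a ShareLock on the blocking transaction's XID, and on pg_locks calling that column "transaction" in this release, renamed "transactionid" in 8.3):

```sql
-- For each ungranted transactionid lock, find the backend that holds
-- the matching XID: that is the transaction being waited on.
SELECT w.pid         AS waiting_pid,
       h.pid         AS blocking_pid,
       w.transaction AS waited_xid
  FROM pg_locks w
  JOIN pg_locks h ON h.transaction = w.transaction
                 AND h.granted
 WHERE w.locktype = 'transactionid'
   AND NOT w.granted;
```

Joining the resulting pids against pg_stat_activity (procpid) then shows what the blocking transaction is actually running.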
[GENERAL] DROP INDEX performance/locking
We're taking some politically expensive downtime to drop an index on a very large, heavily used table, because dropping it live blocks too many users for too long: tens of seconds or more on a system doing 200 transactions/second. That's because nearly every user interaction touches this particular table. It seems like one ought to be able to drop an index in a much shorter amount of time and/or not block queries while it's dropped, but I don't know the code well enough to say how hard this would be. Would a faster or non-blocking drop be worthy of a todo item? Ed
[GENERAL] failing 8.1.4 autovacuum
I have an 8.1.4 autovac process running on HP-UX 11.23 IA64 and repeatedly failing with the following error:

ERROR:  failed to re-find parent key in "audit_idx1"

Will a reindex or drop index make this problem go away? Is there anything I can do to help identify the underlying issue? (This is an intensely used production system, so I have very limited room to operate.) TIA. Ed
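If a rebuild does turn out to be the workaround, a sketch of the options (the index name is from the error message; the table name and column list are unknown here and marked as such; the locking caveats are general REINDEX/CREATE INDEX behavior and worth re-checking against the 8.1 docs before running on a production system):

```sql
-- Rebuild the suspect index in place. REINDEX locks out writes to the
-- parent table and blocks queries that would use this index, so it
-- needs a quiet window on a busy system.
REINDEX INDEX audit_idx1;

-- Alternatively, drop and recreate it. Plain CREATE INDEX also blocks
-- writes for the duration. The table name and column list below are
-- placeholders; pull the real definition from pg_indexes or \d first.
-- DROP INDEX audit_idx1;
-- CREATE INDEX audit_idx1 ON audit (...);
```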
Re: [GENERAL] failing 8.1.4 autovacuum
On Monday October 23 2006 4:03 pm, Ed L. wrote: I have an 8.1.4 autovac process running on HP-UX 11.23 IA64 and repeatedly failing with the following error: ERROR: failed to re-find parent key in audit_idx1 Will a reindex or drop index make this problem go away?

BTW, it'd be nice if autovac didn't simply abort and restart with the prior tables that were successfully vacuumed, but rather was able to continue on past the error to the next table. As it is, the failing table starves the others while autovac just keeps banging its head against the wall, consuming lots of I/O and accomplishing little. Maybe that's part of the 8.2 fixes I haven't kept up with? Ed
Re: [GENERAL] troubleshooting 8.1.2
On Tuesday July 11 2006 3:16 pm, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: We are wondering if our swap space was too small, and when the swap reservation failed, the OS was sending SIGINT?? You'd have to check your OS documentation ... I thought HPUX would just return ENOMEM to brk() for such cases. It doesn't do memory overcommit does it?

ENOMEM is correct for our brk(), too. We're running with pseudoswap, but I guess our swapspace was too small, and that appears to be what we ran into. The SIGINT is still a mystery. Our truss output for one of these SIGINTs is at the bottom of this message, for what it's worth.

BTW, here's a conversation of possible interest that conflicts with advice I've heard here of keeping shared_buffers small and letting the OS do all the caching: http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1042336 Their argument appears to be that there are HPUX kernel inefficiencies for OS caches larger than 1.5gb. You once argued that it would be unreasonable to expect user-space shared memory to be any more efficient than the kernel cache. I don't know one way or the other, and solid benchmarking that simulates our loads appears troublesome. I guess I could write a little C program to measure shared memory random access times as the size of the cache grows...

Anyway, here's the truss output:

( Attached to process 20787 ("postmaster -D /users/...") [64-bit] )
select(7, 0x9fffe670, NULL, NULL, 0x9fffe640) [sleeping]
Received signal 2, SIGINT, in select(), [caught], no siginfo
sigprocmask(SIG_SETMASK, 0x600708c0, NULL) = 0
gettimeofday(0x9fff9460, NULL) = 0
stat("/usr/lib/tztab", 0x9fff9300) = 0
open("/usr/lib/tztab", O_RDONLY|0x800, 01210) = 9
mmap(NULL, 13197, PROT_READ, MAP_PRIVATE, 9, 0) = 0x9fffbb14c000
close(9) = 0
write(2, "2 0 0 6 - 0 7 - 1 1   1 3 : 5 5"..., 76) = 76
kill(20793, SIGUSR2) = 0
kill(20794, SIGQUIT) = 0
...

Ed
[GENERAL] Log actual params for prepared queries: TO-DO item?
We'd like to attempt some log replay to simulate real loads, but in 8.1.2, it appears the formal parameters ($1, $2, ...) are logged instead of the actuals for prepared queries, e.g.:

EXECUTE unnamed [PREPARE: UPDATE sessions SET a_session = $1 WHERE id = $2]

Thoughts on making this a to-do item? Ed
[GENERAL] troubleshooting 8.1.2
We have 4 8.1.2 clusters running on an HP-UX 11.23 Itanium, repeatedly dying with the following log message:

2006-07-11 12:52:27 EDT [21582] LOG:  received fast shutdown request
2006-07-11 12:52:27 EDT [21591] LOG:  shutting down
2006-07-11 12:52:27 EDT [21591] LOG:  database system is shut down
2006-07-11 12:52:27 EDT [21584] LOG:  logger shutting down

We can't figure out why it is shutting down. Nobody here is sending the signal. We don't have any cron jobs doing that sort of thing. We've also seen out of memory errors when this first started happening, though glance had not shown GBL_MEM_UTIL above 90% (with OS buffer cache max/min percents at 10%/3%). The box has 64gb of RAM, so that would seem to mean there was ~6GB of RAM available when it got the out of memory errors. Just in case, we shut down several clusters and restarted them, and now even with plentiful memory, they're dying with the same message. Any ideas? Ed
Re: [GENERAL] troubleshooting 8.1.2
On Tuesday July 11 2006 1:17 pm, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: We have 4 8.1.2 clusters running on an HP-UX 11.23 Itanium, repeatedly dying with the following log message: 2006-07-11 12:52:27 EDT [21582] LOG: received fast shutdown request *Something* is sending SIGINT to the postmaster --- it's simply not possible to reach that elog call any other way. How are you launching the postmaster? If from a terminal window, are you sure it's entirely disconnected from the terminal's process group? If not, typing control-C in that window could SIGINT the postmaster.

We use a shell function to start the postmaster:

dbstart() {
    pg_ctl start -D "$PGDATA" -m smart -o "-i -p $PGPORT" -p postmaster
}

We are wondering if our swap space was too small, and when the swap reservation failed, the OS was sending SIGINT?? Ed
Re: [GENERAL] Why won't it index scan?
On Tuesday May 23 2006 4:55 pm, Jim C. Nasby wrote: Well, I did find one reason not to go ape with this: the number of pages analyzed scales with the number of buckets, so doubling the statistics target will roughly double the ANALYZE time for any table over 6000 pages (though the effect isn't linear, see below). There is a small increase in time for a small table, but I doubt it's enough for anyone to care:

Are you accounting for your well-primed OS and DB caches? I'd think a more realistic test would clear those of the target table between measurements. Ed
Re: [GENERAL] pg_dump index/constraint creation order
On Monday May 15 2006 11:14 am, Vivek Khera wrote: On May 14, 2006, at 12:27 AM, Ed L. wrote: While watching a 9-hour 60GB network load from 7.4.6 pg_dump into 8.1.2, I noticed the order in which indices and constraints are created appears to be their creation order. If you use the 8.1.2 pg_dump to make the dump from your 7.4.6 DB, what is the order like? I suspect it would be more to what you're expecting. In general, you should use the pg_dump corresponding to the version into which you are loading for best results.

8.1.2 has same problem (try attached script to see). Ed

[Attachment: test.sh]
[GENERAL] Why won't it index scan?
Can someone help me understand why the 8.1.2 query below is using a seq scan instead of an index scan? All relevant columns appear to be indexed and all tables vacuum analyzed.

$ psql -c "analyze verbose patient"
INFO:  analyzing "public.patient"
INFO:  "patient": scanned 3000 of 3353 pages, containing 117685 live rows and 5471 dead rows; 3000 rows in sample, 131533 estimated total rows
ANALYZE

$ psql -c "select count(1) from patient"
 count
--------
 131661
(1 row)

$ psql -c "analyze verbose visit"
INFO:  analyzing "public.visit"
INFO:  "visit": scanned 3000 of 19985 pages, containing 58520 live rows and 7045 dead rows; 3000 rows in sample, 389841 estimated total rows
ANALYZE

$ psql -c "select count(1) from visit"
 count
--------
 389102
(1 row)

$ psql -c "explain analyze select * from visit inner join patient on patient.key = visit.patient_key where nursestation_key = '40'"
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=27724.37..28457.01 rows=4956 width=421) (actual time=1819.993..2004.802 rows=6 loops=1)
   Merge Cond: ("outer".patient_key = "inner".key)
   ->  Sort  (cost=11859.31..11871.70 rows=4956 width=209) (actual time=0.416..0.426 rows=6 loops=1)
         Sort Key: visit.patient_key
         ->  Bitmap Heap Scan on visit  (cost=69.35..11555.14 rows=4956 width=209) (actual time=0.187..0.245 rows=6 loops=1)
               Recheck Cond: (nursestation_key = 40)
               ->  Bitmap Index Scan on idx_visit_nursestation_key  (cost=0.00..69.35 rows=4956 width=0) (actual time=0.158..0.158 rows=6 loops=1)
                     Index Cond: (nursestation_key = 40)
   ->  Sort  (cost=15865.05..16194.21 rows=131661 width=212) (actual time=1768.501..1856.334 rows=61954 loops=1)
         Sort Key: patient.key
         ->  Seq Scan on patient  (cost=0.00..4669.61 rows=131661 width=212) (actual time=0.010..355.299 rows=131661 loops=1)
 Total runtime: 2046.323 ms
(12 rows)

          Table "public.patient"
 Column |  Type   | Modifiers
--------+---------+--------------------------------------------------------
 key    | integer | not null default nextval('patient_key_seq'::regclass)
 ...
Indexes:
    "pk_patient" PRIMARY KEY, btree (key)
    ...

           Table "public.visit"
      Column      |  Type   | Modifiers
------------------+---------+-----------
 patient_key      | integer | not null
 nursestation_key | integer |
 ...
Indexes:
    "idx_visit_nursestation_key" btree (nursestation_key)
    "idx_visit_patient_key" btree (patient_key)
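A standard next diagnostic step here (not from the thread) is to check whether the planner simply prices the seq scan cheaper, rather than being unable to use an index at all: disable seq scans for the session and compare the two plans.

```sql
-- If an index plan is possible at all, this forces the planner to use it;
-- comparing its cost/actual time against the seqscan plan shows whether
-- this is a costing problem (e.g. random_page_cost) or something else.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT *
  FROM visit
 INNER JOIN patient ON patient.key = visit.patient_key
 WHERE nursestation_key = '40';
RESET enable_seqscan;
```

If the forced index plan turns out faster on this hardware, lowering random_page_cost for a well-cached database may get the planner there on its own.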