Re: [HACKERS] Logging of PAM Authentication Failure
On 5/27/13, Craig Ringer cr...@2ndquadrant.com wrote:
> We were just talking about things we'd like to do in wire protocol 4.
> Allowing multi-stage authentication has come up repeatedly and should
> perhaps go on that list. The most obvious case being ident auth failed,
> demand md5.

I'd like to use LDAP with pg_ident.

Cheers,
Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging of PAM Authentication Failure
On Tue, May 28, 2013 at 01:32:53PM +0800, Craig Ringer wrote:
> On 05/11/2013 03:25 AM, Robert Haas wrote:
>> Not really. We could potentially fix it by extending the wire protocol
>> to allow the server to respond to the client's startup packet with a
>> further challenge, and extend libpq to report that challenge back to the
>> user and allow sending a response. But that would break on-the-wire
>> compatibility, which we haven't done in a good 10 years, and certainly
>> wouldn't be worthwhile just for this.
>
> We were just talking about things we'd like to do in wire protocol 4.
> Allowing multi-stage authentication has come up repeatedly and should
> perhaps go on that list. The most obvious case being ident auth failed,
> demand md5.

+1

The configuration would need to be thought through, as no fixed ordering
could cover all cases. Maybe lines like

    local all postgres peer,md5

in pg_hba.conf would be the way to do this, where the list gets evaluated
in the order it's read.

Cheers,
David.
--
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
[HACKERS] plpgsql redesign (related to plpgsql check function)
Hello all

I am looking for a way to push our plpgsql_check_function upstream. One
possibility is a redesign of the plpgsql architecture. Currently we have
two stages, compilation and execution, and almost all compilation logic
lives in the gram file. If I understand the design correctly, the reason
for this is the ability to raise user-friendly error messages with
location information. Since we can now raise messages with location info
outside the gram file, we could clean up the architecture a little by
splitting the current compilation into a parsing stage and a (recursive)
compilation stage. A new compilation stage would be a good place for the
current checks and for a deep (SQL semantic) check.

This redesign involves a lot of work, so I would like to hear everyone's
opinion and to know whether the idea is acceptable.

Regards
Pavel Stehule
Re: [HACKERS] Logging of PAM Authentication Failure
On Tue, May 28, 2013 at 2:32 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 05/11/2013 03:25 AM, Robert Haas wrote:
>> Not really. We could potentially fix it by extending the wire protocol
>> to allow the server to respond to the client's startup packet with a
>> further challenge, and extend libpq to report that challenge back to the
>> user and allow sending a response. But that would break on-the-wire
>> compatibility, which we haven't done in a good 10 years, and certainly
>> wouldn't be worthwhile just for this.
>
> We were just talking about things we'd like to do in wire protocol 4.
> Allowing multi-stage authentication has come up repeatedly and should
> perhaps go on that list. The most obvious case being ident auth failed,
> demand md5.

I wonder what you think about continuing to use the already established
connection to the server while moving on to perform authentication using
the next method in the list. Earlier in this thread, I had proposed making
changes to PGconnectPoll() to introduce an additional connection state
which is an intermediate state in the authentication sequence. For
example, the server might ask for a password (md5, password methods) and
the client might want to send the password over the existing connection by
leveraging this new connection state. This is unlike what we do, for
example, in psql, where we drop the connection (upon CONNECTION_BAD due to
a required password), get the password using a prompt, and then create a
new connection with the password included in the request.

--
Amit Langote
Re: [HACKERS] Logging of PAM Authentication Failure
On Tue, May 28, 2013 at 5:04 PM, Amit Langote amitlangot...@gmail.com wrote:
> Earlier in this thread, I had proposed to make changes to PGconnectPoll()
> to introduce an additional connection state which is kind of an
> intermediate state in the authentication sequence.

Sorry, *PQconnectPoll()

--
Amit Langote
Re: [HACKERS] [PATCH] add --throttle to pgbench (submission 3)
> The use case of the option is to be able to generate a continuous gentle
> load for functional tests, eg in a practice session with students or for
> testing features on a laptop.

> If you add this to
> https://commitfest.postgresql.org/action/commitfest_view?id=18 I'll
> review it next month. I have a lot of use cases for a pgbench that
> doesn't just run at 100% all the time.

> As do I - in particular, if time permits I'll merge this patch into my
> working copy of pgbench so I can find the steady-state transaction rate
> where BDR replication's lag is stable and doesn't increase continually.
> Right now I don't really have any way of doing that, only measuring how
> long it takes to catch up once the test run completes.

You can try to use and improve the --progress option in another patch
submission, which shows how things are going.

--
Fabien.
Re: [HACKERS] [PATCH] add --throttle to pgbench (submission 3)
On 05/28/2013 04:13 PM, Fabien COELHO wrote:
> You can try to use and improve the --progress option in another patch
> submission which shows how things are going.

That'll certainly be useful, but won't solve this issue. The thing is that
with asynchronous replication you need to know how long it takes until all
nodes are back in sync, with no replication lag. I can probably do it with
a custom pgbench script, but I'm tempted to add support for timing that
part separately with a wait command to run at the end of the benchmark.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Unsigned integer types
The reasons are: performance, storage and frustration. I think the
frustration comes from the fact that unsigned integers are universally
available, except in PostgreSQL. I work with a really complex system with
many moving parts, and Postgres really is one of the components that
causes the least trouble (compared to other open-source and closed-source
systems, which I shall leave unnamed), except for the unsigned integers.
Let me give you a few examples:

1. SMALLINT

Probably the most popular unsigned short int on the planet: the IP port
number. I had to store some network traffic data in the DB; I
instinctively started prototyping it like this:

    CREATE TABLE packets (addr INET, port SMALLINT, ... );

Of course it failed quickly and I had to bump the size to INTEGER. No real
harm here, as the 2 bytes will probably go into some padding anyway, but
somehow it feels wrong.

2. INTEGER

I had to store a record with several uint32 fields, and I had to store an
awful lot of them: hundreds of GB of data per day. Roughly half of the
record consists of uint32 fields. Increasing the data type to bigint would
mean that I could store 3 instead of 4 days' worth of data on the
available storage. Continuing with int4 meant that I would have to deal
with the data in a special way when it enters and leaves the DB. It's easy
in C: just cast uint32_t to int32_t. But the Python code requires more
complex changes. And the web backend too... It's suffering either way!
Just imagine the conversation I had to have with my boss: either we
increase the budget for storage, or we need to touch every bit of the
system.

3. BIGINT

There is no escape from bigint. Numeric (or TEXT!) is the only thing that
can hold a uint64, but when you have 10^9 and more records and you need to
do some arithmetic on them, numeric is just too slow. We use uint64 all
across our system as a unique event identifier. It works fine, it's fast,
and it's very convenient: passing a uint64 around, storing it, looking it
up. We use it everywhere, including the UI and log files. So once I
decided to use BIGINT to store it, I had to guard all the inputs and
outputs and make sure it was handled correctly. Or so I thought. It turned
out that some guys from a different department were parsing some logs with
a perl parser and storing the results in the DB. They chose to store the
uint64 id as TEXT. They probably tried BIGINT, failed, and decided (since
they have low volume and aren't doing any arithmetic) to store it as TEXT.
And then someone came up with the idea of joining one table with another:
bigint with text.

I did it. Initially I wrote a function that converted the text to numeric,
then rotated it around 2^64 if necessary. It was too slow; too slow for
something that should be a simple reinterpretation of data. Eventually I
ended up writing a C function that first scanf("%llu")'d the text into a
uint64_t, and then PG_RETURN_INT64-ed the value. It works fast, but
operations hates me for increasing the complexity of DB deployment.

---

I know some cynical people who love this kind of problem; they feel that
the constant struggle is what keeps them employed :) But I'm ready to use
my private time to solve it once and for all.

I'm afraid that implementing uints as an extension would introduce some
performance penalty (I may be wrong). I'm also afraid that with an
extension I'd be left on my own maintaining it forever, while if this
could go into the core product, it would live forever.

As for the POLA violation: programmers experienced with statically typed
languages shouldn't have problems dealing with all the issues surrounding
signed/unsigned integers (like the ones described here:
http://c-faq.com/expr/preservingrules.html). Others don't need to use
them.

Maciek

On 27 May 2013 16:16, Tom Lane t...@sss.pgh.pa.us wrote:
> Maciej Gajewski maciej.gajews...@gmail.com writes:
>> The lack of unsigned integer types is one of the biggest sources of
>> grief in my daily work with pgsql. Before I go and start hacking, I'd
>> like to discuss a few points:
>>
>> 1. Is there a strong objection against merging this kind of patch?
>
> Basically, there is zero chance this will happen unless you can find a
> way of fitting them into the numeric promotion hierarchy that doesn't
> break a lot of existing applications. We have looked at this more than
> once, if memory serves, and failed to come up with a workable design
> that didn't seem to violate the POLA.
>
>> 2. How/if should the behaviour of numeric literals change? The
>> minimalistic solution is: it shouldn't, literals should be assumed
>> signed by default. A more complex solution could involve using a C-style
>> suffix ('123456u').
>
> Well, if you don't do that, there is no need for you to merge anything:
> you can build unsigned types as an external extension if they aren't
> affecting the core parser's behavior. As long as it's external, you
> don't need to satisfy anybody else's idea of what reasonable behavior
> is ...
>
> regards, tom lane
[HACKERS] converting numeric to string in postgres code
Hi,

while hacking on some Postgres code I've run into a problem. I need to
convert a numeric to a string. I've got a datum with a numeric inside, so
I'm getting it like:

    Numeric *numeric = DatumGetNumeric(d);

but later I need to have a string (most probably a const char *). I've
found a couple of different ways of doing that, but I'm not aware of the
side effects. Which function/macro should I use?

thanks,
Szymon
Re: [HACKERS] converting numeric to string in postgres code
Hello

2013/5/28 Szymon Guz mabew...@gmail.com:
> I need to convert a numeric to a string. I've got a datum with a numeric
> inside ... Which function/macro should I use?

There is a numeric_out function; you can use it or look at its source
code:

    result = DatumGetCString(DirectFunctionCall1(numeric_out, d));

Regards
Pavel
Re: [HACKERS] converting numeric to string in postgres code
On 28 May 2013 12:07, Pavel Stehule pavel.steh...@gmail.com wrote:
> There is a numeric_out function; you can use it or look at its source
> code:
>
>     result = DatumGetCString(DirectFunctionCall1(numeric_out, d));

Thanks.
Re: [HACKERS] commit fest schedule for 9.4
On 28.05.2013 01:12, Craig Ringer wrote:
> On 05/16/2013 01:44 AM, Josh Berkus wrote:
>> I'll also say:
>> * we need to assign CF managers at least 2 weeks in advance of each CF
>> * we need to replace them if they get too busy to follow through,
>> * and the last CF needs two managers.
>
> Strong +1 on both of those. I tried to pick up a CF that was already
> totally off the rails most of the way through, then had a bunch of other
> work come in. Add inexperience with the process and, well, it didn't go
> well. I see nothing but advantages in having more than one person
> involved.

Shared responsibility is no-one's responsibility. If we are to have
multiple CF managers, I think it would be good to have one who's mainly
responsible, and the second one's job is to nag the first manager if
nothing happens, and quickly take over if necessary. Ie. a hot standby
arrangement, rather than two equal CF managers.

- Heikki
Re: [HACKERS] plpgsql redesign (related to plpgsql check function)
On 28.05.2013 11:00, Pavel Stehule wrote:
> I am searching for a way to push our plpgsql_check_function upstream.
> One possibility is a redesign of the plpgsql architecture. Now we have
> two stages, compilation and execution, and almost all compilation logic
> is in the gram file. ... A new compilation stage can be a good place for
> the current checks and a deep (sql semantic) check.

+1 for a redesign along those lines. I'm not sure what the rationale
behind the current design is. I'd guess it has just grown that way over
time really, without any grand design.

While we're at it, it would be nice if the new design made it easier to
add an optimization step. I'm just waving hands here; I don't know what
optimizations we might want to do, but maybe it would make sense to have a
new intermediate language representation that would be executed by the
executor, to replace the PLpgSQL_stmt_* structs. OTOH, perhaps it's better
not to conflate that with the redesign of the grammar / compiler part, and
to keep the executor and PLpgSQL_stmt_* structs unchanged for now.

- Heikki
Re: [HACKERS] plpgsql redesign (related to plpgsql check function)
2013/5/28 Heikki Linnakangas hlinnakan...@vmware.com:
> While we're at it, it would be nice if the new design made it easier to
> add an optimization step. ... maybe it would make sense to have a new
> intermediate language representation that would be executed by the
> executor, to replace the PLpgSQL_stmt_* structs.

I played with a simple intermediate language, see
https://github.com/okbob/plpsm0, but without JIT it is significantly
slower than the current design :-(

Regards
Pavel
Re: [HACKERS] [PATCH] add --throttle to pgbench (submission 3)
> > You can try to use and improve the --progress option in another patch
> > submission which shows how things are going.
>
> That'll certainly be useful, but won't solve this issue. The thing is
> that with asynchronous replication you need to know how long it takes
> until all nodes are back in sync, with no replication lag. I can
> probably do it with a custom pgbench script, but I'm tempted to add
> support for timing that part separately with a wait command to run at
> the end of the benchmark.

ISTM that a separate process not related to pgbench should try to monitor
the master-slave async lag, as it is interesting information anyway...
However, I'm not sure that pg_stat_replication currently has the necessary
information on either side to measure the lag (whether in time or in
transactions: how do I know when a transaction was committed, or how do I
count the transactions?).

--
Fabien.
[HACKERS] storing plpython global pointer
Hi,

I need to store a global pointer for plpython usage. This is a PyObject*
which can be initialized per session, I think, as we have to deal with
Python 2 and Python 3. The pointer points to a Python constructor of
Python's Decimal type, taken from the Python stdlib.

I've got working code; however, loading the Python module each time there
is a Numeric argument in a plpython function is not very efficient, so I'd
like to do it once and keep it somewhere. This has no side effects, as
this is a pointer to a pure function.

Where should I keep such a pointer?

thanks,
Szymon
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install
I just took time to inspect our contribs; USE_PGXS is not supported by all
of them atm because of SHLIB_PREREQS (it used submake). I have a patch
pending here to fix that.

Attached patch fixes SHLIB_PREREQS when building with USE_PGXS.

Commit 19e231b introduced SHLIB_PREREQS but failed to port it to the PGXS
build. The issue is that the submake-* targets cannot be built with PGXS;
in that case they must check that the expected files are present (and
installed). Maybe it is better to only check that they have been built?

This fixes the build of dblink and postgres_fdw (make USE_PGXS=1).

--
Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 89e39d2..1b13f85 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -415,13 +415,24 @@ libpq_pgport = -L$(top_builddir)/src/port -lpgport \
 	-L$(top_builddir)/src/common -lpgcommon $(libpq)
 endif
 
-
+# If PGXS is not defined, build as usual:
+# build dependencies required by SHLIB_PREREQS.
+# If the build is with PGXS, then any requirement is supposed to be already
+# built and we just take care that the expected files exist.
+ifndef PGXS
 submake-libpq:
 	$(MAKE) -C $(libpq_builddir) all
+else
+submake-libpq: $(libdir)/libpq.so ;
+endif
 
+ifndef PGXS
 submake-libpgport:
 	$(MAKE) -C $(top_builddir)/src/port all
 	$(MAKE) -C $(top_builddir)/src/common all
+else
+submake-libpgport: $(libdir)/libpgport.a $(libdir)/libpgcommon.a ;
+endif
 
 .PHONY: submake-libpq submake-libpgport
Re: [HACKERS] storing plpython global pointer
On 28/05/13 14:04, Szymon Guz wrote:
> I need to store a global pointer for plpython usage. This is a PyObject*
> which can be initialized per session I think ... Where should I keep
> such a pointer?

Hi,

you probably could use a global variable, similar to PLy_interp_globals
that's defined in plpy_main.c.

Another method would be to expose the Decimal constructor in the plpy
module. You could modify plpy_plpymodule.c to import decimal and expose
the Decimal constructor as plpy.Decimal.

Best,
Jan
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install
> Once all our contribs can build with USE_PGXS I'll fix the VPATH. The
> last step is interesting: installcheck/REGRESS. For this one, if I can
> know exactly what's required (for the debian build for example), then I
> can also fix this target.

There is a hack to link the regression data files from the srcdir to the
builddir when doing 'make VPATH', but it fails when used in conjunction
with USE_PGXS and an out-of-tree build of an extension. The issue is the
absence of the data/ directory in the builddir. Attached patch fixes that.

--
Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index bbcfe04..e8ff584 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -263,6 +263,7 @@ test_files_build := $(patsubst $(srcdir)/%, $(abs_builddir)/%, $(test_files_src)
 all: $(test_files_build)
 
 $(test_files_build): $(abs_builddir)/%: $(srcdir)/%
+	mkdir -p $(dir $@)
 	ln -s $< $@
 
 endif # VPATH
Re: [HACKERS] Unsigned integer types
On 05/28/2013 05:17 AM, Maciej Gajewski wrote:
> I'm afraid that implementing uints as an extension would introduce some
> performance penalty (I may be wrong).

You are.

> I'm also afraid that with the extension I'd be left on my own
> maintaining it forever. While if this could go into the core product,
> it would live forever.

This is an argument against ever doing anything as an extension.

You have not at all addressed the real problem with doing what you are
asking for, the one that Tom Lane stated:

> Basically, there is zero chance this will happen unless you can find a
> way of fitting them into the numeric promotion hierarchy that doesn't
> break a lot of existing applications. We have looked at this more than
> once, if memory serves, and failed to come up with a workable design
> that didn't seem to violate the POLA.

cheers

andrew
Re: [HACKERS] storing plpython global pointer
On 28 May 2013 14:15, Jan Urbański wulc...@wulczer.org wrote:
> you probably could use a global variable, similar to PLy_interp_globals
> that's defined in plpy_main.c.
>
> Another method would be to expose the Decimal constructor in the plpy
> module. You could modify plpy_plpymodule.c to import decimal and expose
> the Decimal constructor as plpy.Decimal.

I think I'd rather go with the first solution, as this function should not
be accessible inside the plpython function. That's what I was thinking
about as well, but I wasn't sure.

thanks,
Szymon
Re: [HACKERS] Extent Locks
* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 05/17/2013 11:38 AM, Robert Haas wrote:
>> maybe with a bit of modest pre-extension.
>
> When it comes to pre-extension, is it realistic to get a count of
> backends waiting on the lock and extend the relation by (say) 2x the
> number of waiting backends?

Having the process which has the lock do more work before releasing it,
and having the other processes realize that there is room available after
blocking on the lock (and not trying to extend the relation themselves..),
might help.

One concern that came up in Ottawa is over autovacuum coming along and
discovering empty pages at the end of the relation and deciding to try and
truncate it. I'm not convinced that would happen due to the locks
involved, but if we actually extend the relation by enough that the
individual processes can continue writing for a while before another
extension is needed, then perhaps it could. On the other hand, I do feel
like people are worried about over-extending a relation and wasting disk
space, but with the way that vacuum can clean up pages at the end, that
would only be a temporary situation anyway.

> If it's possible this would avoid the need to attempt any
> recency-of-last-extension based preallocation with the associated
> problem of how to store and access the last-extended time efficiently,
> while still hopefully reducing contention on the relation extension
> lock and without delaying the backend doing the extension too much more.

I do like the idea of getting an idea of how many blocks are being asked
for, based on how many other backends are trying to write, but I've been
thinking a simple algorithm might also work well, eg:

    alloc_size = 1 page
    extend_time = 0
    while (writing)
        if (blocked and extend_time < 5s)
            alloc_size *= 2
        extend_start_time = now()
        extend(alloc_size)
        extend_time = now() - extend_start_time

Thanks,
Stephen
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install
> Once all our contribs can build with USE_PGXS I'll fix the VPATH.

Attached patch sets VPATH for out-of-tree extension builds.

If the makefile is not in the current directory (where we launch 'make'),
then assume we are building out of the source tree and set the VPATH to
the directory of the *first* makefile... Thus it fixes:

    mkdir /tmp/a
    cd /tmp/a
    make -f extension_src/Makefile USE_PGXS=1

Note that the patch fixes things. Still, I am not really happy with the
rule used to get the srcdir.

--
Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index e8ff584..64732ff 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -61,9 +61,18 @@ ifdef PGXS
 top_builddir := $(dir $(PGXS))../..
 include $(top_builddir)/src/Makefile.global
 
+# If the Makefile is not in the current directory we are building the
+# extension with VPATH, so we set the variables here.
+# XXX what about top_srcdir ?
+ifeq ($(CURDIR),$(dir $(firstword $(MAKEFILE_LIST))))
 top_srcdir = $(top_builddir)
 srcdir = .
 VPATH =
+else
+top_srcdir = $(top_builddir)
+srcdir = $(dir $(firstword $(MAKEFILE_LIST)))
+VPATH = $(dir $(firstword $(MAKEFILE_LIST)))
+endif
 
 # These might be set in Makefile.global, but if they were not found
 # during the build of PostgreSQL, supply default values so that users
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install
On Tuesday 28 May 2013 14:16:38, Cédric Villemain wrote:
> There is a hack to link the regression data files from the srcdir to the
> builddir when doing 'make VPATH', but it fails when used in conjunction
> with USE_PGXS and an out-of-tree build of an extension. The issue is the
> absence of the data/ directory in the builddir. Attached patch fixes
> that.

Use $(MKDIR_P) instead of mkdir -p:

--
Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index bbcfe04..e8ff584 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -263,6 +263,7 @@ test_files_build := $(patsubst $(srcdir)/%, $(abs_builddir)/%, $(test_files_src)
 all: $(test_files_build)
 
 $(test_files_build): $(abs_builddir)/%: $(srcdir)/%
+	$(MKDIR_P) '$(dir $@)'
 	ln -s $< $@
 
 endif # VPATH
Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install
On Saturday 25 May 2013 16:41:24, Cédric Villemain wrote: If it seems to be on the right way, I'll keep fixing EXTENSION building with VPATH. I haven't tried the patch, but let me just say that Debian (and apt.postgresql.org) would very much like the VPATH situation getting improved. At the moment we seem to have to invent a new build recipe for every extension around. The attached patch adds support for VPATH with USE_PGXS. It just changes the recipe for install: in pgxs.mk. I am unsure automatic variables can be used this way with all UNIX variations of make... I also didn't touch MODULE and PROGRAM (yet). -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index 31746f3..2575855 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -121,33 +121,40 @@ all: all-lib
 
 endif # MODULE_big
 
-install: all installdirs
-ifneq (,$(EXTENSION))
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(addsuffix .control, $(EXTENSION))) '$(DESTDIR)$(datadir)/extension/'
-endif # EXTENSION
-ifneq (,$(DATA)$(DATA_built))
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA)) $(DATA_built) '$(DESTDIR)$(datadir)/$(datamoduledir)/'
-endif # DATA
-ifneq (,$(DATA_TSEARCH))
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA_TSEARCH)) '$(DESTDIR)$(datadir)/tsearch_data/'
-endif # DATA_TSEARCH
+install: all installdirs installcontrol installdata installdatatsearch installdocs installscripts
 ifdef MODULES
 	$(INSTALL_SHLIB) $(addsuffix $(DLSUFFIX), $(MODULES)) '$(DESTDIR)$(pkglibdir)/'
 endif # MODULES
+ifdef PROGRAM
+	$(INSTALL_PROGRAM) $(PROGRAM)$(X) '$(DESTDIR)$(bindir)'
+endif # PROGRAM
+
+installcontrol: $(addsuffix .control, $(EXTENSION))
+ifneq (,$(EXTENSION))
+	$(INSTALL_DATA) $< '$(DESTDIR)$(datadir)/extension/'
+endif
+
+installdata: $(DATA) $(DATA_built)
+ifneq (,$(DATA)$(DATA_built))
+	$(INSTALL_DATA) $^ '$(DESTDIR)$(datadir)/$(datamoduledir)/'
+endif
+
+installdatatsearch: $(DATA_TSEARCH)
+ifneq (,$(DATA_TSEARCH))
+	$(INSTALL_DATA) $^ '$(DESTDIR)$(datadir)/tsearch_data/'
+endif
+
+installdocs: $(DOCS)
 ifdef DOCS
 ifdef docdir
-	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DOCS)) '$(DESTDIR)$(docdir)/$(docmoduledir)/'
+	$(INSTALL_DATA) $^ '$(DESTDIR)$(docdir)/$(docmoduledir)/'
 endif # docdir
 endif # DOCS
-ifdef PROGRAM
-	$(INSTALL_PROGRAM) $(PROGRAM)$(X) '$(DESTDIR)$(bindir)'
-endif # PROGRAM
+
+installscripts: $(SCRIPTS) $(SCRIPTS_built)
 ifdef SCRIPTS
-	$(INSTALL_SCRIPT) $(addprefix $(srcdir)/, $(SCRIPTS)) '$(DESTDIR)$(bindir)/'
+	$(INSTALL_SCRIPT) $^ '$(DESTDIR)$(bindir)/'
 endif # SCRIPTS
-ifdef SCRIPTS_built
-	$(INSTALL_SCRIPT) $(SCRIPTS_built) '$(DESTDIR)$(bindir)/'
-endif # SCRIPTS_built
 
 ifdef MODULE_big
 install: install-lib
[HACKERS] preserving forensic information when we freeze
Various people, including at least Heikki, Andres, and myself, have proposed various schemes for avoiding freezing that amount to doing it sooner, when we're already writing WAL anyway, or at least when the buffer is already dirty anyway, or at least while the buffer is already in shared_buffers anyway. Various people, including at least Tom and Andres, have raised the point that this would lose possibly-useful forensic information that they have in the past found to be of tangible value in previous debugging of databases that have somehow gotten messed up. I don't know who originally proposed it, but I've had many conversations about how we could address this concern: instead of replacing xmin when we freeze, just set an infomask bit that means xmin is frozen and leave the old, literal xmin in place. FrozenTransactionId would still exist and still be understood, of course, but new freezing operations wouldn't use it. I have attempted to implement this. Trouble is, we're out of infomask bits. Using an infomask2 bit might work but we don't have many of them left either, so it's worth casting about a bit for a better solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for this purpose, but I think we're better off trying to reclaim those bits in a future release. Exactly how to do that is a topic for another email, but I believe it's very possible. What I'd like to propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to indicate that xmin is frozen. This bit pattern isn't used for anything else, so there's no confusion possible with existing data already on disk, but it's necessary to audit all code that checks HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done this, and there's little enough of it that it seems pretty easy to handle. 
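As a rough illustration of the proposed bit pattern, here is a standalone sketch. The bit values mirror HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID as found in PostgreSQL's htup_details.h, but the combined macro and helper name are made up for this example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in htup_details.h; the combined macro is hypothetical. */
#define HEAP_XMIN_COMMITTED  0x0100
#define HEAP_XMIN_INVALID    0x0200
#define HEAP_XMIN_FROZEN     (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

/*
 * xmin is treated as frozen only when BOTH bits are set.  Either bit on
 * its own keeps its historical meaning (committed / invalid), so tuples
 * already on disk remain unambiguous.
 */
static bool
xmin_frozen(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}
```

Code that today tests HEAP_XMIN_INVALID alone is exactly what the audit described above has to catch, since an invalid-looking xmin may now mean "frozen."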
A somewhat larger problem is that this requires auditing every place that looks at a tuple xmin and deciding whether any changes are needed to handle the possibility that the tuple may be frozen even though xmin != FrozenTransactionId. This is a somewhat more significant change, but it doesn't seem to be too bad. But there are a couple of cases that are tricky enough that they seem worth expounding upon:

- When we follow HOT chains, we determine where the HOT chain ends by matching the xmax of each tuple with the xmin of the next tuple. If they don't match, we conclude that the HOT chain has ended. I initially thought this logic might be buggy even as things stand today if the latest tuple in the chain is frozen, but as Andres pointed out to me, that's not so. If the newest tuple in the chain is all-visible (the earliest point at which we can theoretically freeze it), all earlier tuples are dead altogether, and heap_page_prune() is always called after locking the buffer and before freezing, so any tuple we freeze must be the first in its HOT chain. For the same reason, this logic doesn't need any adjustment for the new freezing system: it's never looking at anything old enough to be frozen in the first place.

- Various procedural languages use the combination of TID and XMIN to determine whether a function needs to be recompiled. Although the possibility of this doing the wrong thing seems quite remote, it's not obvious to me why it's theoretically correct even as things stand today. Suppose that a previously-frozen tuple is vacuumed away and another tuple is placed at the same TID and then frozen. Then, we check whether the cache is still valid and, behold, it is. This would actually get better with this patch, since it wouldn't be enough merely for the old and new tuples to both be frozen; they'd have to have had the same XID prior to freezing.
I think that could only happen if a backend sticks around for at least 2^32 transactions, but I don't know what would prevent it in that case.

- heap_get_latest_tid() appears broken even without this patch. It's only used on user-specified TIDs, either in a TID scan, or due to the use of currtid_byreloid() and currtid_byrelname(). It attempts to find the latest version of the tuple referenced by the given TID by following the CTID links. Like HOT, it uses XMAX/XMIN matching to detect when the chain is broken. However, unlike HOT, update chains can in general span multiple blocks. It is not now nor has it ever been safe to assume that the next tuple in the chain can't be frozen before the previous one is vacuumed away. Andres came up with the best example: suppose the tuple to be frozen physically precedes its predecessor; then, an in-progress vacuum might reach the to-be-frozen tuple before it reaches (and removes) the previous row version. In newer releases, the same problem could be caused by vacuum's occasional page-skipping behavior. As with the previous point, the "don't actually change xmin when we freeze" approach actually makes it harder for a chain to get broken when it shouldn't, but I suspect it's just moving us from one set of extremely-obscure failure cases to another.
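To make the chain-matching point concrete, here is a toy model (all names hypothetical; the real code works with HeapTupleHeader fields and TransactionIdEquals) of why overwriting xmin with FrozenTransactionId can break XMAX/XMIN matching, while the flag-based approach preserves the link:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FrozenTransactionId ((TransactionId) 2)   /* as in transam.h */

/* Toy tuple: just the fields needed to follow an update chain. */
typedef struct
{
    TransactionId xmin;
    TransactionId xmax;
    bool          xmin_frozen;   /* the proposed infomask-bit approach */
} ToyTuple;

/* Chains are followed by matching prev's xmax against next's xmin,
 * as in HOT chain traversal and heap_get_latest_tid(). */
static bool
chain_link_ok(const ToyTuple *prev, const ToyTuple *next)
{
    return prev->xmax == next->xmin;
}

/* Old-style freezing destroys the literal xmin. */
static void
freeze_old(ToyTuple *t)
{
    t->xmin = FrozenTransactionId;
}

/* Proposed freezing only sets a flag; xmin is preserved. */
static void
freeze_new(ToyTuple *t)
{
    t->xmin_frozen = true;
}
```

With freeze_old(), a successor tuple that gets frozen before its predecessor is removed no longer matches its predecessor's xmax; with freeze_new(), the match still holds.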
Re: [HACKERS] Move unused buffers to freelist
Instead, I suggest modifying BgBufferSync, specifically this part right here: else if (buffer_state & BUF_REUSABLE) reusable_buffers++; What I would suggest is that if the BUF_REUSABLE flag is set here, use that as the trigger to do StrategyMoveBufferToFreeListEnd(). I think at this point also we need to lock the buffer header to check refcount and usage_count before moving to freelist, or do you think it is not required? If BUF_REUSABLE is set, that means we just did exactly what you're saying. Why do it twice? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
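A minimal sketch of the suggested hook point, with a counter standing in for the real StrategyMoveBufferToFreeListEnd() and illustrative flag values (not PostgreSQL's actual bufmgr internals). The point of the sketch: when SyncOneBuffer has already reported BUF_REUSABLE, refcount and usage_count were checked at that time, so the buffer can go to the freelist without re-locking the header:

```c
#include <stdint.h>

/* Illustrative flag values; the real ones live in bufmgr.c. */
#define BUF_WRITTEN  0x01
#define BUF_REUSABLE 0x02

/* Stand-in for StrategyMoveBufferToFreeListEnd(). */
static int freelist_moves = 0;

/*
 * Simplified loop body from BgBufferSync: when the buffer was reported
 * reusable, bump the counter and (proposed change) push it to the
 * freelist immediately -- no second refcount/usage_count check needed.
 * Returns 1 if the buffer was recycled.
 */
static int
tally_buffer(uint32_t buffer_state, int *reusable_buffers)
{
    if (buffer_state & BUF_REUSABLE)
    {
        (*reusable_buffers)++;
        freelist_moves++;       /* proposed: recycle here, once */
        return 1;
    }
    return 0;
}
```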
Re: [HACKERS] ASYNC Privileges proposal
On Mon, May 20, 2013 at 02:44:58AM +0100, Chris Farmiloe wrote: Hey all, I find the current LISTEN / NOTIFY rather limited in the context of databases with multiple roles. As it stands it is not possible to restrict the use of LISTEN or NOTIFY to specific roles, and therefore notifications (and their payloads) cannot really be trusted as coming from any particular source. If the payloads of notifications could be trusted, then applications could make better use of them, without fear of leaking any sensitive information to anyone who shouldn't be able to see it. I'd like to propose a new ASYNC database privilege that would control whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the associated pg_notify function. ie: GRANT ASYNC ON DATABASE TO bob; REVOKE ASYNC ON DATABASE FROM bob; SECURITY DEFINER functions could then be used anywhere that a finer grained access control was required. I had a quick play to see what might be involved [attached], and would like to hear people's thoughts; good idea, bad idea, not like that! etc I question the usefulness of allowing listen/notify to be restricted to an entire class of users. The granularity of this seems too broad, though I am not sure if allowing messages to be sent to a specific user is easily achievable. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extent Locks
On Tue, May 28, 2013 at 7:36 AM, Stephen Frost sfr...@snowman.net wrote: On the other hand, I do feel like people are worried about over-extending a relation and wasting disk space - but with the way that vacuum can clean up pages at the end, that would only be a temporary situation anyway. Hi, Maybe I'm wrong, but this should be easily solved by an autovacuum_no_truncate_empty_pages or an autovacuum_empty_pages_limit GUC/reloption. Just to clarify the second one: autovacuum will allow empty pages up to that limit, and will remove the excess from there. We can also think in GUC/reloption for next_extend_blocks so formula is needed, or of course the automated calculation that has been proposed -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Running pgindent
On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote: Do we want to run pgindent soon? OK, should I run it this week? Wednesday, 1800 GMT? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MVCC catalog access
On Sun, May 26, 2013 at 9:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: Perhaps we see little difference in performance because PGPROC has been separated into PGPROC and PGXACT, reducing lock contention with getting snapshot data? By the way, I grabbed a 32-core machine and did some more performance tests with some open connections with XIDs assigned using pg_cxn v2 given by Robert in his previous mail to make sure that the snapshots get pretty large. Thanks for checking this on another machine. It's interesting that you were able to measure a hit for relcache rebuild, whereas I was not, but it doesn't look horrible. IMHO, we should press forward with this approach. Considering that these are pretty extreme test cases, I'm inclined to view the performance loss as acceptable. We've never really viewed DDL as something that needs to be micro-optimized, and there is ample testimony to that fact in the existing code and in the treatment of prior patches in this area. This is not to say that we want to go around willy-nilly making it slower, but I think there will be very few users for which the number of microseconds it takes to create or drop an SQL object is performance-critical, especially when you consider that (1) the effect will be quite a bit less when the objects are tables, since in that case the snapshot cost will tend to be drowned out by the filesystem cost and (2) people who don't habitually keep hundreds and hundreds of connections open - which hopefully most people don't - won't see the effect anyway. Against that, this removes the single largest barrier to allowing more concurrent DDL, a feature that I suspect will make a whole lot of people *very* happy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extent Locks
On Tue, May 28, 2013 at 8:38 AM, Jaime Casanova ja...@2ndquadrant.com wrote: We can also think in GUC/reloption for next_extend_blocks so formula is needed, or of course the automated calculation that has been proposed s/so formula is needed/so *no* formula is needed btw, we can also use a next_extend_blocks GUC/reloption as a limit for autovacuum so it will allow that empty pages at the end of the table -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Running pgindent
On Tue, May 28, 2013 at 9:48 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 28, 2013 at 9:40 AM, Bruce Momjian br...@momjian.us wrote: On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote: Do we want to run pgindent soon? OK, should I run it this week? Wednesday, 1800 GMT? wfm. +1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Running pgindent
On Tue, May 28, 2013 at 09:49:32AM -0400, Magnus Hagander wrote: On Tue, May 28, 2013 at 9:48 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 28, 2013 at 9:40 AM, Bruce Momjian br...@momjian.us wrote: On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote: Do we want to run pgindent soon? OK, should I run it this week? Wednesday, 1800 GMT? wfm. +1. OK, consider it scheduled, 2013-05-29, 1400 ET, 1800 GMT. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Running pgindent
On Tue, May 28, 2013 at 9:40 AM, Bruce Momjian br...@momjian.us wrote: On Wed, May 22, 2013 at 01:52:28PM -0400, Bruce Momjian wrote: Do we want to run pgindent soon? OK, should I run it this week? Wednesday, 1800 GMT? wfm. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Sat, May 25, 2013 at 2:55 PM, Jon Nelson jnelson+pg...@jamponi.net wrote: The biggest thing missing from this submission is information about what performance testing you did. Ideally performance patches are submitted with enough information for a reviewer to duplicate the same test the author did, as well as hard before/after performance numbers from your test system. It often turns tricky to duplicate a performance gain, and being able to run the same test used for initial development eliminates a lot of the problems. This has been a bit of a struggle. While it's true that WAL file creation doesn't happen with great frequency, and while it's also true that - with strace and other tests - it can be proven that fallocate(16MB) is much quicker than writing it zeroes by hand, proving that in the larger context of a running install has been challenging. It's nice to be able to test things in the context of a running install, but sometimes a microbenchmark is just as good. I mean, if posix_fallocate() is faster, then it's just faster, right? It's likely to be pretty hard to get reproducible numbers for how much this actually helps in the real world because write tests are inherently pretty variable depending on a lot of factors we don't control, so even if Jon has got the best possible test, the numbers may bounce around so much that you can't really measure the (probably small) gain from this approach. But that doesn't seem like a reason not to adopt the approach and take whatever gain there is. At least, not that I can see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] COPY .... (FORMAT binary) syntax doesn't work
On Mon, May 27, 2013 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 26 May 2013 17:10, Tom Lane t...@sss.pgh.pa.us wrote: More readable would be to invent an intermediate nonterminal falling between ColId and ColLabel, whose expansion would be IDENT | unreserved_keyword | col_name_keyword | type_func_name_keyword, and then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. Any thoughts about a name for that new nonterminal? Do you think complicating the parser in that way is worth the trouble for this case? Could that slow down parsing? It makes the grammar tables a bit larger (1% or so IIRC). There would be some distributed penalty for that, but probably not much. Of course there's always the slippery-slope argument about that. We don't actually have to fix it; clearly not too many people are bothered, since no complaints in 3 years. Documenting 'binary' seems better. Well, my thought is there are other cases. For instance: regression=# create role binary; ERROR: syntax error at or near binary LINE 1: create role binary; ^ regression=# create user cross; ERROR: syntax error at or near cross LINE 1: create user cross; ^ If we don't have to treat type_func_name_keywords as reserved in these situations, shouldn't we avoid doing so? I am almost always in favor of making more things less reserved, so +1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extent Locks
* Jaime Casanova (ja...@2ndquadrant.com) wrote: btw, we can also use a next_extend_blocks GUC/reloption as a limit for autovacuum so it will allow that empty pages at the end of the table I'm really not, at all, excited about adding in GUCs for this. We just need to realize when the only available space in the relation is at the end and people are writing to it and avoid truncating pages off the end - if we don't already have locks that prevent vacuum from doing this already. I'd want to see where it's actually happening before stressing over it terribly much. Thanks, Stephen
Re: [HACKERS] getting rid of freezing
On Sat, May 25, 2013 at 6:14 AM, Simon Riggs si...@2ndquadrant.com wrote: One thought I had is that it might be beneficial to freeze when a page ceases to be all-visible, rather than when it becomes all-visible. Any operation that makes the page not-all-visible is going to emit an FPI anyway, so we don't have to worry about torn pages in that case. Under such a scheme, we'd have to enforce the rule that xmin and xmax are ignored for any page that is all-visible; and when a page ceases to be all-visible, we have to go back and really freeze the pre-existing tuples. I think we might be able to use the existing all_visible_cleared/new_all_visible_cleared flags to trigger this behavior, without adding anything new to WAL at all. I like the idea but it would mean we'd have to freeze in the foreground path rather than in a background path. That's true, but I think with this approach it would be really cheap. The overhead of setting a few bits in a page is very small compared to the overhead of emitting a WAL record. We'd have to test it, but I wouldn't be surprised to find the cost is too small to measure. Have we given up on the double buffering idea to remove FPIs completely? If we did that, then this wouldn't work. I don't see why those things are mutually exclusive. What is the relationship? Anyway, I take it the direction of this idea is that we don't need a separate freezemap, just use the vismap. That seems to be forcing ideas down a particular route we may regret. I'd rather just keep those things separate, even if we manage to merge the WAL actions for most of the time. Hmm. To me it seems highly desirable to merge those things, because they're basically the same thing. The earliest time at which we can freeze a tuple is when it's all-visible, and the only argument I've ever heard for waiting longer is to preserve the original xmin for forensic purposes, which I think we can do anyway. I have posted a patch for that on another thread. 
I don't like having two separate concepts where one will do; I think the fact that it is structured that way today is mostly an artifact of one setting being page-level and the other tuple-level, which is a thin excuse for so much complexity. I think the right way is actually to rethink and simplify all this complexity of Freezing/Pruning/Hinting/Visibility. I agree, but I think that's likely to have to wait until we get a pluggable storage API, and then a few years beyond that for someone to develop the technology to enable the new and better way. In the meantime, if we can eliminate or even reduce the impact of freezing in the near term, I think that's worth doing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-05-28 10:03:58 -0400, Robert Haas wrote: On Sat, May 25, 2013 at 2:55 PM, Jon Nelson jnelson+pg...@jamponi.net wrote: The biggest thing missing from this submission is information about what performance testing you did. Ideally performance patches are submitted with enough information for a reviewer to duplicate the same test the author did, as well as hard before/after performance numbers from your test system. It often turns tricky to duplicate a performance gain, and being able to run the same test used for initial development eliminates a lot of the problems. This has been a bit of a struggle. While it's true that WAL file creation doesn't happen with great frequency, and while it's also true that - with strace and other tests - it can be proven that fallocate(16MB) is much quicker than writing it zeroes by hand, proving that in the larger context of a running install has been challenging. It's nice to be able to test things in the context of a running install, but sometimes a microbenchmark is just as good. I mean, if posix_fallocate() is faster, then it's just faster, right? Well, it's a bit more complex than that. Fallocate doesn't actually initialize the disk space in most filesystems; it just marks it as allocated and zeroed, which is one of the reasons it can be noticeably faster. But that can make the runtime overhead of writing to those pages higher. I wonder whether noticeably upping checkpoint_segments and then a) COPYing into a large table, or b) running pgbench on a previously initialized table, would show the difference. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, May 28, 2013 at 10:15 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-05-28 10:03:58 -0400, Robert Haas wrote: On Sat, May 25, 2013 at 2:55 PM, Jon Nelson jnelson+pg...@jamponi.net wrote: The biggest thing missing from this submission is information about what performance testing you did. Ideally performance patches are submitted with enough information for a reviewer to duplicate the same test the author did, as well as hard before/after performance numbers from your test system. It often turns tricky to duplicate a performance gain, and being able to run the same test used for initial development eliminates a lot of the problems. This has been a bit of a struggle. While it's true that WAL file creation doesn't happen with great frequency, and while it's also true that - with strace and other tests - it can be proven that fallocate(16MB) is much quicker than writing it zeroes by hand, proving that in the larger context of a running install has been challenging. It's nice to be able to test things in the context of a running install, but sometimes a microbenchmark is just as good. I mean, if posix_fallocate() is faster, then it's just faster, right? Well, it's a bit more complex than that. Fallocate doesn't actually initializes the disk space in most filesystems, just marks it as allocated and zeroed which is one of the reasons it can be noticeably faster. But that can make the runtime overhead of writing to those pages higher. Maybe it would be good to measure that impact. Something like this:

1. Write 16MB of zeroes to an empty file in the same size chunks we're currently using (8kB?). Time that. Rewrite the file with real data. Time that.
2. posix_fallocate() an empty file out to 16MB. Time that. Rewrite the file with real data. Time that.

Personally, I have trouble believing that writing 16MB of zeroes by hand is better than telling the OS to do it for us. 
If that's so, the OS is just stupid, because it ought to be able to optimize the crap out of that compared to anything we can do. Of course, it is more than possible that the OS is in fact stupid. But I'd like to hope not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
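The two file-creation strategies being compared can be sketched as below. This is an illustrative microbenchmark skeleton, not PostgreSQL code: the file paths and helper names are made up, timing is left to the caller (wrap each call in gettimeofday() or `time`), and per Andres's point, on many filesystems posix_fallocate() merely marks the range allocated-and-zeroed rather than writing it, so the "rewrite with real data" step matters too:

```c
#define _XOPEN_SOURCE 600       /* for posix_fallocate() */
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define SEGSIZE (16L * 1024 * 1024)   /* one WAL segment */
#define CHUNK   8192                  /* 8kB writes, as discussed above */

/* Strategy 1: zero-fill by hand, 8kB at a time.  Returns resulting file
 * size, or -1 on error.  The file is unlinked afterwards. */
static long
zero_fill(const char *path)
{
    char        buf[CHUNK];
    struct stat st;
    int         fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 0, sizeof buf);
    for (long written = 0; written < SEGSIZE; written += CHUNK)
        if (write(fd, buf, CHUNK) != CHUNK)
        {
            close(fd);
            return -1;
        }
    if (fstat(fd, &st) != 0)
        st.st_size = -1;
    close(fd);
    unlink(path);
    return (long) st.st_size;
}

/* Strategy 2: ask the filesystem to reserve the space. */
static long
fallocate_fill(const char *path)
{
    struct stat st;
    int         fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (posix_fallocate(fd, 0, SEGSIZE) != 0)
    {
        close(fd);
        return -1;
    }
    if (fstat(fd, &st) != 0)
        st.st_size = -1;
    close(fd);
    unlink(path);
    return (long) st.st_size;
}
```

Both produce a 16MB file; the interesting numbers are how long each takes, and how long the subsequent rewrite with real data takes in each case.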
Re: [HACKERS] background worker and normal exit
On Sun, May 26, 2013 at 6:48 PM, Michael Paquier michael.paqu...@gmail.com wrote: Hmm so you can't have workers just doing something once and exit? I have to admit, I didn't follow bgworkers closely in the past, but could you give a short insight on why this is currently not possible? Bgworkers are expected to run all the time, and will be restarted each time they exit cleanly with a status code 0. Note that they are *still* restarted immediately even if bgw_restart_time is set at BGW_NEVER_RESTART or to a certain value. There are actually two ways you can use to have them perform a one-time task:
- put it in indefinite sleep after the task is accomplished
That's not really the same thing...
- set bgw_restart_time to BGW_NEVER_RESTART, and have the bgworker exit with a non-0 status code.
That might be good enough, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of freezing
On 2013-05-26 16:58:58 -0700, Josh Berkus wrote: I was talking this over with Jeff on the plane, and we wanted to be clear on your goals here: are you looking to eliminate the *write* cost of freezing, or just the *read* cost of re-reading already frozen pages? Both. The latter is what I have seen causing more hurt, but the former alone is painful enough. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] background worker and normal exit
On 2013-05-28 10:23:46 -0400, Robert Haas wrote: On Sun, May 26, 2013 at 6:48 PM, Michael Paquier - set bgw_restart_time to BGW_NEVER_RESTART. and have the bgworler exit with non-0 status code. That might be good enough, though. I suggested that to Fujii at pgcon, and it seems to work for him. But I think this sucks since you lose support for a restart upon a FATAL or similar error. And you cannot mark that as something non-fatal in the log. To this day I laugh about the following oddity in the xorg log: [30.087] (II) RADEON(0): RandR 1.2 enabled, ignore the following RandR disabled message. [30.088] (--) RandR disabled I really don't want to go there. You actually can only return a 1 since everything else will tear down the whole cluster... We actually were discussing this recently: http://archives.postgresql.org/message-id/20130423134833.GD8499%40alap2.anarazel.de I think a separate return code for "exited gracefully, don't restart" would be a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] background worker and normal exit
On Tue, May 28, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-05-28 10:23:46 -0400, Robert Haas wrote: On Sun, May 26, 2013 at 6:48 PM, Michael Paquier - set bgw_restart_time to BGW_NEVER_RESTART. and have the bgworler exit with non-0 status code. That might be good enough, though. I suggested that to Fujii at pgcon, and it seems to work for him. But I think this sucks since you loose support for a restart upon a FATAL or similar error. And you cannot mark that as something non-fatal in the log. To this day I laugh about the following oddity in the xorg log: [30.087] (II) RADEON(0): RandR 1.2 enabled, ignore the following RandR disabled message. [30.088] (--) RandR disabled I really don't want to go there. You actually can only return a 1 since everything else will tear down the whole cluster... We actually were discussing this recently: http://archives.postgresql.org/message-id/20130423134833.GD8499%40alap2.anarazel.de I think a separate return code for exited gracefully, don't restart would be a good idea. Yeah. Or maybe the restart-timing/restart-when logic should just apply to the exit(0) case as well. Not sure what the downside of that would be. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] background worker and normal exit
On 2013-05-28 10:33:47 -0400, Robert Haas wrote: On Tue, May 28, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-05-28 10:23:46 -0400, Robert Haas wrote: On Sun, May 26, 2013 at 6:48 PM, Michael Paquier - set bgw_restart_time to BGW_NEVER_RESTART. and have the bgworker exit with non-0 status code. That might be good enough, though. I suggested that to Fujii at pgcon, and it seems to work for him. But I think this sucks since you lose support for a restart upon a FATAL or similar error. And you cannot mark that as something non-fatal in the log. To this day I laugh about the following oddity in the xorg log: [30.087] (II) RADEON(0): RandR 1.2 enabled, ignore the following RandR disabled message. [30.088] (--) RandR disabled I really don't want to go there. You actually can only return a 1 since everything else will tear down the whole cluster... We actually were discussing this recently: http://archives.postgresql.org/message-id/20130423134833.GD8499%40alap2.anarazel.de I think a separate return code for "exited gracefully, don't restart" would be a good idea. Yeah. Or maybe the restart-timing/restart-when logic should just apply to the exit(0) case as well. Not sure what the downside of that would be. Losing the ability to restart a process where the reasons for exiting are non-fatal and shouldn't be logged noisily or are already logged. I actually could use both capabilities. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] PostgreSQL Process memory architecture
On Mon, May 27, 2013 at 10:23 AM, Atri Sharma atri.j...@gmail.com wrote: We may still be able to do better than what we're doing today, but I'm still suspicious that you're going to run into other issues with having 500 indexes on a table anyway. +1. I am suspicious that the large number of indexes is the problem here, even if the problem is not with the bookkeeping associated with those indexes. Right. The problem seems likely to be that each additional index requires a relcache entry, which uses some backend-local memory. But NOT having those backend-local relcache entries would likely be devastating for performance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)
Today we have seen 2013-05-28 04:11:12.300 EDT,,,30600,,51a41946.7788,1,,2013-05-27 22:41:10 EDT,,0,ERROR,XX000,xlog flush request 1E95/AFB2DB10 is not satisfied --- flushed only to 1E7E/21CB79A0,writing block 9 of relation base/16416/293974676 2013-05-28 04:11:13.316 EDT,,,30600,,51a41946.7788,2,,2013-05-27 22:41:10 EDT,,0,ERROR,XX000,xlog flush request 1E95/AFB2DB10 is not satisfied --- flushed only to 1E7E/21CB79A0,writing block 9 of relation base/16416/293974676 while taking the *backup of the primary*. We have been running for a few days like that and today is the first day where we see these problems again. So it's not entirely deterministic / we don't know yet what we have to do to reproduce. So this makes Robert's theory more likely. However, we have also been using this method (LVM + rsync with hardlinks from the primary) for years without these problems. So the big question is what changed? One hypothesis is that it is related to the primary running with wal_level = hot_standby instead of minimal or archive (which we used before we switched to 9.2).
Here are the entries in the log related to the startup of the corrupt testing cluster: 2013-05-27 22:41:10.029 EDT,,,30598,,51a41946.7786,1,,2013-05-27 22:41:10 EDT,,0,LOG,0,database system was interrupted; last known up at 2013-05-26 21:01:09 EDT, 2013-05-27 22:41:10.029 EDT,,,30599,,51a41946.7787,1,,2013-05-27 22:41:10 EDT,,0,LOG,0,connection received: host=172.27.65.204 port=55279, 2013-05-27 22:41:10.030 EDT,,,30598,,51a41946.7786,2,,2013-05-27 22:41:10 EDT,,0,LOG,0,database system was not properly shut down; automatic recovery in progress, 2013-05-27 22:41:10.030 EDT,as-elephant,postgres,30599, 172.27.65.204:55279,51a41946.7787,2,,2013-05-27 22:41:10 EDT,,0,FATAL,57P03,the database system is starting up, 2013-05-27 22:41:10.031 EDT,,,30598,,51a41946.7786,3,,2013-05-27 22:41:10 EDT,,0,LOG,0,redo starts at 1E7E/2152B178, 2013-05-27 22:41:10.094 EDT,,,30598,,51a41946.7786,4,,2013-05-27 22:41:10 EDT,,0,LOG,0,record with zero length at 1E7E/215AC6B8, 2013-05-27 22:41:10.094 EDT,,,30598,,51a41946.7786,5,,2013-05-27 22:41:10 EDT,,0,LOG,0,redo done at 1E7E/215AC688, 2013-05-27 22:41:10.094 EDT,,,30598,,51a41946.7786,6,,2013-05-27 22:41:10 EDT,,0,LOG,0,last completed transaction was at log time 2013-05-26 21:09:08.06351-04, 2013-05-27 22:41:10.134 EDT,,,30595,,51a41945.7783,1,,2013-05-27 22:41:09 EDT,,0,LOG,0,database system is ready to accept connections, 2013-05-27 22:41:10.134 EDT,,,30603,,51a41946.778b,1,,2013-05-27 22:41:10 EDT,,0,LOG,0,autovacuum launcher started, 2013-05-27 22:41:15.037 EDT,,,30608,,51a4194b.7790,1,,2013-05-27 22:41:15 EDT,,0,LOG,0,connection received: host=172.27.65.204 port=55283, This means we currently do NOT have a way to make backups that we trust. We are very open to any suggestions of any alternative methods we should consider using. 
The database is of non trivial size by now: proddb= select pg_size_pretty(pg_database_size('proddb')); pg_size_pretty 1294 GB (1 row) The backup script itself is by now a rather long OCaml program, so I doubt the value in posting it to this list. But here is the log of what it did which should be pretty explanatory: proddb backup: starting proddb backup: /bin/bash -c /usr/bin/ssh -l root tot-dbc-001 ls -d /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb'.*-*-*.*-*-*' (enqueued) proddb backup: /bin/bash -c /usr/bin/ssh -l root tot-dbc-001 ls -d /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb'.*-*-*.*-*-*' (running as pid: [23422]) proddb backup: /bin/bash -c /usr/bin/ssh -l root tot-dbc-001 ls -d /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb'.*-*-*.*-*-*' ([23422] exited normally) proddb backup: /bin/mkdir -p /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress (enqueued) proddb backup: /bin/mkdir -p /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress (running as pid: [23433]) proddb backup: /bin/mkdir -p /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress ([23433] exited normally) proddb backup: /bin/bash -c /usr/bin/ssh -l root tot-dbc-001 /mnt/global/base/bin/db tools backup backup -v -src '((dbname proddb) (hostname tot-dbc-001) (superuser postgres_prod) (basedir /database) (version 9.2))' -dst '((username postgres) (hostname 127.0.0.1) (backup_dir /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress) (last_backup_dir (..//proddb.2013-05-26.20-00-00) proddb backup: /bin/bash -c /usr/bin/ssh -l root tot-dbc-001 /mnt/global/base/bin/db tools backup backup -v -src '((dbname proddb) (hostname tot-dbc-001) (superuser postgres_prod) (basedir /database) (version 9.2))' -dst '((username postgres) (hostname 127.0.0.1) (backup_dir /net/nyc-isilon1/ifs/data/backup-dbc//proddb/proddb.in-progress) (last_backup_dir (..//proddb.2013-05-26.20-00-00) proddb backup [23437]: 2013-05-27
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Sat, May 25, 2013 at 11:27 AM, Merlin Moncure mmonc...@gmail.com wrote: On Sat, May 25, 2013 at 4:39 AM, Simon Riggs si...@2ndquadrant.com wrote: There are a number of changes we'd probably like to make to the way things work in Postgres. This thread is not about discussing what those are, just to say that requirements exist and have been discussed in various threads over time. The constraint on such changes is that we've decided that we must have an upgrade path from release to release. So I'd like to make a formal suggestion of a plan for how we cope with this: 1. Implement online upgrade in 9.4 via the various facilities we have in-progress. That looks completely possible. 2. Name the next release after that 10.0 (would have been 9.5). We declare now that a) 10.0 will support on-line upgrade from 9.4 (only) b) various major incompatibilities will be introduced in 10.0 - the change in release number will indicate to everybody that is the case c) agree that there will be no pg_upgrade path from 9.4 to 10.0, so that we will not be constrained by that This plan doesn't presume any particular change. Each change would need to be discussed on a separate thread, with a separate case for each. All I'm suggesting is that we have a coherent plan for the timing of such changes, so we can bundle them together into one release. By doing this now we give ourselves lots of time to plan changes that will see us good for another decade. If we don't do this, then we simply risk losing the initiative by continuing to support legacy formats and approaches. Huh. I don't think that bumping the version number to 10.0 vs 9.5 is justification to introduce breaking changes. In fact, I would rather see 10.0 be the version where we formally stop doing that. I understand that some stuff needs to be improved but it often doesn't seem to be worth the cost in the long run.
Please disregard this comment -- I didn't realize the topic was regarding the on-disk format -- I mistakenly thought it was opening the door for user-level feature changes. merlin
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On Tue, May 28, 2013 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 28, 2013 at 10:15 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-05-28 10:03:58 -0400, Robert Haas wrote: On Sat, May 25, 2013 at 2:55 PM, Jon Nelson jnelson+pg...@jamponi.net wrote: The biggest thing missing from this submission is information about what performance testing you did. Ideally performance patches are submitted with enough information for a reviewer to duplicate the same test the author did, as well as hard before/after performance numbers from your test system. It often turns tricky to duplicate a performance gain, and being able to run the same test used for initial development eliminates a lot of the problems. This has been a bit of a struggle. While it's true that WAL file creation doesn't happen with great frequency, and while it's also true that - with strace and other tests - it can be proven that fallocate(16MB) is much quicker than writing zeroes to it by hand, proving that in the larger context of a running install has been challenging. It's nice to be able to test things in the context of a running install, but sometimes a microbenchmark is just as good. I mean, if posix_fallocate() is faster, then it's just faster, right? Well, it's a bit more complex than that. Fallocate doesn't actually initialize the disk space in most filesystems, just marks it as allocated and zeroed, which is one of the reasons it can be noticeably faster. But that can make the runtime overhead of writing to those pages higher. Maybe it would be good to measure that impact. Something like this: 1. Write 16MB of zeroes to an empty file in the same size chunks we're currently using (8kB?). Time that. Rewrite the file with real data. Time that. 2. posix_fallocate() an empty file out to 16MB. Time that. Rewrite the file with real data. Time that. Personally, I have trouble believing that writing 16MB of zeroes by hand is better than telling the OS to do it for us.
If that's so, the OS is just stupid, because it ought to be able to optimize the crap out of that compared to anything we can do. Of course, it is more than possible that the OS is in fact stupid. But I'd like to hope not. I wrote a little C program to do something very similar to that (which I'll hope to post later today). It opens a new file, fallocates 16MB, calls fdatasync. Then it loops 10 times: seek to the start of the file, writes 16MB of ones, calls fdatasync. Then it closes and removes the file, re-opens it, and this time writes out 16MB of zeroes, calls fdatasync, and then does the same loop as above. The program times the process from file open to file unlink, inclusive. The results - for me - are pretty consistent: using fallocate is 12-13% quicker than writing out zeroes. I used fdatasync twice to (attempt) to mimic what the WAL writer does. -- Jon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Process memory architecture
On Mon, May 27, 2013 at 7:29 AM, Stephen Frost sfr...@snowman.net wrote: * Atri Sharma (atri.j...@gmail.com) wrote: Yes, too many indexes won't hurt much. BTW, won't making too many indexes on columns that probably don't have as many values as to deserve them (so, essentially, indiscriminately making indexes) hurt the performance/memory usage? I'd expect the performance issue would be from planner time more than memory usage - but if there is a serious memory usage issue here, then it'd be valuable to have a test case showing what's happening. We may not be releasing the sys cache in some cases or otherwise have a bug in this area. Note, backends do use private memory to cache various things (relcache, etc). Absolutely pathological workloads (tons of tables, tons of (especially) views, etc.) can exercise this into the gigabytes and there is no effective way to monitor and control it. Normally, it's not a very big deal though. So, to be a bit more specific, the index *data* (like all on-disk structures) is buffered in shared memory. But certain plans/metadata stuff is in private memory. merlin
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 2013-05-28 10:12:05 -0500, Jon Nelson wrote: On Tue, May 28, 2013 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 28, 2013 at 10:15 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-05-28 10:03:58 -0400, Robert Haas wrote: On Sat, May 25, 2013 at 2:55 PM, Jon Nelson jnelson+pg...@jamponi.net wrote: The biggest thing missing from this submission is information about what performance testing you did. Ideally performance patches are submitted with enough information for a reviewer to duplicate the same test the author did, as well as hard before/after performance numbers from your test system. It often turns tricky to duplicate a performance gain, and being able to run the same test used for initial development eliminates a lot of the problems. This has been a bit of a struggle. While it's true that WAL file creation doesn't happen with great frequency, and while it's also true that - with strace and other tests - it can be proven that fallocate(16MB) is much quicker than writing zeroes to it by hand, proving that in the larger context of a running install has been challenging. It's nice to be able to test things in the context of a running install, but sometimes a microbenchmark is just as good. I mean, if posix_fallocate() is faster, then it's just faster, right? Well, it's a bit more complex than that. Fallocate doesn't actually initialize the disk space in most filesystems, just marks it as allocated and zeroed, which is one of the reasons it can be noticeably faster. But that can make the runtime overhead of writing to those pages higher. Maybe it would be good to measure that impact. Something like this: 1. Write 16MB of zeroes to an empty file in the same size chunks we're currently using (8kB?). Time that. Rewrite the file with real data. Time that. 2. posix_fallocate() an empty file out to 16MB. Time that. Rewrite the file with real data. Time that.
Personally, I have trouble believing that writing 16MB of zeroes by hand is better than telling the OS to do it for us. If that's so, the OS is just stupid, because it ought to be able to optimize the crap out of that compared to anything we can do. Of course, it is more than possible that the OS is in fact stupid. But I'd like to hope not. I wrote a little C program to do something very similar to that (which I'll hope to post later today). It opens a new file, fallocates 16MB, calls fdatasync. Then it loops 10 times: seek to the start of the file, writes 16MB of ones, calls fdatasync. You need to call fsync() not fdatasync() the first time round. fdatasync doesn't guarantee metadata is synced. Then it closes and removes the file, re-opens it, and this time writes out 16MB of zeroes, calls fdatasync, and then does the same loop as above. The program times the process from file open to file unlink, inclusive. The results - for me - are pretty consistent: using fallocate is 12-13% quicker than writing out zeroes. Cool! I used fdatasync twice to (attempt) to mimic what the WAL writer does. Not sure what you mean by that though? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)
On 5/28/13 11:12 AM, Jon Nelson wrote: It opens a new file, fallocates 16MB, calls fdatasync. Outside of the run for performance testing, I think it would be good at this point to validate that there is really a 16MB file full of zeroes resulting from these operations. I am not really concerned that posix_fallocate might be slower in some cases; that seems unlikely. I am concerned that it might result in a file that isn't structurally the same as the 16MB of zero writes implementation used now. The timing program you're writing has some aspects that are similar to the contrib/pg_test_fsync program. You might borrow some code from there usefully. To clarify the suggestion I was making before about including performance test results: that doesn't necessarily mean the testing code must run using only the database. That's better if possible, but as Robert says it may not be for some optimizations. The important thing is to have something measuring the improvement that a reviewer can duplicate, and if that's a standalone benchmark program that's still very useful. The main thing I'm wary of is any "this should be faster" claims that don't come with any repeatable measurements at all. Very often theories about the fastest way to do something don't match what's actually seen in testing. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On 05/28/2013 06:13 AM, Joshua D. Drake wrote: On 05/27/2013 06:53 PM, Craig Ringer wrote: On 05/28/2013 09:39 AM, Gavin Flower wrote: Yes, I hate the Firefox style number inflation. I was arguing *for* it ;-) I don't like it much either, but (a) we do about one release a year, not one every few weeks and (b) it's very clear from a quick look at Stack Overflow or first-posts to pgsql-general how confusing two-part major versions are to users. If it's a bit less aesthetically pleasing I'm OK with that. This argument comes up every couple of years and the people that are trying to solve the problem by changing the versioning are ignoring the fact that there is no problem to solve. Consider the following exchange: Client: I have X problem with PostgreSQL CMD: What version? Client: 9 CMD: Which version of 9? Client: 9.0.2 CMD: You should be running 9.2.4 or at least 9.0.13 If the problem has the "at least" part, then the first part is superfluous. If somebody wants to figure out how to run streaming CTE-s on PostgreSQL 8, then you need to ask for the exact major version, which is the first two digits; if they want to run streaming replication there, you can skip one e-mail exchange and tell them right away that SR was added in version 9.0 ... The conversation does not change. Further, we are not Firefox. We are not user software. We are developer software. At least some of the real-world problems with PostgreSQL come from the "We are developer software" mentality. Yes, we are developer software, but we are also DBA/maintainer/infrastructure manager software which needs to live a long time after the development is finished. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
[HACKERS] potential bug in JSON
I've found a potential bug. Why does the -> operator return JSON instead of TEXT? It doesn't make sense to me, and the documentation doesn't inform about that. postgres=# SELECT ('{"id": 1}'::json -> 'id')::int; ERROR: cannot cast type json to integer LINE 1: SELECT ('{"id": 1}'::json -> 'id')::int; postgres=# SELECT ('{"id": 1}'::json -> 'id')::text::int; int4 -- 1 (1 row) postgres=# SELECT version(); version --- PostgreSQL 9.3beta1 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3, 64-bit (1 row)
Re: [HACKERS] Extent Locks
On Tue, May 28, 2013 at 9:07 AM, Stephen Frost sfr...@snowman.net wrote: * Jaime Casanova (ja...@2ndquadrant.com) wrote: btw, we can also use a next_extend_blocks GUC/reloption as a limit for autovacuum so it will allow that empty pages at the end of the table I'm really not, at all, excited about adding in GUCs for this. We just need to realize when the only available space in the relation is at the end and people are writing to it and avoid truncating pages off the end- if we don't already have locks that prevent vacuum from doing this already. I'd want to see where it's actually happening before stressing over it terribly much. +1 autovacuum configuration is already much too complex as it is...we should be removing/consolidating options, not adding them. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] potential bug in JSON
On 05/28/2013 11:38 AM, Szymon Guz wrote: I've found a potential bug. Why does the -> operator return JSON instead of TEXT? It doesn't make sense to me, and the documentation doesn't inform about that. postgres=# SELECT ('{"id": 1}'::json -> 'id')::int; ERROR: cannot cast type json to integer LINE 1: SELECT ('{"id": 1}'::json -> 'id')::int; postgres=# SELECT ('{"id": 1}'::json -> 'id')::text::int; int4 -- 1 (1 row) This is not a bug. It is documented and by design. If you want text, use the ->> operator. That's exactly what it's for. cheers andrew
Re: [HACKERS] Extent Locks
On 2013-05-28 10:07:06 -0400, Stephen Frost wrote: * Jaime Casanova (ja...@2ndquadrant.com) wrote: btw, we can also use a next_extend_blocks GUC/reloption as a limit for autovacuum so it will allow that empty pages at the end of the table I'm really not, at all, excited about adding in GUCs for this. But I thought you were in favor of doing complex stuff like mapping segments filled somewhere else into place :P But I agree. This needs to work without much manual intervention. I think we just need to make autovacuum truncate only if it finds more free space than whatever amount we might have added at that relation size plus some slop. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] potential bug in JSON
On 05/28/2013 08:38 AM, Szymon Guz wrote: I've found a potential bug. Why does the -> operator return JSON instead of TEXT? It doesn't make sense to me, and the documentation doesn't inform about that. Yes, it most certainly does: http://www.postgresql.org/docs/9.3/static/functions-json.html If you want to get text, use the ->> operator. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
This argument comes up every couple of years and the people that are trying to solve the problem by changing the versioning are ignoring the fact that there is no problem to solve. We just had this discussion on -advocacy (where it belongs, frankly) a couple months ago: http://www.postgresql.org/message-id/512e8ef8.3000...@agliodbs.com To sum up: the negatives of changing our version numbering scheme outweighed the positives. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extent Locks
* Andres Freund (and...@2ndquadrant.com) wrote: On 2013-05-28 10:07:06 -0400, Stephen Frost wrote: I'm really not, at all, excited about adding in GUCs for this. But I thought you were in favor of doing complex stuff like mapping segments filled somewhere else into place :P That wouldn't require a GUC.. ;) But I agree. This needs to work without much manual intervention. I think we just need to make autovacuum truncate only if it finds more free space than whatever amount we might have added at that relation size plus some slop. Agreed. Thanks, Stephen
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Tue, May 28, 2013 at 11:56 AM, Josh Berkus j...@agliodbs.com wrote: This argument comes up every couple of years and the people that are trying to solve the problem by changing the versioning are ignoring the fact that there is no problem to solve. We just had this discussion on -advocacy (where it belongs, frankly) +1. a couple months ago: http://www.postgresql.org/message-id/512e8ef8.3000...@agliodbs.com To sum up: the negatives of changing our version numbering scheme outweighed the positives. And +1 to that, too. FWIW, I think we may want to consider retitling 9.4 as 10.0, not because of any binary compatibility break (which, for the record, I oppose) but because of features. It's a little early to make that call just yet, of course, but I have a good feeling about this cycle. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] potential bug in JSON
On 28 May 2013 17:53, Josh Berkus j...@agliodbs.com wrote: On 05/28/2013 08:38 AM, Szymon Guz wrote: I've found a potential bug. Why does the -> operator return JSON instead of TEXT? It doesn't make sense to me, and the documentation doesn't inform about that. Yes, it most certainly does: http://www.postgresql.org/docs/9.3/static/functions-json.html If you want to get text, use the ->> operator. Yea, I noticed that. It was a little bit misleading for me that -> is for getting a field and ->> is for getting a field as text. Especially when (->)::TEXT doesn't return the same value as ->>. Maybe "as JSON" should be added to those operators which don't return text? Szymon
Re: [HACKERS] Extent Locks
On Tue, May 28, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: But I agree. This needs to work without much manual intervention. I think we just need to make autovacuum truncate only if it finds more free space than whatever amount we might have added at that relation size plus some slop. And how do you decide the amount of that slop? -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extent Locks
* Jaime Casanova (ja...@2ndquadrant.com) wrote: On Tue, May 28, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: But I agree. This needs to work without much manual intervention. I think we just need to make autovacuum truncate only if it finds more free space than whatever amount we might have added at that relation size plus some slop. And how do you decide the amount of that slop? How about % of table size? Thanks, Stephen
[HACKERS] GRANT role_name TO role_name ON database_name
I'd really love the ability to grant a *user* role-based privileges database by database. For background, I have several databases running in a single cluster, one database per business unit. Each database has the same core schema with the same basic role permissions, but with significant customizations. Even if it were technically possible to make them a single database, it would be unwise for administrative reasons. Each user may have access to any number of databases, but, within each database, may be assigned to different roles. For example, we may have an 'auditor' role which gives specific access to some trigger-maintained change history. But, a given user may only be an auditor for the business units they are assigned. That said, they may have other roles in other business units. My requirements are very fluid here and dictated by regulatory requirements. Currently, we work around the lack of per-database role permissions by prefixing roles with the name of the database. This is quite tedious though: it requires specialized logic to overlay creation, backups, restores, updating and deleting databases. It's very irritating, and requires custom code and conventions, even though it works. About 5 years ago, I think I asked for roles to become database specific. I now think that is a bit draconian given the cluster-wide permission structure used by PostgreSQL. However, perhaps a way to make it optionally limited to a given database would simplify my permission tracking? Best, Clark
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On 05/28/2013 08:36 AM, Hannu Krosing wrote: The conversation does not change. Further, we are not Firefox. We are not user software. We are developer software. At least some of the real-world problems with PostgreSQL come from the "We are developer software" mentality. Yes, we are developer software, but we are also DBA/maintainer/infrastructure manager I would not hire any of those three that weren't smart enough to understand our versioning scheme or lacked the wits to open a web browser and google: PostgreSQL versioning The answer is link #1 on Google. That said, I won't raise a stink. I am not really of a strong opinion either way except to say we are not solving a problem. We are just tickling each other's fancies. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
Re: [HACKERS] XLogInsert scaling, revisited
Heikki Linnakangas wrote: I've been slowly continuing the work that I started last winter to make XLogInsert scale better. I have tried quite a few different approaches since then, and have settled on the attached. This is similar but not exactly the same as what I did in the patches I posted earlier. Did this go anywhere? -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] all_visible replay aborting due to uninitialized pages
Hi, A customer of ours reported a standby losing sync with the primary due to the following error: CONTEXT: xlog redo visible: rel 1663/XXX/XXX; blk 173717 WARNING: page 173717 of relation base/XXX/XXX is uninitialized ... PANIC: WAL contains references to invalid pages Guessing around, I looked and noticed the following problematic pattern:
1) A: wants to do an update, doesn't have enough freespace
2) A: extends the relation on the filesystem level (RelationGetBufferForTuple)
3) A: does PageInit (RelationGetBufferForTuple)
4) A: aborts, e.g. due to a serialization failure (heap_update)
At this point the page is initialized in memory, but not WAL-logged. It isn't pinned or locked either.
5) B: vacuum finds that page and it's empty, so it marks it all-visible.
But since the page wasn't written out (we haven't even marked it dirty in 3), the standby doesn't know that and reports the page as being uninitialized. ISTM the best backbranchable fix for this is to teach lazy_scan_heap to log an FPI for the heap page via visibilitymap_set in that rather limited case. Happy to provide a patch unless somebody has a better idea? Greetings, Andres Freund -- Andres Freund  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Thu, May 23, 2013 at 01:48:24PM -0400, Heikki Linnakangas wrote: On 23.05.2013 08:03, Simon Riggs wrote: On 23 May 2013 12:10, Heikki Linnakangas hlinnakan...@vmware.com wrote: Please take a look: https://github.com/vmware/pg_rewind The COPYRIGHT file shows that VMware is claiming copyright on unstated parts of the code for this. As such, it's not a normal submission to the PostgreSQL project, which involves placing copyright with the PGDG. We have a lot of code in the PostgreSQL source tree with different copyright notices, and there's no problem with that as long as the code is licensed under the PostgreSQL license. For patches that add Really? Where? I think we have removed them all, as far as I know. A quick grep shows:

$ grep -r 'Portions Copyright' | egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Can someone comment on the Philip Warner item? Would someone contact him to clarify we can remove the mention? CC'ing him. or modify code in PostgreSQL, we generally have copyright notices with just PGDG, to avoid having a long list of copyright notices of a lot of companies and individuals on every file. I'm no lawyer, but I believe there's no difference from the legal point of view. Probably, but some mentions can cause concern when our code is reviewed by companies, so simplicity is good.
As a result, while it sounds interesting, people should be aware of that, and I suggest we shouldn't discuss that code on this list, to avoid any disputes should we decide to include a similar facility in core Postgres in the future. That's just paranoia. There are a lot of tools out there on pgFoundry, with various copyright holders and even different licenses, and it's fine to talk about all those on this list. Besides, the code is licensed under the PostgreSQL license, so if someone decides we should have this e.g. in contrib, you can just grab the sources and commit. Thirdly, there's no reason to refrain from even discussing this, even if someone would include a similar facility in core Postgres - this is about copyrights, not patents (and yes, this contribution has been cleared by the VMware legal department; VMware doesn't hold any patents on this). I think Simon has a good point, as VMware has asserted patents on some changes to their version of Postgres in the past, so if the copyright mentions VMware, we can't assume it is patent-free. Just the fact you had to check with the VMware legal department verifies there is cause for concern about things coming from VMware. In fact, I am curious what level of contribution requires a legal check, but I am not sure you can even share that information. Anyway, I would love to think we don't need to worry about this, but I think we do --- not in this case, but in general. I acknowledge that VMware has been disciplined in sharing only patent-free information, at the community's request. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote: We have a lot of code in the PostgreSQL source tree with different copyright notices, and there's no problem with that as long as the code is licensed under the PostgreSQL license. For patches that add Really? Where? I think we have removed them all, as far as I know. A quick grep shows:

$ grep -r 'Portions Copyright' | egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the Portions part from your grep, and you will see quite a few more... Greetings, Andres Freund -- Andres Freund  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
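Andres's point can be illustrated with a self-contained sketch — the directory and notice texts below are fabricated stand-ins for a real source checkout:

```shell
# Fabricated mini-tree standing in for a PostgreSQL checkout
tmp=$(mktemp -d)
printf ' * Portions Copyright (c) 2000, Philip Warner\n' > "$tmp/pg_backup_directory.c"
printf ' * Copyright (c) 2003 Some Contributor\n' > "$tmp/getopt_long.c"

# Bruce's original pattern only matches notices that say "Portions Copyright"
grep -r 'Portions Copyright' "$tmp" | grep -Ev 'Global|Regents' | wc -l   # finds 1

# Dropping the "Portions " prefix catches both kinds of notice
grep -r 'Copyright' "$tmp" | grep -Ev 'Global|Regents' | wc -l            # finds 2
```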
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote: On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote: We have a lot of code in the PostgreSQL source tree with different copyright notices, and there's no problem with that as long as the code is licensed under the PostgreSQL license. For patches that add Really? Where? I think we have removed them all, as far as I know. A quick grep shows:

$ grep -r 'Portions Copyright' | egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the Portions part from your grep, and you will see quite a few more... Oh, I see. Have we historically been OK with these as long as it is clear it is the PG copyright? I know we had to do some cleanups in the past, but I don't remember the details, obviously. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On 2013-05-28 14:50:57 -0400, Bruce Momjian wrote: On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote: On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote: We have a lot of code in the PostgreSQL source tree with different copyright notices, and there's no problem with that as long as the code is licensed under the PostgreSQL license. For patches that add Really? Where? I think we have removed them all, as far as I know. A quick grep shows:

$ grep -r 'Portions Copyright' | egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the Portions part from your grep, and you will see quite a few more... Oh, I see. Have we historically been OK with these as long as it is clear it is the PG copyright? I know we had to do some cleanups in the past, but I don't remember the details, obviously. I don't see a problem with different copyrights as long as the licenses are compatible. I remember code getting (re-)moved because it was GPL, which is a different thing from having a different copyright. I don't have all that wide a view of the history, though. Greetings, Andres Freund -- Andres Freund  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
Bruce Momjian wrote: Oh, I see. Have we historically been OK with these as long as it is clear it is the PG copyright? I know we had to do some cleanups in the past, but I don't remember the details, obviously. We've had requests from companies because they wanted to distribute Postgres and their lawyers weren't comfortable with copyright statements in assorted files. In those cases we've asked the people mentioned in such copyright statements, got approval to remove the offending copyright lines, and removed them. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On May 28, 2013, at 12:49 PM, Alvaro Herrera wrote: We've had requests from companies because they wanted to distribute Postgres and lawyers weren't comfortable with copyright statements in assorted files. In those cases we've asked the people mentioned in such copyright statements, got approval to remove the offending copyright lines, and removed them. I assume this topic has come up and been rejected for some reason, but just in case: the Django project requires an explicit agreement for contributions that end up in its main source tree, part of which is acceptance of the Django license and copyright notice. (I don't have my copy right in front of me, but I don't think it's a full-on assignment of copyright.) -- Christophe Pettus x...@thebuild.com
Re: [HACKERS] pg_rewind, a tool for resynchronizing an old master after failover
On Tue, May 28, 2013 at 03:49:14PM -0400, Alvaro Herrera wrote: Bruce Momjian wrote: Oh, I see. Have we historically been OK with these as long as it is clear it is the PG copyright? I know we had to do some cleanups in the past, but I don't remember the details, obviously. We've had requests from companies because they wanted to distribute Postgres and lawyers weren't comfortable with copyright statements in assorted files. In those cases we've asked the people mentioned in such copyright statements, got approval to remove the offending copyright lines, and removed them. OK, so it was different _licenses_ that were the problem. OK. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
[HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
Hi, I've got a patch. This is a plpython enhancement. There is an item on the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages Fix loss of information during conversion of numeric type to Python float This patch uses the decimal.Decimal type from the Python standard library for plpython function numeric arguments instead of float. The patch contains changes to code, documentation and tests. Most probably there is something wrong, as this is my first Postgres patch :) thanks, Szymon plpython_decimal.patch Description: Binary data
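The information loss the TODO item refers to is easy to demonstrate in plain Python; this is a sketch of the problem, not part of the patch itself:

```python
from decimal import Decimal

# A numeric value with more significant digits than a C double can hold,
# as the backend would hand it over in text form.
text_from_pg = "1.0000000000000000001"

as_float = float(text_from_pg)      # old plpython behavior: precision lost
as_decimal = Decimal(text_from_pg)  # proposed behavior: exact

print(as_float == 1.0)              # True  -> the trailing digits vanished
print(as_decimal == Decimal(1))     # False -> full precision preserved
```

Because `decimal.Decimal` is constructed from the numeric value's text representation, no rounding to binary floating point ever happens.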
Re: [HACKERS] Unsigned integer types
On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote: 2. INTEGER I had to store a record with several uint32 fields. I had to store an awful lot of them; hundreds of GB of data per day. Roughly half of the record consists of uint32 fields. Increasing the data type to bigint would mean that I could store 3 instead of 4 days' worth of data on available storage. Continuing with int4 meant that I would have to deal with the data in a special way when it enters and leaves the DB. It's easy in C: just cast uint32_t to int32_t. But the Python code requires more complex changes. And the web backend too... It's suffering either way! Just imagine the conversation I had to have with my boss: Either we'll increase the budget for storage, or we need to touch every bit of the system. Did you try 'oid' as an unsigned int4? -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
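The Python-side change Maciej alludes to amounts to reinterpreting the uint32 bit pattern as a signed int32 on the way in, and back again on the way out. A minimal sketch (helper names are invented for illustration):

```python
import struct

def uint32_to_int4(u):
    """Reinterpret a uint32 bit pattern as a signed int32 for storage in int4."""
    return struct.unpack("i", struct.pack("I", u))[0]

def int4_to_uint32(i):
    """Recover the original uint32 from the stored int4 value."""
    return struct.unpack("I", struct.pack("i", i))[0]

print(uint32_to_int4(4294967295))                  # -1
print(int4_to_uint32(-1))                          # 4294967295
print(int4_to_uint32(uint32_to_int4(3000000000)))  # 3000000000, round-trips
```

This mirrors the C-side `(int32_t)` cast exactly, but as the post notes it has to be applied at every boundary where the value enters or leaves the database.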
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Tue, May 28, 2013 at 07:58:33AM +0800, Craig Ringer wrote: On 05/28/2013 12:41 AM, Simon Riggs wrote: I'm happy with that. I was also thinking about collecting changes not related just to disk format, if any exist. Any wire protocol or syntax changes? I can't seem to find a "things we want to do in wire protocol v4" doc on the wiki, but I know I've seen occasional discussion of things that can't be done without protocol changes. Anyone with a better memory than me able to pitch in? Sure, it is on the TODO list: https://wiki.postgresql.org/wiki/Todo#.2Fcontrib.2Fpg_upgrade I can only get a link to pg_upgrade from there, so look two sections below that for Wire Protocol Changes. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Mon, May 27, 2013 at 05:21:16PM -0700, Joshua D. Drake wrote: On 05/27/2013 04:58 PM, Craig Ringer wrote: On 05/28/2013 12:41 AM, Simon Riggs wrote: I'm happy with that. I was also thinking about collecting changes not related just to disk format, if any exist. Any wire protocol or syntax changes? I can't seem to find a "things we want to do in wire protocol v4" doc on the wiki, but I know I've seen occasional discussion of things that can't be done without protocol changes. Anyone with a better memory than me able to pitch in? What'd be required to support in-band query cancellation? Sending per-statement GUCs (to allow true statement timeout)? I would like to see the ability to define whether a query is read-only at the protocol level, so that load balancers that speak libpq can know what to do with the query without parsing it. Sounds nice, but how would we do that? That would require libpq to know it, right? Do we pass anything back after parsing but before execution? Could it be optional? What about functions that modify the database --- isn't that only known at execution time? -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Mon, May 27, 2013 at 02:09:05PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Mon, May 27, 2013 at 09:17:50AM -0400, Bruce Momjian wrote: Yes, we should be collecting things we want to do for a pg_upgrade break so we can see the list all in one place. OK, I have added a section to the TODO list for this: Desired changes that would prevent upgrades with pg_upgrade 32-bit page checksums Are there any others? GiST indexes really oughta have a metapage so there can be a version number in them. Also, if we are going to unify hstore and json, it'd be nice if we could change the existing binary representation of hstore (per discussions at Oleg and Teodor's talk --- this will be moot if we invent a new core type, but it'd be better not to have to). There are probably some other data-type-specific cleanups we could make, but I have to go get on an airplane, so no time to think about it. OK, GiST and hstore added to TODO list. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On 05/28/2013 02:18 PM, Bruce Momjian wrote: I would like to see the ability to define whether a query is read-only at the protocol level, so that load balancers that speak libpq can know what to do with the query without parsing it. Sounds nice, but how would we do that? That would require libpq to know it, right? Do we pass anything back after parsing but before execution? Could it be optional? What about functions that modify the database --- isn't that only known at execution time? I can't speak to the actual C code that would be required, but from user space I could see something like this: con = psycopg2.connect(database='testdb', user='test', transaction-type='r') Thus when the connection is made, before anything else is done, we know it is a read-only connection, and therefore any load balancer speaking libpq would also know it is read-only. The default of course would be r/w, and you would use a different connection handler for r/w or w queries. The other option would be to do it on query execute, but that doesn't seem as efficient, as it would have to be parsed each time. Although it would still be better than reading the actual SQL. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
[HACKERS] FIX: auto_explain docs
Hi, I've just noticed that this patch in the 2012-01 commitfest https://commitfest.postgresql.org/action/patch_view?id=729 added the log_timing option to auto_explain, but it never actually made it into the docs. Attached is a patch for current master, but 9.2 should be patched too. regards Tomas

diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 8325f03..03b2309 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -141,6 +141,27 @@ LOAD 'auto_explain';
   <varlistentry>
    <term>
+     <varname>auto_explain.log_timing</varname> (<type>boolean</type>)
+    </term>
+    <indexterm>
+     <primary><varname>auto_explain.log_timing</> configuration parameter</primary>
+    </indexterm>
+    <listitem>
+     <para>
+      <varname>auto_explain.log_timing</varname> causes <command>EXPLAIN
+      (ANALYZE, TIMING off)</> output, rather than just <command>EXPLAIN (ANALYZE)</>
+      output. The overhead of repeatedly reading the system clock can slow down the
+      query significantly on some systems, so it may be useful to set this
+      parameter to <literal>FALSE</literal> when only actual row counts, and not
+      exact times, are needed.
+      This parameter is only effective when <varname>auto_explain.log_analyze</varname>
+      is also enabled. It defaults to <literal>TRUE</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>
     <varname>auto_explain.log_nested_statements</varname> (<type>boolean</type>)
    </term>
    <indexterm>
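For context, the option the patch documents is used in a session roughly like this (a hypothetical example, not taken from the patch):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log the plan of every statement
SET auto_explain.log_analyze = on;      -- required for log_timing to have any effect
SET auto_explain.log_timing = off;      -- actual row counts, but no per-node clock reads
SELECT count(*) FROM pg_class;          -- plan is logged as EXPLAIN (ANALYZE, TIMING off)
```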
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
Bruce Momjian wrote: On Mon, May 27, 2013 at 05:21:16PM -0700, Joshua D. Drake wrote: I would like to see the ability to define whether a query is read-only at the protocol level, so that load balancers that speak libpq can know what to do with the query without parsing it. Sounds nice, but how would we do that? That would require libpq to know it, right? Do we pass anything back after parsing but before execution? Could it be optional? What about functions that modify the database --- isn't that only known at execution time? Well, if you hit anything that tries to acquire an Xid, and you're in a context that said only read-only was acceptable, just raise an error. In a similar vein, I vaguely recall we discussed (after some security vulnerability involving SQL injection) a mode where we accept only one command per PQexec() call, i.e. reject execution of commands that contain multiple queries. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_dump with postgis extension dumps rules separately
On 04/06/2013 04:49 PM, Joe Conway wrote: If I create a database and install postgis as an extension, and then run pg_dump, I get this: [...] CREATE EXTENSION IF NOT EXISTS postgis WITH SCHEMA public; [...] CREATE RULE geometry_columns_delete AS ON DELETE TO geometry_columns DO INSTEAD NOTHING; [...] Shouldn't that CREATE RULE be implicitly part of the CREATE EXTENSION? If so, is this a pg_dump bug, a PostGIS bug, or pilot error? FWIW I see CREATE OR REPLACE RULE statements in the PostGIS extension SQL script. The attached one-liner seems to do the trick. It should probably be backpatched to 9.1. Remaining questions: 1) Are there other database object types, likely to be included in extension scripts, that are also lacking dependency records to their extension? 2) How should we handle already-installed extensions, which will still lack dependency records after this bugfix? Thanks, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support

diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index cb59f13..c48c661 100644
*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
*** InsertRule(char *rulname,
*** 181,186 ****
--- 181,189 ----
  					   DEPENDENCY_NORMAL);
  	}
 
+ 	/* dependency on extension */
+ 	recordDependencyOnCurrentExtension(&myself, is_update);
+ 
  	/* Post creation hook for new rule */
  	InvokeObjectPostCreateHook(RewriteRelationId, rewriteObjectId, 0);
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Tue, May 28, 2013 at 02:26:06PM -0700, Joshua D. Drake wrote: Sounds nice, but how would we do that? That would require libpq to know it, right? Do we pass anything back after parsing but before execution? Could it be optional? What about functions that modify the database --- isn't that only known at execution time? I can't speak to the actual C code that would be required, but from user space I could see something like this: con = psycopg2.connect(database='testdb', user='test', transaction-type='r') Thus when the connection is made, before anything else is done, we know it is a read-only connection, and therefore any load balancer speaking libpq would also know it is read-only. The default of course would be r/w, and you would use a different connection handler for r/w or w queries. The other option would be to do it on query execute, but that doesn't seem as efficient, as it would have to be parsed each time. Although it would still be better than reading the actual SQL. Well, you could do SET TRANSACTION READ ONLY, and that would prevent any write transactions. You could assume it is a read query, get the error, and resubmit on the master if that happens, but that sounds inefficient. I thought you were asking for something where you could submit a query and it would report back as read/write or read-only. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
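Bruce's suggestion, sketched on a hypothetical table t:

```sql
BEGIN;
SET TRANSACTION READ ONLY;
SELECT * FROM t;      -- allowed
UPDATE t SET x = 1;   -- ERROR:  cannot execute UPDATE in a read-only transaction
ROLLBACK;
```

This enforces read-only behavior server-side, but a load balancer in front of the server still cannot tell, without parsing, whether the client intended the statement to be read-only — which is the gap the protocol-level flag would close.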
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Mon, May 27, 2013 at 03:06:13PM -0400, Alvaro Herrera wrote: Bruce Momjian wrote: OK, I have added a section to the TODO list for this: Desired changes that would prevent upgrades with pg_upgrade 32-bit page checksums Are there any others? I would have each data segment be self-identifying, i.e. have a magic number at the beginning of the file and the relation OID, some fork identification and the segment number somewhere -- probably the special space of the first page. Is this something we want on the TODO? I was not clear how to do this without making the first page format special or wasting space on all the other pages. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On 05/28/2013 03:36 PM, Bruce Momjian wrote: The other option would be to do it on query execute, but that doesn't seem as efficient, as it would have to be parsed each time. Although it would still be better than reading the actual SQL. Well, you could do SET TRANSACTION READ ONLY, and that would prevent any write transactions. You could assume it is a read query, get the error, and resubmit on the master if that happens, but that sounds inefficient. I thought you were asking for something where you could submit a query and it would report back as read/write or read-only. No, I am suggesting something where, before anything happens with the parser, the protocol knows what is up. So things like pgpool-II don't even need a parser; they just know it is a read-only query because the protocol says so. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
Re: [HACKERS] Logging of PAM Authentication Failure
On Tue, May 28, 2013 at 01:32:53PM +0800, Craig Ringer wrote: On 05/11/2013 03:25 AM, Robert Haas wrote: Not really. We could potentially fix it by extending the wire protocol to allow the server to respond to the client's startup packet with a further challenge, and extend libpq to report that challenge back to the user and allow sending a response. But that would break on-the-wire compatibility, which we haven't done in a good 10 years, and certainly wouldn't be worthwhile just for this. We were just talking about things we'd like to do in wire protocol 4. Allowing multi-stage authentication has come up repeatedly and should perhaps go on that list. The most obvious case being ident auth failed, demand md5. Added to TODO. -- Bruce Momjian  br...@momjian.us  http://momjian.us EnterpriseDB  http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Unsigned integer types
On 5/28/13 4:07 PM, Bruce Momjian wrote: On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote: 2. INTEGER I had to store a record with several uint32 fields. I had to store an awful lot of them; hundreds of GB of data per day. Roughly half of the record consists of uint32 fields. Increasing the data type to bigint would mean that I could store 3 instead of 4 days' worth of data on available storage. Continuing with int4 meant that I would have to deal with the data in a special way when it enters and leaves the DB. It's easy in C: just cast uint32_t to int32_t. But the Python code requires more complex changes. And the web backend too... It's suffering either way! Just imagine the conversation I had to have with my boss: Either we'll increase the budget for storage, or we need to touch every bit of the system. Did you try 'oid' as an unsigned int4? Using an internal catalog type for user data seems like a horrible idea to me... I'll also add that Maciej hasn't explained why these types couldn't be an extension (in fact, I'm pretty sure there's already code for this out there, though possibly not utilizing the extension framework). If you don't need implicit casting, it should actually be pretty easy to do this externally, and I don't think maintenance would be an issue (it's not like uints change...). -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Unsigned integer types
Maciej Gajewski wrote: I'm also afraid that with the extension I'd be left on my own maintaining it forever. While if this could go into the core product, it would live forever. Clarification from the gallery: are we talking about an extension or a custom PostgreSQL build/fork? If it is an extension, then stick it up on GitHub and let whoever finds it valuable help contribute to keeping it relevant. No use letting perfection stand in the way of usability. If the current solutions are too slow then exploring the extension aspect - even if it falls short - is worthwhile. At minimum you learn from the experience and maybe someone else (or even you yourself) can build on that foundation. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Unsigned-integer-types-tp5756994p5757234.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unsigned integer types
On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote: On 5/28/13 4:07 PM, Bruce Momjian wrote: On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote: 2. INTEGER I had to store a record with several uint32. I had to store an awful lot of them; hundreds of GB of data per day. Roughly half of the record consists of uint32 fields. Increasing the data type to bigint would mean that I could store 3 instead of 4 days' worth of data on available storage. Continuing with int4 meant that I would have to deal with the data in a special way when it enters and leaves the DB. It's easy in C: just cast uint32_t to int32_t. But python code requires more complex changes. And the web backend too... It's suffering either way! Just imagine the conversation I had to have with my boss: Either we'll increase the budget for storage, or we need to touch every bit of the system. Did you try 'oid' as an unsigned int4? Using an internal catalog type for user data seems like a horrible idea to me... Uh, not sure if we can say oid is only an internal catalog type. It is certainly used for storing large object references. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Tue, May 28, 2013 at 03:39:10PM -0700, Joshua D. Drake wrote: On 05/28/2013 03:36 PM, Bruce Momjian wrote: The other option would be to do it on query execute but that doesn't seem as efficient as it would have to be parsed each time. Although it would still be better than reading the actual SQL. Well, you could do SET TRANSACTION READ ONLY, and that would prevent any write transactions. You could assume it is a read query, and get the error and resubmit on the master if that happens, but that sounds inefficient. I thought you were asking for something where you could submit a query and it would report back as read/write or read-only. No I am suggesting something that before anything happens with the parser, the protocol knows what is up. So things like pgpool-ii don't even need a parser, it just knows it is a read only query because the protocol says so. Oh, that is an interesting idea. The application is indicating it is read-only via the protocol, and poolers can optimize that. Don't we have the ability to pass arbitrary GUC values back through the protocol, e.g. transaction_read_only? If not, that might be a way to do this cleanly. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
On 05/28/2013 06:21 AM, Robert Haas wrote: As a general statement, I view this work as something that is likely needed no matter which one of the remove freezing approaches that have been proposed we choose to adopt. It does not fix anything in and of itself, but it (hopefully) removes an objection to the entire line of inquiry. Agreed. I have some ideas on how to reduce the impact of freezing as well (of course), and the description of your approach certainly seems to benefit them, especially as it removes the whole forensic information objection. One question though: if we're not removing the xmin, how do we know the maximum xid to which we can prune clog? I can imagine several ways given your approach. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of freezing
On 05/28/2013 07:17 AM, Andres Freund wrote: On 2013-05-26 16:58:58 -0700, Josh Berkus wrote: I was talking this over with Jeff on the plane, and we wanted to be clear on your goals here: are you looking to eliminate the *write* cost of freezing, or just the *read* cost of re-reading already frozen pages? Both. The latter is what I have seen causing more hurt, but the former alone is painful enough. I guess I don't see how your proposal is reducing the write cost for most users then? - for users with frequently, randomly updated data, pd_all_visible would never be set, so they still need to be rewritten to freeze - for users with append-only tables, allvisible would never be set since those pages don't get vacuumed - it would prevent us from getting rid of allvisible, which has a documented and known write overhead This means that your optimization would benefit only users whose pages get updated occasionally (enough to trigger vacuum) but not too frequently (which would unset allvisible). While we lack statistics, intuition suggests that this is a minority of databases. If we just wanted to reduce read cost, why not just take a simpler approach and give the visibility map an isfrozen bit? Then we'd know which pages didn't need rescanning without nearly as much complexity. That would also make it more effective to do precautionary vacuum freezing. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unsigned integer types
On 05/28/2013 07:00 PM, Bruce Momjian wrote: On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote: Did you try 'oid' as an unsigned int4? Using an internal catalog type for user data seems like a horrible idea to me... Uh, not sure if we can say oid is only an internal catalog type. It is certainly used for storing large object references. pg_largeobject has oids. I don't think the fact that we use oids to store references to pg_largeobject should blind us to the fact that oid should be an opaque type. Using them as substitute unsigned ints seems like a horrible idea to me too. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of freezing
On 2013-05-28 09:29:26 -0700, Josh Berkus wrote: On 05/28/2013 07:17 AM, Andres Freund wrote: On 2013-05-26 16:58:58 -0700, Josh Berkus wrote: I was talking this over with Jeff on the plane, and we wanted to be clear on your goals here: are you looking to eliminate the *write* cost of freezing, or just the *read* cost of re-reading already frozen pages? Both. The latter is what I have seen causing more hurt, but the former alone is painful enough. I guess I don't see how your proposal is reducing the write cost for most users then? - for users with frequently, randomly updated data, pd_all_visible would never be set, so they still need to be rewritten to freeze If they update all data they simply never need to get frozen since they are not old enough. - for users with append-only tables, allvisible would never be set since those pages don't get vacuumed They do get vacuumed at least every autovacuum_freeze_max_age even now. And we should vacuum them more often to make index-only scans work without manual intervention. - it would prevent us from getting rid of allvisible, which has a documented and known write overhead Aha. This means that your optimization would benefit only users whose pages get updated occasionally (enough to trigger vacuum) but not too frequently (which would unset allvisible). While we lack statistics, intuition suggests that this is a minority of databases. I don't think that follows. If we just wanted to reduce read cost, why not just take a simpler approach and give the visibility map an isfrozen bit? Then we'd know which pages didn't need rescanning without nearly as much complexity. That would also make it more effective to do precautionary vacuum freezing. Because we would still write/dirty/xlog the changes three times?
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
On 2013-05-28 09:39:13 -0700, Josh Berkus wrote: On 05/28/2013 06:21 AM, Robert Haas wrote: As a general statement, I view this work as something that is likely needed no matter which one of the remove freezing approaches that have been proposed we choose to adopt. It does not fix anything in and of itself, but it (hopefully) removes an objection to the entire line of inquiry. Agreed. I have some ideas on how to reduce the impact of freezing as well (of course), and the description of your approach certainly seems to benefit them, especially as it removes the whole forensic information objection. One question though: if we're not removing the xmin, how do we know the maximum xid to which we can prune clog? I can imagine several ways given your approach. Simply don't count xids which are frozen. Currently we ignore an xid because it's a special value; after this, because the tuple has a certain hint bit (combination) set. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] streaming replication, frozen snapshot backup on it and missing relfile (postgres 9.2.3 on xfs + LVM)
On Tue, May 28, 2013 at 10:53 AM, Benedikt Grundmann bgrundm...@janestreet.com wrote: Today we have seen 2013-05-28 04:11:12.300 EDT,,,30600,,51a41946.7788,1,,2013-05-27 22:41:10 EDT,,0,ERROR,XX000,xlog flush request 1E95/AFB2DB10 is not satisfied --- flushed only to 1E7E/21CB79A0,writing block 9 of relation base/16416/293974676 2013-05-28 04:11:13.316 EDT,,,30600,,51a41946.7788,2,,2013-05-27 22:41:10 EDT,,0,ERROR,XX000,xlog flush request 1E95/AFB2DB10 is not satisfied --- flushed only to 1E7E/21CB79A0,writing block 9 of relation base/16416/293974676 while taking the backup of the primary. We have been running for a few days like that and today is the first day where we see these problems again. So it's not entirely deterministic / we don't know yet what we have to do to reproduce. So this makes Robert's theory more likely. However we have also been using this method (LVM + rsync with hardlinks from primary) for years without these problems. So the big question is what changed? Well... I don't know. But my guess is there's something wrong with the way you're using hardlinks. Remember, a hardlink means two logical pointers to the same file on disk. So if either file gets modified after the fact, then the other pointer is going to see the changes. The xlog flush request not satisfied stuff could happen if, for example, the backup is pointing to a file, and the primary is pointing to the same file, and the primary modifies the file after the backup is taken (thus modifying the backup after-the-fact). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
On Tue, May 28, 2013 at 7:27 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-05-28 09:39:13 -0700, Josh Berkus wrote: On 05/28/2013 06:21 AM, Robert Haas wrote: As a general statement, I view this work as something that is likely needed no matter which one of the remove freezing approaches that have been proposed we choose to adopt. It does not fix anything in and of itself, but it (hopefully) removes an objection to the entire line of inquiry. Agreed. I have some ideas on how to reduce the impact of freezing as well (of course), and the description of your approach certainly seems to benefit them, especially as it removes the whole forensic information objection. One question though: if we're not removing the xmin, how do we know the maximum xid to which we can prune clog? I can imagine several ways given your approach. Simply don't count xids which are frozen. Currently we ignore an xid because its a special value, after this because the tuple has a certain hint bit (combination) set. Right, what he said. Calculating the XID before which we no longer need CLOG is just a matter of looking at all the tuples that we don't know to be frozen and taking the oldest XID from among those. This patch changes the definition of frozen but that's a pretty minor detail of the CLOG-truncation calculation. So, in essence, this patch doesn't really make much difference in that area either way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers