Re: [HACKERS] SSI modularity questions
On 27.06.2011 21:23, Kevin Grittner wrote:

There are two outstanding patches for SSI which involve questions about modularity. In particular, they involve calls to predicate locking and conflict detection from executor source files rather than AM source files (where most such calls exist).

(1) Dan submitted this patch: http://archives.postgresql.org/message-id/20110622045850.gn83...@csail.mit.edu which is a very safe and very simple patch to improve performance on sequential heap scans at the serializable transaction isolation level. The location of the code being modified raised questions about modularity. There is a reasonably clear place to which it could be moved in the heap AM, but because it would acquire a predicate lock during node setup, it would get a lock on the heap even if the node was never used, which could be a performance regression in some cases.

The bigger question is whether those calls are needed at all (http://archives.postgresql.org/message-id/4e072ea9.3030...@enterprisedb.com). I'm uneasy about changing them this late in the release cycle, but I don't feel good about leaving useless clutter in place just because we're late in the release cycle either. More importantly, if locking the whole relation in a seqscan is not just a performance optimization but is actually required for correctness, it's important that we make the code and comments reflect that, or someone will break it in the future.

(2) In reviewing the above, Heikki noticed that there was a second place in the executor where SSI calls were needed but missing. I submitted a patch here: http://archives.postgresql.org/message-id/4e07550f02250003e...@gw.wicourts.gov I wonder, though, whether the section of code which I needed to modify should be moved to a new function in heapam.c on modularity grounds. If these two places were moved, there would be no SSI calls from any source file in the executor subdirectory.
Same here, we might not need those PredicateLockTuple calls in bitmap heap scan at all. Can you check my logic, and verify if those PredicateLockTuple() calls are needed?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Your Postgresql 9.2 patch
> Leonardo, Your patch: use less space in xl_xact_commit ... has been waiting on an updated version from you for 10 days now. Do you think you're likely to complete it for this CommitFest?

I sent an email on the subject: http://postgresql.1045698.n5.nabble.com/use-less-space-in-xl-xact-commit-patch-tp4400729p4495125.html and the thread went on and on but now I'm confused about what to do... Can someone tell me what we want???

Leonardo
Re: [HACKERS] WIP: Fast GiST index build
On Mon, Jun 27, 2011 at 10:32 PM, Alexander Korotkov aekorot...@gmail.com wrote:
> I didn't have an estimate yet, but I'm working on it.

Now, it seems that I have an estimate.

N - total number of itups
B - avg. number of itups in page
H - height of tree
K - avg. number of itups fitting in node buffer
step - level step of buffers

K = 2 * B^step
avg. number of internal pages with buffers = 2*N/((2*B)^step - 1) (assume pages to be half-filled)
avg. itups in node buffer = K / 2 (assume node buffers to be half-filled)

Each internal page with buffers can be produced by a split of another internal page with buffers. So, the number of additional penalty calls = 2*N/((2*B)^step - 1) * K / 2 =(approximately)= 2*N*(1/2)^step, while the number of regular penalty calls is H*N.

It seems that the fraction of additional penalty calls should decrease as the level step increases (though I haven't done experiments with level step != 1). Also, we can try to break the K = 2 * B^step equation. This can increase the number of IOs, but decrease the number of additional penalty calls and, probably, increase tree quality in some cases.

--
With best regards,
Alexander Korotkov.
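Alexander's arithmetic is easy to sanity-check mechanically. The sketch below (the parameter values are purely illustrative, not measured from any real index) recomputes the exact expression and compares it against the 2*N*(1/2)^step approximation:

```python
# Rough sanity check of the penalty-call estimate above.
# All parameter values are illustrative, not measured.

def penalty_call_fractions(N, B, H, step):
    K = 2 * B ** step                               # itups fitting in a node buffer
    buffered_pages = 2 * N / ((2 * B) ** step - 1)  # half-filled internal pages with buffers
    extra_calls = buffered_pages * K / 2            # ~ 2*N*(1/2)**step
    regular_calls = H * N
    return extra_calls, regular_calls

extra, regular = penalty_call_fractions(N=10_000_000, B=150, H=4, step=1)
print(extra / regular)   # fraction of additional penalty calls relative to H*N
```

For step = 1 the exact value is within a fraction of a percent of the 2*N*(1/2)^step approximation, and the additional calls come out to roughly N/(H*N) = 1/H of the regular ones.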
Re: [HACKERS] silent_mode and LINUX_OOM_ADJ
On Tue, 28 Jun 2011 at 10:40, Heikki Linnakangas wrote:
> It seems to me you could just stop setting silent_mode. If you want to capture any early errors at startup into a log file, like silent_mode does to postmaster.log, you can redirect stdout and stderr in the startup script. pg_ctl start -l postmaster.log will do the same.

OK, thanks. I'll try that next time I touch the packages.

cu
Reinhard

--
SUSE LINUX Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
Re: [HACKERS] Your Postgresql 9.2 patch
On Tue, Jun 28, 2011 at 8:30 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
>> Leonardo, Your patch: use less space in xl_xact_commit ... has been waiting on an updated version from you for 10 days now. Do you think you're likely to complete it for this CommitFest?
> I sent an email on the subject: http://postgresql.1045698.n5.nabble.com/use-less-space-in-xl-xact-commit-patch-tp4400729p4495125.html and the thread went on and on but now I'm confused about what to do... Can someone tell me what we want???

I've nearly finished editing prior to commit, so no worries.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [Hackers]Backend quey plan process
Excerpts from HuangQi's message of lun jun 27 23:56:22 -0400 2011:
> BTW, which email system are you using to send to postgres mailing list? As you can keep the top-answering and maintain the title of your email with [hackers] in front, my gmail can not help on that. For this email, I just add by hand.

You shouldn't add the [Hackers] prefix yourself. The mail list server will add it automatically for you. Gmail deals with this perfectly fine.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
Magnus Hagander mag...@hagander.net writes:
> We discussed earlier to potentially block the creation, and removal, of branches on the origin server, to prevent mistakes like this. It has only happened once in almost a year, so it's probably not necessary - but I wanted to raise the option anyway in case people forgot about it. The downside would be that in order to create or drop a branch *when intended* a committer would need someone from the infrastructure team to temporarily switch off the branch-blocking setting, and then back on..

I think it would be sensible to block branch removal, as there's basically never a scenario where we'd do that during current usage. I'm not excited about blocking branch addition, although I worry sooner or later somebody will accidentally push a private development branch :-(. Is it possible to block only removal and not addition?

regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
On Tue, Jun 28, 2011 at 16:39, Tom Lane t...@sss.pgh.pa.us wrote:
> Magnus Hagander mag...@hagander.net writes:
>> We discussed earlier to potentially block the creation, and removal, of branches on the origin server, to prevent mistakes like this. It has only happened once in almost a year, so it's probably not necessary - but I wanted to raise the option anyway in case people forgot about it. The downside would be that in order to create or drop a branch *when intended* a committer would need someone from the infrastructure team to temporarily switch off the branch-blocking setting, and then back on..
> I think it would be sensible to block branch removal, as there's basically never a scenario where we'd do that during current usage. I'm not excited about blocking branch addition, although I worry sooner or later somebody will accidentally push a private development branch :-(. Is it possible to block only removal and not addition?

Yes, I think so. Either way it'll require a small addition to the scripts we're using, so I'll try to just turn it into two different settings. I don't see offhand any reason why this shouldn't work.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Range Types, constructors, and the type system
On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis pg...@j-davis.com wrote:
> So, in effect, RANGEINPUT is a special type used only for range constructors. If someone tried to output it, it would throw an exception, and we'd even have enough information at that point to print a nice error message with a hint.

I don't think I like the idea of throwing an error when you try to output it, but the rest seems reasonably sensible.

> Actually, this is pretty much exactly Florian's idea (thanks again, Florian), but at the time I didn't like it because pair didn't capture everything that I wanted to capture, like infinite bounds, etc. But there's no reason that it can't, and your point made me realize that -- you are effectively just using TEXT as the intermediate type (which works, but has some undesirable characteristics).

What undesirable characteristics?

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
On 06/28/2011 10:39 AM, Tom Lane wrote:
> Magnus Hagander mag...@hagander.net writes:
>> We discussed earlier to potentially block the creation, and removal, of branches on the origin server, to prevent mistakes like this. It has only happened once in almost a year, so it's probably not necessary - but I wanted to raise the option anyway in case people forgot about it. The downside would be that in order to create or drop a branch *when intended* a committer would need someone from the infrastructure team to temporarily switch off the branch-blocking setting, and then back on..
> I think it would be sensible to block branch removal, as there's basically never a scenario where we'd do that during current usage. I'm not excited about blocking branch addition, although I worry sooner or later somebody will accidentally push a private development branch :-(. Is it possible to block only removal and not addition?

+1. Spurious branch addition shouldn't cause us much pain - we'd just remove the new branch. Unwanted deletion is more disruptive.

cheers
andrew
Re: [HACKERS] [v9.2] Fix leaky-view problem, part 1
I took a look at this patch. It's incredibly simple, which is great, and it seems to achieve its goal.

Suppose your query references two views owned by different roles. The quals of those views will have the same depth. Is there a way for information to leak from one view owner to another due to that?

I like how you've assumed that the table owner trusts the operator class functions of indexes on his table to not leak information. That handily catches some basic and important qual pushdowns that would otherwise be lost. This makes assumptions, at a distance, about the possible scan types and how quals can be fitted to them. Specifically, this patch achieves its goals because any indexable qual is trustworthy, and any non-indexable qual cannot be pushed down further than the view's own quals. That seems to be true with current plan types, but it feels fragile. I don't have a concrete idea for improvement, though. Robert suggested another approach in http://archives.postgresql.org/message-id/aanlktimbn_6tyxreh5rc7pmizt-vjb3xgp8cuho0o...@mail.gmail.com ; might that approach avoid this hazard?

The part 2 patch in this series implements the planner restriction more likely to yield performance regressions, so it introduces a mechanism for identifying when to apply the restriction. This patch, however, applies its restriction unconditionally. Since we will inevitably have such a mechanism before you are done sealing the leaks in our view implementation, the restriction in this patch should also use that mechanism. Therefore, I think we should review and apply part 2 first, then update this patch to use its conditional behavior.

A few minor questions/comments on the implementation:

On Mon, Jun 06, 2011 at 01:37:11PM +0100, Kohei Kaigai wrote:
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> + else if (IsA(node, Query))
> + {
> + depth += 2;

Why two?
> --- a/src/test/regress/expected/select_views.out
> +++ b/src/test/regress/expected/select_views.out
> +EXPLAIN SELECT * FROM your_credit WHERE f_leak(number,expired);
> +QUERY PLAN
> +--
> + Seq Scan on credit_cards (cost=0.00..181.20 rows=1 width=96)

Use EXPLAIN (COSTS OFF) in regression tests. We do not put much effort into the stability of exact cost values, and they do not matter for the purpose of this test.

> --- a/src/test/regress/sql/select_views.sql
> +++ b/src/test/regress/sql/select_views.sql
> @@ -8,3 +8,46 @@ SELECT * FROM street;
>  SELECT name, #thepath FROM iexit ORDER BY 1, 2;
>  SELECT * FROM toyemp WHERE name = 'sharon';
> +
> +--
> +-- Test for leaky-view problem
> +--
> +
> +-- setups
> +SET client_min_messages TO 'warning';
> +
> +DROP ROLE IF EXISTS alice;
> +DROP FUNCTION IF EXISTS f_leak(text);
> +DROP TABLE IF EXISTS credit_cards;
> +
> +RESET client_min_messages;

No need for this. The regression tests always start on a clean database.

> +
> +CREATE USER alice;
> +CREATE FUNCTION f_leak(text, text)
> +RETURNS bool LANGUAGE 'plpgsql'
> +AS 'begin raise notice ''% = %'', $1, $2; return true; end';

I ran this test case on master, and it did not reproduce the problem. However, adding COST 0.1 to this CREATE FUNCTION did yield the expected problem behavior. I suggest adding that.

Thanks, nm
Re: [HACKERS] add support for logging current role (what to review?)
On Mon, Jun 27, 2011 at 6:25 PM, Alex Hunsaker bada...@gmail.com wrote:
> Ive been holding off because its marked as Waiting on Author, am now thinking thats a mistake. =)
> It links to this patch: http://archives.postgresql.org/message-id/20110215135131.gx4...@tamriel.snowman.net
> Which is older than the latest patch in that thread posted by Robert: http://archives.postgresql.org/message-id/AANLkTikMadttguOWTkKLtgfe90kxR=u9njk9zebrw...@mail.gmail.com
> (There are also various other patches and versions in that thread...)
> The main difference between the first and the last patch is the first one has support for changing what csv columns we output, while the latter just tacks on an additional column. The thread was very long and left me a bit confused as to what I should actually be looking at. Or perhaps thats the point-- we need to decide if a csvlog_fields GUC is worth it.

This is pretty much a classic example of the tail wagging the dog. Adding a new field to the output would be pretty easy if it weren't for the fact that we want the CSV log output to contain all the same fields that are available in the regular log. Of course, adding a field to the CSV format is no big deal either. But now you have more problems: (1) it breaks backward compatibility, (2) it makes the log files bigger, and (3) if the new field is more expensive to compute than the existing ones, then you're forcing people who want to use the CSV log format to incur an overhead for a feature they don't care about.

We could surmount these problems by making the CSV log format more flexible, but (a) that's a lot of work, (b) it imposes a burden on tool-builders, and (c) it might produce a really long icky-looking configuration line in postgresql.conf to show the default value of the GUC controlling the log format. (You may think I'm joking about this last one, but the point was raised... several times.)
The upshot, presumably, is that if Stephen would like us to add this new field, he needs to be willing to rewrite our entire logging infrastructure first... and then maybe we'll bounce the rewrite because it adds too much complexity, but who was it that asked for all that complexity again? Oh, that's right: we did. Now, I admit that I've been guilty of engaging in this kind of scope creep from time to time myself, so everyone feel free to pitch your favorite loose object in my direction.

Anyway, if we are going to insist on rewriting substantial chunks of the logging infrastructure before doing this, we at least need to reach some agreement on what would be an acceptable outcome - and then let Stephen code that up as a separate patch, submit it, get it committed, and then do this on top of that. Or we can just decide that adding one relatively minor field to the text format output is not worth knocking over the applecart for.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
On Tue, Jun 28, 2011 at 11:05 AM, Andrew Dunstan and...@dunslane.net wrote:
> +1. Spurious branch addition shouldn't cause us much pain - we'd just remove the new branch. Unwanted deletion is more disruptive.

How about if we allow addition only of branches matching /^REL_[0-9_]+_STABLE$/ and disallow deletion of all branches? That seems like it'd allow the one operation we will likely want to do with any regularity (creating a new release branch once a year) without going through hoops, while disallowing most of the problem cases.

The problem with allowing people to create branches and not remove them is that someone might push a private branch and not be able to get rid of it. But if we only allow creation of branches that look like the branches that are supposed to be there, then that shouldn't be a danger.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Your Postgresql 9.2 patch
> I've nearly finished editing prior to commit, so no worries.

Thank you, let me know if I can help.

Leonardo
Re: [HACKERS] [COMMITTERS] pgsql: Make the visibility map crash-safe.
2011/6/27 Robert Haas robertmh...@gmail.com:
> On Thu, Jun 23, 2011 at 9:22 AM, Robert Haas robertmh...@gmail.com wrote:
>> On Wed, Jun 22, 2011 at 10:23 PM, Robert Haas robertmh...@gmail.com wrote:
>>> Well, it seems I didn't put nearly enough thought into heap_update(). The fix for the immediate problem looks simple enough - all the code has been refactored to use the new API, so the calls can easily be moved into the critical section (see attached). But looking at this a little more, I see that heap_update() is many bricks short of a load, because there are several places where the buffer can be unlocked and relocked, and we don't recheck whether the page is all-visible after reacquiring the lock. So I've got some more work to do here.
>> See what you think of the attached. I *think* this covers all bases. It's a little more complicated than I would like, but I don't think fatally so.
> For lack of comment, committed. It's hopefully at least better than what was there before, which was clearly several bricks short of a load.
> --
> Robert Haas EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

out of curiosity, does it affect the previous benchmarks of the feature ?

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [COMMITTERS] pgsql: Make the visibility map crash-safe.
On Tue, Jun 28, 2011 at 11:44 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote:
> out of curiosity, does it affect the previous benchmarks of the feature ?

I don't think there's much performance impact, because the only case where we actually have to do any real work is when vacuum comes through and marks a buffer we're about to update all-visible. It would probably be good to run some tests to verify that, though. I'll try to set something up.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [Hackers]Backend quey plan process
On 28 June 2011 22:36, Alvaro Herrera alvhe...@commandprompt.com wrote:
> Excerpts from HuangQi's message of lun jun 27 23:56:22 -0400 2011:
>> BTW, which email system are you using to send to postgres mailing list? As you can keep the top-answering and maintain the title of your email with [hackers] in front, my gmail can not help on that. For this email, I just add by hand.
> You shouldn't add the [Hackers] prefix yourself. The mail list server will add it automatically for you. Gmail deals with this perfectly fine.

I see. Thanks.

--
Best Regards
Huang Qi Victor
Re: [HACKERS] Range Types, constructors, and the type system
On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote:
> Do we think that this is a good way forward? The only thing I can think of that's undesirable is that it's not normal to be required to cast the result of a function, and might be slightly difficult to explain in the documentation in a straightforward way

That's the part that bothers me. I think that if it's not cast it should somehow be useful. Maybe default to a text range or something?

Best, David
Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
Robert Haas robertmh...@gmail.com writes:
> How about if we allow addition only of branches matching /^REL_[0-9_]+_STABLE$/ and disallow deletion of all branches?

+1, if feasible.

regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
On Jun 28, 2011 6:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> How about if we allow addition only of branches matching /^REL_[0-9_]+_STABLE$/ and disallow deletion of all branches?
> +1, if feasible.

Pretty sure that's just a Small Matter Of Programming. I'll give it a try.

/Magnus
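For what it's worth, the policy under discussion fits naturally into a server-side git update hook, which receives the ref name, old SHA-1, and new SHA-1 for each ref being pushed. The sketch below is only an illustration of the idea, not the actual gitmaster scripts; the branch pattern here is slightly relaxed from Robert's /^REL_[0-9_]+_STABLE$/ so that existing names like REL9_1_STABLE (no underscore after REL) would also match:

```python
#!/usr/bin/env python
# Sketch of a server-side git "update" hook enforcing the policy
# discussed above: branch deletion is always refused, and branch
# creation is allowed only for release-branch-looking names.
import re
import sys

ZERO = "0" * 40  # git passes an all-zero SHA-1 for "no object"
BRANCH_RE = re.compile(r"^refs/heads/REL_?[0-9][0-9_]*_STABLE$")

def check_ref(refname, oldrev, newrev):
    """Return True if the push to refname should be allowed."""
    if not refname.startswith("refs/heads/"):
        return True                    # tags etc. are out of scope here
    if newrev == ZERO:
        return False                   # never allow branch deletion
    if oldrev == ZERO:                 # branch creation
        return bool(BRANCH_RE.match(refname))
    return True                        # ordinary update of an existing branch

if __name__ == "__main__" and len(sys.argv) >= 4:
    refname, oldrev, newrev = sys.argv[1:4]
    sys.exit(0 if check_ref(refname, oldrev, newrev) else 1)
```

Splitting creation and deletion into two separate checks like this also makes it trivial to turn them into the "two different settings" Magnus mentions.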
Re: [HACKERS] Range Types, constructors, and the type system
On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote:
> On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis pg...@j-davis.com wrote:
>> So, in effect, RANGEINPUT is a special type used only for range constructors. If someone tried to output it, it would throw an exception, and we'd even have enough information at that point to print a nice error message with a hint.
> I don't think I like the idea of throwing an error when you try to output it, but the rest seems reasonably sensible.

I thought it might add a little confusion if people thought they had a range type but really had RANGEINPUT. For instance, if you do a create table as select range(1,2) then the result might be slightly unexpected. But it's probably no more unexpected than create table as select 'foo'. So, I suppose there's not much reason to throw an error. We can just output it in the same format as a range type. It's also much easier to explain something in the documentation that has an output format, because at least it's tangible.

>> Actually, this is pretty much exactly Florian's idea (thanks again, Florian), but at the time I didn't like it because pair didn't capture everything that I wanted to capture, like infinite bounds, etc. But there's no reason that it can't, and your point made me realize that -- you are effectively just using TEXT as the intermediate type (which works, but has some undesirable characteristics).
> What undesirable characteristics?

Well, for one, outputting something as text and then reading it back in does not always produce the same value. For instance, for float, it only does that if you have extra_float_digits set to some high-enough value. I suppose I could save the GUC, set it, and set it back; but that seems like unnecessary ugliness. There's also the deparsing/reparsing cycle. That might not really matter for performance, but it seems unnecessary.
And there's always the fallback that we have types for a reason. Wouldn't it be odd if you wrote a query like: select range(1,2) || 'foo' and it succeeded? I'm sure that kind of thing can lead to some dangerous situations.

Regards, Jeff Davis
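Jeff's point about text round-trips is easy to reproduce outside Postgres. The snippet below (Python, purely for illustration; the format precision plays the role extra_float_digits plays for float8 output) shows that a double printed with too few digits does not survive an output/input cycle:

```python
# A float printed with too few digits does not round-trip; with
# enough digits (17 for IEEE doubles) it does.  This is the same
# issue extra_float_digits addresses for float8 output.
x = 0.1 + 0.2                      # actually 0.30000000000000004...

short = f"{x:.15g}"                # too few digits: prints "0.3"
full = f"{x:.17g}"                 # enough digits to round-trip

print(float(short) == x)           # False: information was lost
print(float(full) == x)            # True: value survives the cycle
```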
Re: [HACKERS] Range Types, constructors, and the type system
On Tue, 2011-06-28 at 09:30 -0700, David E. Wheeler wrote:
> On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote:
>> Do we think that this is a good way forward? The only thing I can think of that's undesirable is that it's not normal to be required to cast the result of a function, and might be slightly difficult to explain in the documentation in a straightforward way
> That's the part that bothers me.

Yeah, that bothered me, too.

> I think that if it's not cast it should somehow be useful.

Let's see, what can one do with a range that has no ordering yet? ;)

Robert suggested that we don't need to throw an error, and I think I agree. Just having a working output function solves most of the documentation problem, because it makes it less abstract. The only operators that we could really support are accessors, which seems somewhat reasonable. However, I'd have some concerns even about that, because if you do range(10,1), then what's the upper bound?

> Maybe default to a text range or something?

That sounds a little dangerous: select range('1','09') would fail before it could be cast to int4range. We could invent an UNKNOWNRANGE type or something. But I don't particularly like that; it would start out working nicely when people only had one textrange type, and then their old queries would start failing when they added another range type based on text.

I think it's fine if the RANGEINPUT type isn't too useful by itself. It's already a common requirement to cast unknown literals, and this isn't too much different. It's only for constructors, so it still fits pretty closely with that idea.

Regards, Jeff Davis
Re: [HACKERS] Small patch for GiST: move childoffnum to child
Actually, there is no more direct need for this patch because I've rewritten the insert function for fast build. But there are still two points in favor of these changes:
1) As noted before, it simplifies the code a bit.
2) It would be better if childoffnum had the same semantics in regular insert and fast-build insert.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] Range Types, constructors, and the type system
On Tue, Jun 28, 2011 at 12:58 PM, Jeff Davis pg...@j-davis.com wrote:
> On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote:
>> On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis pg...@j-davis.com wrote:
>>> So, in effect, RANGEINPUT is a special type used only for range constructors. If someone tried to output it, it would throw an exception, and we'd even have enough information at that point to print a nice error message with a hint.
>> I don't think I like the idea of throwing an error when you try to output it, but the rest seems reasonably sensible.
> I thought it might add a little confusion if people thought they had a range type but really had RANGEINPUT. For instance, if you do a create table as select range(1,2) then the result might be slightly unexpected.

True...

> But it's probably no more unexpected than create table as select 'foo'. So, I suppose there's not much reason to throw an error. We can just output it in the same format as a range type.

+1.

> It's also much easier to explain something in the documentation that has an output format, because at least it's tangible.

+1.

>>> Actually, this is pretty much exactly Florian's idea (thanks again, Florian), but at the time I didn't like it because pair didn't capture everything that I wanted to capture, like infinite bounds, etc. But there's no reason that it can't, and your point made me realize that -- you are effectively just using TEXT as the intermediate type (which works, but has some undesirable characteristics).
>> What undesirable characteristics?
> Well, for one, outputting something as text and then reading it back in does not always produce the same value. For instance, for float, it only does that if you have extra_float_digits set to some high-enough value. I suppose I could save the GUC, set it, and set it back; but that seems like unnecessary ugliness.

Yeah, I don't think we want to go there.

> There's also the deparsing/reparsing cycle. That might not really matter for performance, but it seems unnecessary.
> And there's always the fallback that we have types for a reason. Wouldn't it be odd if you wrote a query like: select range(1,2) || 'foo' and it succeeded? I'm sure that kind of thing can lead to some dangerous situations.

That's pretty much what we tried to get rid of with the 8.3 casting changes, so agreed.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Word-smithing doc changes
Excerpts from Greg Stark's message of sáb jun 25 21:01:36 -0400 2011:
> I think this commit was ill-advised: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372 Seems way too implementation-specific and detailed for a user to make heads or tails of. Except in the sections talking about locking internals we don't talk about shared locks on virtual transactions identifiers we just talk about waiting for a transaction to complete.

Hmm, right. And looping over the transactions one by one is purely an implementation detail and uninteresting to users.

> Also it uses ill-defined terms like active transactions, potentially interfering older transactions, and original index -- from the user's point of view there's only one index and it just isn't completely built yet.

Wow, that's a lot of mistakes for a single paragraph, sorry about that.

> Are we not yet in string-freeze though? I'll go ahead and edit it if people don't mind. I'm curious to see the original complaint though.

I don't -- please go ahead. Original complaint in Message-id 4ddb64cb.7070...@2ndquadrant.com

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed
Excerpts from Tom Lane's message of mar jun 28 10:39:22 -0400 2011: I think it would be sensible to block branch removal, as there's basically never a scenario where we'd do that during current usage. I'm not excited about blocking branch addition, although I worry sooner or later somebody will accidentally push a private development branch :-(. Is it possible to block only removal and not addition? If we can tweak the thing, how about we only allow creating branches that match a certain pattern, say ^REL_\d+_\d+_STABLE$? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] marking old branches as no longer maintained
I'd like to add a feature to the buildfarm that automatically picks up any new branch and automatically stops building any branch we're not maintaining any more. To do the latter, I need some way for the client to detect that we are or aren't interested in a branch. What I'd like to do is add a file to the old back branches (say from 7.4 to 8.1 currently - I can grandfather the rest) called end-of-life-reached or some such, with some appropriate text. As a branch reaches its end of life, i.e. right before the last release we make, we should add that file to the branch. I think this would possibly be useful anyway, regardless of buildfarm utility - I still hear of people mistakenly grabbing and building releases past EOL, and maybe this will give one or two the extra clue they need that this is less than a good idea. Comments? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI modularity questions
Heikki Linnakangas wrote: On 27.06.2011 21:23, Kevin Grittner wrote: The bigger question is if those calls are needed at all ( http://archives.postgresql.org/message-id/4e072ea9.3030...@enterprisedb.com ). Ah, I didn't properly grasp your concerns the first time I read that. The heap relation lock for a seqscan is indeed required for correctness and has been there all along. The rs_relpredicatelocked flag was added in response to this: http://archives.postgresql.org/pgsql-hackers/2011-01/msg00730.php

I'm uneasy about changing them this late in the release cycle, but I don't feel good about leaving useless clutter in place just because we're late in the release cycle either. More importantly, if locking the whole relation in a seqscan is not just a performance optimization, but is actually required for correctness, it's important that we make the code and comments reflect that or someone will break it in the future.

OK, if that isn't clear in the comments, we should definitely make it clear. Basically, the predicate locking strategy is as follows:

(1) We're only concerned with read/write dependencies, also known as rw-conflicts. This is where two transactions overlap (each gets its snapshot before the other commits, so neither can see the work of the other), and one does a read which doesn't see the write of the other due only to the timing.

(2) For rw-conflicts where the read follows the write, the predicate locks don't come into play -- we use the MVCC data in the heap tuples directly.

(3) Heap tuples are locked so that updates or deletes by an overlapping transaction of the tuple which has been read can be detected as a rw-conflict. Keep in mind that access for such a delete or update may not go through the same index on which the conflicting read occurred. It might use a different index or a seqscan.
These may be promoted to page or heap relation locks to control the shared space used by predicate locks, but the concept is the same -- we're locking actual tuples read, not any gaps.

(4) Index ranges are locked to detect inserts or updates which create heap tuples which would have been read by an overlapping transaction if they had existed and been visible at the time of the index scan. The entire goal of locks on indexes is to lock the gaps where a scan *didn't* find anything; we only care about conflicting index tuple inserts.

(5) When a heap scan is executed, there is no index gap to lock to cover the predicate involved, so we need to acquire a heap relation lock -- any insert to the relation by an overlapping transaction is a rw-conflict. While these *look* just like tuple locks which got promoted, their purpose is entirely different. Like index locks, they are for detecting inserts into the gaps.

[Light bulb goes on over head: in some future release, perhaps it would be worth differentiating between the two uses of heap relation locks, to reduce the frequency of false positives. A couple bit flags in the lock structure might do it.]

So, the heap relation lock is clearly needed for the seqscan. There is room for performance improvement there in skipping the tuple lock attempt when we're in a seqscan, which will always be a no-op when it finds the heap relation lock after a hash table lookup.

But you are also questioning whether the predicate locking of the tuples where rs_relpredicatelocked is tested can be removed entirely, rather than conditioned on the boolean. The question is: can the code be reached on something other than a seqscan of the heap, and can this happen for a non-temporary, non-system table using a MVCC snapshot? I've been trying to work backward to all the spots which call these functions, directly or indirectly, to determine that.
That's obviously not trivial or easy work, and I fear that unless someone more familiar with the code than I can weigh in on that question for any particular PredicateLockTuple() call, I would rather leave the calls alone for 9.1 and sort this out in 9.2. I'm confident that they don't do any damage where they are; it's a matter of very marginal performance benefit (skipping a call to a fast return) and code tidiness (not making unnecessary calls). I can, with confidence, now answer my own previous question about moving the calls outside the influence of HeapKeyTest(): it's not necessary. The rows currently excluded won't be seen by the caller, so they don't fit under the needs of (3) above, and if (4) or (5) aren't covered where they need to be, locking a few extra rows won't help at all. So we can drop that issue. (2) In reviewing the above, Heikki noticed that there was a second place in the executor that SSI calls were needed but missing. I submitted a patch here: http://archives.postgresql.org/message-id/4e07550f02250003e...@gw.wicourts.gov I wonder, though, whether the section of code which I needed to modify should be moved to a new function in heapam.c on modularity
Re: [HACKERS] marking old branches as no longer maintained
On Tue, Jun 28, 2011 at 19:46, Andrew Dunstan and...@dunslane.net wrote: I'd like to add a feature to the buildfarm that automatically picks up any new branch and automatically stops building any branch we're not maintaining any more. To do the latter, I need some way for the client to detect that we are or aren't interested in a branch. What I'd like to do is add a file to the old back branches (say from 7.4 to 8.1 currently - I can grandfather the rest) called end-of-life-reached or some such, with some appropriate text. As a branch reaches its end of life, i.e. right before the last release we make, we should add that file to the branch. Does this need to be driven out of the main tree? Couldn't you just have a blacklist in the buildfarm code or site? (disclaimer: I haven't looked at how it works so that may be a completely useless idea..) Another way would be to just not run builds if there are no commits in n days on a branch. Don't we already not run builds on branches with no commits? Maybe just put a limit on how long we allow an override of that? I think this would possibly be useful anyway, regardless of buildfarm utility - I still hear of people mistakenly grabbing and building releases past EOL, and maybe this will give one or two the extra clue they need that this is less than a good idea. If you want that to actually work, you probably need to do something to the point of breaking the configure script. There's zero chance that people who're not reading the information about which releases are supported are actually going to read a random file somewhere in the source tree, regardless of where you place it and what you name it. You could require something like ./configure --yes-i-know-what-i-am-doing or something like that, I guess... Comments?
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoid index rebuilds for no-rewrite ALTER TABLE ALTER TYPE
On Mon, Jun 27, 2011 at 10:43 PM, Noah Misch n...@leadboat.com wrote: On Mon, Jun 27, 2011 at 03:45:43PM -0400, Robert Haas wrote: On Wed, Jun 15, 2011 at 1:03 AM, Noah Misch n...@leadboat.com wrote: [patch to avoid index rebuilds] With respect to the documentation hunks, it seems to me that the first hunk might be made clearer by leaving the paragraph of which it is a part as-is, and adding another paragraph afterwards beginning with the words In addition. The added restriction elaborates on the transitivity requirement, so I wanted to keep the new language adjacent to that. That makes sense, but it reads a bit awkwardly to me. Maybe it's just that the sentence itself is so complicated that I have difficulty understanding it. I guess it's the same problem as with the text you added about hash indexes: without thinking about it, it's hard to understand what territory is covered by the new sentence that is not covered by what's already there. In the case of the hash indexes, I think I have it figured out: we already know that we must get compatible hash values out of logically equal values of different datatypes. But it's possible that the inter-type cast changes the value in some arbitrary way, and the hash function is then defined in such a way as to compensate. Similarly, for btrees, we need the relative ordering of A and B to remain the same after casting within the opfamily, not to be rearranged somehow. Maybe the text you've got is OK to explain this, but I wonder if there's a way to do it more simply. As you no doubt expected, my eyes were immediately drawn to the index-resurrection hack. Reviewing the thread, I see that you asked about that in January and never got feedback. I have to say that what you've done here looks like a pretty vile hack, but it's hard to say for sure without knowing what to compare it against.
You made reference to this being smaller and simpler than updating the index definition in place - can you give a sketch of what would need to be done if we went that route instead? In at7-index-opfamily.patch attached to http://archives.postgresql.org/message-id/20110113230124.ga18...@tornado.gateway.2wire.net check out the code following the comment /* The old index is compatible. Update catalogs. */ until the end of the function. That code would need updates for per-column collations, and it incorrectly reuses values/nulls/replace arrays. It probably does not belong in tablecmds.c, either. However, it gives the right general outline. It would be valuable to avoid introducing a second chunk of code that knows everything about the catalog entries behind an index. That's what led me to put forward the most recent version as best. What do you find vile about that approach? I wasn't comfortable with it at first, because I suspected the checks in RelationPreserveStorage() might be important for correctness. Having studied it some more, though, I think they just reflect the narrower needs of its current sole user. Maybe vile is a strong word, but it seems like a modularity violation: we're basically letting DefineIndex() do some stuff we don't really want done, and then backing out the parts of it that we don't really want done after all. It seems like it would be better to provide DefineIndex with enough information not to do the wrong thing in the first place. Could we maybe pass stmt->oldNode to DefineIndex(), and let it work out the details? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add support for logging current role (what to review?)
On Tue, Jun 28, 2011 at 09:15, Robert Haas robertmh...@gmail.com wrote: Anyway, if we are going to insist on rewriting substantial chunks of the logging infrastructure before doing this, we at least need to reach some agreement on what would be an acceptable outcome - and then let Stephen code that up as a separate patch, submit it, get it committed, and then do this on top of that. Or we can just decide that adding one relatively minor field to the text format output is not worth knocking over the applecart for. Well, one of the reasons I picked up this patch is that it's a feature *I* want. I remember being quite surprised at the role csv logging reports when I switched to it. I was logging everything, so I have on occasion had to find a SET ROLE and follow the session... it's quite annoying :-) I think if Stephen was proposing 10 fields, or if there was a list of fields we were planning on adding in the next release or 3, it might be worth re-factoring. I know of no such list, and I think this field is useful/important enough that people who are using csv logging would want it anyway. +1 on just tacking on the field and causing a flag day for csv users. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 06/25/2011 12:14 AM, Steve Singer wrote: I'm not a libpq guru but I've given your patch a look over. Thanks for the review! I have since simplified the patch to assume that partial SSL writes are disabled -- according to SSL_write(3) this is the default behaviour. Now the SSL retry buffer only holds the data to be retried; the remainder is moved to the new outBuffer.

- The patch doesn't compile with non-ssl builds, the debug at the bottom of pqSendSome isn't in an #ifdef

Ack, there was another one in pqFlush. Fixed that.

- I don't think you're handling the return code properly. Consider this case:

    pqSendSome(some data)
        sslRetryBuf = some data
        returns 1
    pqSendSome(more data)
        it sends all of 'some data'
        returns 0

I think 1 should be returned because all of 'more data' still needs to be sent. I think returning a 0 will break PQsetnonblocking if you call it when there is data in both sslRetryBuf and outputBuffer.

Hmm, I thought I thought about that. There was a check in the original patch:

    if (conn->sslRetryBytes || (conn->outCount - remaining) > 0)

So if the SSL retry buffer was emptied it would return 1 if there was something left in the regular output buffer. Or did I miss something there?

We might even want to try sending the data in outputBuffer after we've sent all the data sitting in sslRetryBuf.

IMHO that'd add unnecessary complexity to pqSendSome. As this only happens in non-blocking mode the caller should be well prepared to handle the retry.

If you close the connection with an outstanding sslRetryBuf you need to free it.

Fixed. New version of the patch attached.
regards, Martin

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 9e4807e..8dc0226 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2792,6 +2792,10 @@ freePGconn(PGconn *conn)
 	if (conn->gsslib)
 		free(conn->gsslib);
 #endif
+#if defined(USE_SSL)
+	if (conn->sslOutBufPtr)
+		free(conn->sslOutBufPtr);
+#endif
 	/* Note that conn->Pfdebug is not ours to close or free */
 	if (conn->last_query)
 		free(conn->last_query);
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 17dde4a..d0c7812 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -781,6 +781,8 @@ pqSendSome(PGconn *conn, int len)
 	char *ptr = conn->outBuffer;
 	int remaining = conn->outCount;
 	int result = 0;
+	int sent = 0;
+	bool shiftOutBuffer = true;
 
 	if (conn->sock < 0)
 	{
@@ -789,23 +791,34 @@ pqSendSome(PGconn *conn, int len)
 		return -1;
 	}
 
+#ifdef USE_SSL
+	if (conn->sslRetryPos)
+	{
+		/*
+		 * We have some leftovers from previous SSL write attempts, these need
+		 * to be sent before the regular output buffer.
+		 */
+		len = remaining = conn->sslRetrySize;
+		ptr = conn->sslRetryPos;
+		shiftOutBuffer = false;
+	}
+#endif
+
 	/* while there's still data to send */
 	while (len > 0)
 	{
-		int sent;
 		char sebuf[256];
+		int write_size = len;
 
-#ifndef WIN32
-		sent = pqsecure_write(conn, ptr, len);
-#else
-
+#ifdef WIN32
 		/*
 		 * Windows can fail on large sends, per KB article Q201213. The
 		 * failure-point appears to be different in different versions of
 		 * Windows, but 64k should always be safe.
 		 */
-		sent = pqsecure_write(conn, ptr, Min(len, 65536));
+		write_size = Min(len, 65536);
 #endif
+		sent = pqsecure_write(conn, ptr, write_size);
 
 		if (sent < 0)
 		{
@@ -857,6 +870,38 @@ pqSendSome(PGconn *conn, int len)
 			return -1;
 		}
 	}
+#ifdef USE_SSL
+		else if (sent == 0 && conn->ssl && !conn->sslRetryPos)
+		{
+			/*
+			 * pqsecure_write indicates that a SSL write needs to be retried.
+			 * With non-blocking connections we need to ensure that the buffer
+			 * passed to the SSL_write() retry is the exact same buffer as
+			 * in the previous write -- see SSL_write(3SSL) for more on this.
+			 * For this we need to save the original buffer and allocate a
+			 * new buffer for outgoing data.
+			 */
+			conn->sslOutBufPtr = conn->outBuffer;
+			conn->sslRetryPos = ptr;
+			conn->sslRetrySize = write_size;
+
+			/*
+			 * Allocate a new output buffer and prepare to move the remainder
+			 * of the original outBuffer there.
+			 */
+			remaining -= write_size;
+			ptr += write_size;
+
+			conn->outBufSize = Max(16 * 1024, remaining);
+			conn->outBuffer = malloc(conn->outBufSize);
+			if (conn->outBuffer == NULL)
+			{
+				printfPQExpBuffer(&conn->errorMessage,
+								  "cannot allocate memory for output buffer\n");
+				return -1;
+			}
+		}
+#endif
 		else
 		{
 			ptr += sent;
@@ -903,10 +948,36 @@ pqSendSome(PGconn *conn, int len)
 		}
 	}
 
-	/* shift the remaining contents of the buffer */
-	if (remaining > 0)
-		memmove(conn->outBuffer, ptr, remaining);
-	conn->outCount = remaining;
+#ifdef USE_SSL
+	if (conn->sslRetryPos && sent > 0)
+	{
+		/*
+		 * A SSL write was successfully retried, free the retry buffer and
+		 * switch to the regular outBuffer.
Re: [HACKERS] SSI modularity questions
On 28.06.2011 20:47, Kevin Grittner wrote: (3) Heap tuples are locked so that updates or deletes by an overlapping transaction of the tuple which has been read can be detected as a rw-conflict. Keep in mind that access for such a delete or update may not go through the same index on which the conflicting read occurred. It might use a different index or a seqscan. These may be promoted to page or heap relation locks to control the shared space used by predicate locks, but the concept is the same -- we're locking actual tuples read, not any gaps. Ok, that's what I was missing. So the predicate locks on heap tuples are necessary. Thanks for explaining this again. So, the heap relation lock is clearly needed for the seqscan. There is room for performance improvement there in skipping the tuple lock attempt when we're in a seqscan, which will always be a no-op when it finds the heap relation lock after a hash table lookup. But you are also questioning whether the predicate locking of the tuples where rs_relpredicatelocked is tested can be removed entirely, rather than conditioned on the boolean. The question is: can the code be reached on something other than a seqscan of the heap, and can this happen for a non-temporary, non-system table using a MVCC snapshot? I've been trying to work backward to all the spots which call these functions, directly or indirectly to determine that. That's obviously not trivial or easy work, and I fear that unless someone more familiar with the code than I can weigh in on that question for any particular PredicateLockTuple() call, I would rather leave the calls alone for 9.1 and sort this out in 9.2. I'm confident that they don't do any damage where they are; it's a matter of very marginal performance benefit (skipping a call to a fast return) and code tidiness (not making unnecessary calls). Hmm, the calls in question are the ones in heapgettup() and heapgettup_pagemode(), which are subroutines of heap_getnext(). 
heap_getnext() is only used in sequential scans, so it seems safe to remove those calls. (2) In reviewing the above, Heikki noticed that there was a second place in the executor that SSI calls were needed but missing. I submitted a patch here: http://archives.postgresql.org/message-id/4e07550f02250003e...@gw.wicourts.gov I wonder, though, whether the section of code which I needed to modify should be moved to a new function in heapam.c on modularity grounds. If these two places were moved, there would be no SSI calls from any source file in the executor subdirectory. Same here, we might not need those PredicateLockTuple calls in bitmap heap scan at all. Can you check my logic, and verify if those PredicateLockTuple() calls are needed? These sure look like they are needed per point (3) above. Yep. I would like to add a test involving a lossy bitmap scan. How many rows are normally needed to force a bitmap scan to be lossy? The size of bitmaps is controlled by work_mem, so you can set work_mem very small to cause them to become lossy earlier. Off the top of my head I don't have any guesstimate on how many rows you need. What's the easiest way to check whether a plan is going to use (or is using) a lossy bitmap scan? Good question. There doesn't seem to be anything in the EXPLAIN ANALYZE output to show that, so I think you'll have to resort to adding some elog()s in the right places. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI modularity questions
On Tue, Jun 28, 2011 at 1:47 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: (5) When a heap scan is executed, there is no index gap to lock to cover the predicate involved, so we need to acquire a heap relation lock -- any insert to the relation by an overlapping transaction is a rw-conflict. While these *look* just like tuple locks which got promoted, their purpose is entirely different. Like index locks, they are for detecting inserts into the gaps. [Light bulb goes on over head: in some future release, perhaps it would be worth differentiating between the two uses of heap relation locks, to reduce the frequency of false positives. A couple bit flags in the lock structure might do it.] You know, it just occurred to me while reading this email that you're using the term predicate lock in a way that is totally different from what I learned in school. What I was taught is that the word predicate in predicate lock is like the word tuple in tuple lock or the word relation in relation lock - that is, it describes *the thing being locked*. In other words, you are essentially doing: LOCK TABLE foo WHERE i = 1; I think that what you're calling the predicate lock manager should really be called the siread lock manager, and all of the places where you are predicate locking a tuple should really be siread locking the tuple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlock contention
On Jun23, 2011, at 23:40 , Robert Haas wrote: I tried rewriting the LWLocks using CAS. It actually seems to make things slightly worse on the tests I've done so far, perhaps because I didn't make it respect spins_per_delay. Perhaps fetch-and-add would be better, but I'm not holding my breath. Everything I'm seeing so far leads me to the belief that we need to get rid of the contention altogether, not just contend more quickly.

I got curious and put a tool together to benchmark this. The code can be found at https://github.com/fgp/lockbench. The tool starts N workers who each acquire a lock in SHARE mode, increment a per-worker counter and release the lock again, rinse, repeat. That is done for T seconds, after which it reports the sum of the individual counters, the elapsed (wall) time and the sums of the user and system times consumed by the workers. The lock implementations currently supported are:

atomicinc: RW-lock using atomic increment/decrement instructions (locked xaddl) to acquire and release the lock in SHARE mode in the non-contested case. The contested case is unimplemented.

cmpxchng: Like atomicinc, but uses a locked cmpxchng loop instead of locked xaddl.

spin: RW-lock built around a simple cmpxchng-based spinlock (i.e., to lock/unlock, the spinlock is taken, the shared-lockers count is incremented/decremented, and the spinlock is released). Similar to pg_lwlock, but without the sleep() in the contested path of the spinlock. The contested case of the RW-lock is again unimplemented.

none: No locking.

pg_lwlock: Postgres LWLock implementation. The contested case doesn't work because the workers currently don't have a valid MyProc.

pg_lwlock_cas: Like pg_lwlock but with your CAS patch applied.
On a 4-core Intel Xeon system (Intel Xeon X3430, 2.4 GHz, 8MB cache) I get the following results (wall seconds per cycle and user seconds per cycle):

              |    1 worker     |    2 workers    |    3 workers    |    4 workers    |
              | wallsec usersec | wallsec usersec | wallsec usersec | wallsec usersec |
              |  /cycle  /cycle |  /cycle  /cycle |  /cycle  /cycle |  /cycle  /cycle |
------------------------------------------------------------------------------------
none          | 2.5e-09 2.5e-09 | 1.3e-09 2.5e-09 | 8.5e-10 2.5e-09 | 6.8e-10 2.5e-09 |
atomicinc     | 1.8e-08 1.8e-08 | 2.8e-08 5.6e-08 | 2.7e-08 8.1e-08 | 2.9e-08 1.1e-07 |
cmpxchng      | 3.0e-08 3.0e-08 | 7.1e-08 1.4e-07 | 6.9e-08 2.0e-07 | 7.2e-08 2.9e-07 |
spin          | 5.0e-08 5.0e-08 | 3.0e-07 5.9e-07 | 3.8e-07 1.1e-06 | 4.0e-07 1.5e-06 |
pg_lwlock     | 2.8e-08 2.8e-08 | 2.9e-08 3.0e-08 | 3.0e-08 3.2e-08 | 2.9e-08 3.1e-08 |
pg_lwlock_cas | 2.6e-08 2.6e-08 | 6.2e-08 1.2e-07 | 7.7e-08 2.3e-07 | 7.8e-08 3.1e-07 |

These results allow the following conclusions to be drawn.

First, our current pg_lwlock is quite good at preserving resources, i.e. at not spinning excessively - at least for up to four workers. It's the only lock implementation that consistently uses about 30ns of processor time for one acquire/release cycle. Only atomicinc with one worker (i.e. no cacheline fighting) manages to beat that. It does, however, effectively serialize execution with this workload - the consumed user time is approximately equal to the elapsed wall clock time, even in the case of 4 workers, meaning that most of the time 3 of those 4 workers are sleeping.

Secondly, pg_lwlock_cas and cmpxchng perform similarly, which comes as no surprise, since they effectively use the same algorithm. One could expect spin and pg_lwlock to also perform similarly, but these two don't. The reason is probably that spin uses a rather brain-dead spinning method, while pg_lwlock uses rep; nop.

Now to atomicinc, which is (the fast-path of) the most optimized RW-lock possible, at least on x86 and x86_64. One interesting result here is that wall time seconds / cycle are suspiciously similar for atomicinc and pg_lwlock.
This proves, I think, that the limiting factor in both of these tests is the speed at which cache line invalidation can occur. It doesn't seem to matter whether the other cores are idle (as in the pg_lwlock case), or trying to obtain ownership of the cache line themselves (as in the atomicinc case). Finally, the no-locking test (none) shows that, even though the counter is written to shared memory after every increment, memory bandwidth isn't an issue here, since performance scaled nearly linearly with the number of workers. Here's the result of another run, this time with the per-worker counter being incremented 100 times in each acquire/release cycle.

--- 100 Increments per Cycle ---
              |    1 worker     |    2 workers    |    3 workers    |    4 workers    |
Re: [HACKERS] marking old branches as no longer maintained
On 06/28/2011 01:54 PM, Magnus Hagander wrote: On Tue, Jun 28, 2011 at 19:46, Andrew Dunstanand...@dunslane.net wrote: I'd like to add a feature to the buildfarm that automatically picks up any new branch and automatically stops building any branch we're not maintaining any more. To do the latter, I need some way for the client to detect that we are or aren't interested in a branch. What I'd like to do is add a file to the old back branches (say from 7.4 to 8.1 currently - I can grandfather the rest) called end-of-life-reached or some such, with some appropriate text. As a branch reaches its end of life, i.e. right before the last release we make, we should add that file to the branch. Does this need to be driven out of the main tree? Couldn't you just have a blacklist in the buildfarm code or site? (disclaimer: I haven't looked at how it works so that may be a completely useless idea..) Not very easily, mainly because of difficulties with MinGW SDK perl. Building it into the code would defeat the purpose. If it's contentious I won't bother. We've managed OK for years, and can go on managing as we are. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Fast GiST index build
New version of the patch. A bug which caused failures on trees with a high number of levels was fixed. Also some more comments and refactoring. -- With best regards, Alexander Korotkov. gist_fast_build-0.3.0.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On Tue, Jun 28, 2011 at 20:38, Andrew Dunstan and...@dunslane.net wrote: On 06/28/2011 01:54 PM, Magnus Hagander wrote: On Tue, Jun 28, 2011 at 19:46, Andrew Dunstanand...@dunslane.net wrote: I'd like to add a feature to the buildfarm that automatically picks up any new branch and automatically stops building any branch we're not maintaining any more. To do the latter, I need some way for the client to detect that we are or aren't interested in a branch. What I'd like to do is add a file to the old back branches (say from 7.4 to 8.1 currently - I can grandfather the rest) called end-of-life-reached or some such, with some appropriate text. As a branch reaches its end of life, i.e. right before the last release we make, we should add that file to the branch. Does this need to be driven out of the main tree? Couldn't you just have a blacklist in the buildfarm code or site? (disclaimer: I haven't looked at how it works so that may be a completely useless idea..) Not very easily, mainly because of difficulties with MinGW SDK perl. Building it into the code would defeat the purpose. Drat. If it's contentious I won't bother. We've managed OK for years, and can go on managing as we are. If we can find a good way to do it, I think having BF animals automatically picking up new branches is a very good thing to have. So don't give up so easily :D If adding a more or less random file to back branches is the only way to do it, I'm for doing that - I'd just like to find some method that feels cleaner. But maybe I'm just bikeshedding for no real use here. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On Tue, Jun 28, 2011 at 8:02 PM, Magnus Hagander mag...@hagander.net wrote: If we can find a good way to do it, I think having BF animals automatically picking up new branches is a very good thing to have. So don't give up so easily :D If adding a more or less random file to back branches is the only way to do it, I'm for doing that - I'd just like to find some method that feels cleaner. But maybe I'm just bikeshedding for no real use here. Adding new branches automatically would be great, but it'll need some work from the animal herders as well as some careful design - for example, my Windows animals have separate schedules for each branch (some running more frequently than others), whilst my Solaris ones now use a runner script that cycles through the list of branches on each of a couple of animals. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On 06/28/2011 03:02 PM, Magnus Hagander wrote: If it's contentious I won't bother. We've managed OK for years, and can go on managing as we are. If we can find a good way to do it, I think having BF animals automatically picking up new branches is a very good thing to have. So don't give up so easily :D If adding a more or less random file to back branches is the only way to do it, I'm for doing that - I'd just like to find some method that feels cleaner. But maybe I'm just bikeshedding for no real use here. Another way would be for us to have a file on master with the names of the branches we care about. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On 06/28/2011 03:17 PM, Dave Page wrote: On Tue, Jun 28, 2011 at 8:02 PM, Magnus Hagander mag...@hagander.net wrote: If we can find a good way to do it, I think having BF animals automatically picking up new branches is a very good thing to have. So don't give up so easily :D If adding a more or less random file to back branches is the only way to do it, I'm for doing that - I'd just like to find some method that feels cleaner. But maybe I'm just bikeshedding for no real use here. Adding new branches automatically would be great, but it'll need some work from the animal herders as well as some careful design - for example, my Windows animals have separate schedules for each branch (some running more frequently than others), whilst my Solaris ones now use a runner script that cycles through the list of branches on each of a couple of animals. Modern buildfarm code has a wrapper built in, so my crontab usually just looks like this:

27 * * * * cd bf && ./run_branches.pl --config=nightjar.conf --run-all

The buildfarm.conf has a section like this:

if ($branch eq 'global') {
    $conf{branches_to_build} = [qw( HEAD REL9_1_STABLE REL9_0_STABLE
                                    REL8_4_STABLE REL8_3_STABLE REL8_2_STABLE )];
}

What I'd like to do is to allow this to read:

if ($branch eq 'global') {
    $conf{branches_to_build} = 'ALL';
}

and have it choose the right set for you. But if you want to run some more frequently you'd still be stuck having to manage that yourself. There's actually not a lot of point in doing it that way, though. We don't build unless there have been changes on the branch, unless told otherwise, so you might as well run frequently and test all the branches - for the most part only HEAD (i.e. master) will be built because it gets far more changes than the back branches. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
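The 'ALL' setting Andrew describes has to be expanded into a concrete branch list somewhere. A rough illustration of that selection logic (in Python rather than the buildfarm client's Perl; the branch-name pattern and the set of end-of-life branches are assumptions for the sake of the example):

```python
import re

# Hypothetical sketch: expand an 'ALL' branches_to_build setting into the live
# branches, skipping any branch known to have reached end of life (e.g. by
# carrying an end-of-life-reached marker file, as proposed in this thread).
def resolve_branches(setting, all_branches, eol_branches):
    if setting != "ALL":
        return list(setting)  # an explicit list is passed through unchanged
    stable = re.compile(r"^REL\d+_\d+_STABLE$")
    return [b for b in all_branches
            if (b == "HEAD" or stable.match(b)) and b not in eol_branches]

branches = ["HEAD", "REL9_1_STABLE", "REL9_0_STABLE", "REL8_1_STABLE", "REL7_4_STABLE"]
eol = {"REL8_1_STABLE", "REL7_4_STABLE"}  # branches carrying the marker file
print(resolve_branches("ALL", branches, eol))
```

This keeps explicit per-animal lists working while letting animals that opt into 'ALL' track new branches automatically.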
Re: [HACKERS] Avoid index rebuilds for no-rewrite ALTER TABLE ALTER TYPE
On Tue, Jun 28, 2011 at 02:11:11PM -0400, Robert Haas wrote: On Mon, Jun 27, 2011 at 10:43 PM, Noah Misch n...@leadboat.com wrote: On Mon, Jun 27, 2011 at 03:45:43PM -0400, Robert Haas wrote: On Wed, Jun 15, 2011 at 1:03 AM, Noah Misch n...@leadboat.com wrote: [patch to avoid index rebuilds] With respect to the documentation hunks, it seems to me that the first hunk might be made clearer by leaving the paragraph of which it is a part as-is, and adding another paragraph afterwards beginning with the words In addition. The added restriction elaborates on the transitivity requirement, so I wanted to keep the new language adjacent to that. That makes sense, but it reads a bit awkwardly to me. Maybe it's just that the sentence itself is so complicated that I have difficulty understanding it. I guess it's the same problem as with the text you added about hash indexes: without thinking about it, it's hard to understand what territory is covered by the new sentence that is not covered by what's already there. In the case of the hash indexes, I think I have it figured out: we already know that we must get compatible hash values out of logically equal values of different datatypes. But it's possible that the inter-type cast changes the value in some arbitrary way and then compensates for it by defining the hash function in such a way as to compensate. Similarly, for btrees, we need the relative ordering of A and B to remain the same after casting within the opfamily, not to be rearranged somehow. Maybe the text you've got is OK to explain this, but I wonder if there's a way to do it more simply. That's basically right, I think. Presently, there is no connection between casts and operator family notions of equality. For example, a cast can change the hash value. In general, that's not wrong. However, I wish to forbid it when some hash operator family covers both the source and destination types of the cast. 
Note that I don't especially care whether the stored bits changed or not. I just want casts to preserve equality when an operator family defines equality across the types involved in the cast. The specific way of articulating that is probably vulnerable to improvement. It would be valuable to avoid introducing a second chunk of code that knows everything about the catalog entries behind an index. That's what led me to put forward the most recent version as best. What do you find vile about that approach? I wasn't comfortable with it at first, because I suspected the checks in RelationPreserveStorage() might be important for correctness. Having studied it some more, though, I think they just reflect the narrower needs of its current sole user. Maybe vile is a strong word, but it seems like a modularity violation: we're basically letting DefineIndex() do some stuff we don't really want done, and then backing out parts of it that we don't really want done after all. It seems like it would be better to provide DefineIndex with enough information not to do the wrong thing in the first place. Could we maybe pass stmt->oldNode to DefineIndex(), and let it work out the details? True. I initially shied away from that, because we assume somewhat deep into the stack that the new relation will have pg_class.oid = pg_class.relfilenode. Here's the call stack in question:

RelationBuildLocalRelation
heap_create
index_create
DefineIndex
ATExecAddIndex

Looking at it again, it wouldn't be the end of the world to add a relfilenode argument to each. None of those have more than four callers. ATExecAddIndex() would then call RelationPreserveStorage() before calling DefineIndex(), which would in turn put things in a correct state from the start. Does that seem appropriate? Offhand, I do like it better than what I had.
Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On Tuesday, June 28, 2011, Andrew Dunstan and...@dunslane.net wrote: On 06/28/2011 03:17 PM, Dave Page wrote: On Tue, Jun 28, 2011 at 8:02 PM, Magnus Hagander mag...@hagander.net wrote: If we can find a good way to do it, I think having BF animals automatically picking up new branches is a very good thing to have. So don't give up so easily :D If adding a more or less random file to back branches is the only way to do it, I'm for doing that - I'd just like to find some method that feels cleaner. But maybe I'm just bikeshedding for no real use here. Adding new branches automatically would be great, but it'll need some work from the animal herders as well as some careful design - for example, my Windows animals have separate schedules for each branch (some running more frequently than others), whilst my Solaris ones now use a runner script that cycles through the list of branches on each of a couple of animals. Modern buildfarm code has a wrapper built in, so my crontab usually just looks like this:

27 * * * * cd bf && ./run_branches.pl --config=nightjar.conf --run-all

The buildfarm.conf has a section like this:

if ($branch eq 'global') {
    $conf{branches_to_build} = [qw( HEAD REL9_1_STABLE REL9_0_STABLE
                                    REL8_4_STABLE REL8_3_STABLE REL8_2_STABLE )];
}

What I'd like to do is to allow this to read:

if ($branch eq 'global') {
    $conf{branches_to_build} = 'ALL';
}

and have it choose the right set for you. Oh, cool. Guess I'll be reconfiguring my animals soon :-) But if you want to run some more frequently you'd still be stuck having to manage that yourself. There's actually not a lot of point in doing it that way, though. We don't build unless there have been changes on the branch, unless told otherwise, so you might as well run frequently and test all the branches - for the most part only HEAD (i.e. master) will be built because it gets far more changes than the back branches. It was something Tom asked for ages ago, so he could see if the Windows build got broken more promptly.
I didn't want multiple branches running with increased frequency as I run a number of animals on a single machine with vmware, and a back patched change could cause a lot of extra work. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On 06/28/2011 03:48 PM, Dave Page wrote: But if you want to run some more frequently you'd still be stuck having to manage that yourself. There's actually not a lot of point in doing it that way, though. We don't build unless there have been changes on the branch, unless told otherwise, so you might as well run frequently and test all the branches - for the most part only HEAD (i.e. master) will be built because it gets far more changes than the back branches. It was something Tom asked for ages ago, so he could see if the Windows build got broken more promptly. I didn't want multiple branches running with increased frequency as I run a number of animals on a single machine with vmware, and a back patched change could cause a lot of extra work. Oh, so maybe we need to have some sort of throttle. Probably just for non-head or non-(head-or-latest) would suffice. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
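The throttle Andrew suggests amounts to a per-branch minimum rebuild interval that exempts HEAD (or HEAD and the latest stable branch). A minimal Python sketch of that policy, with the interval and state layout as assumptions:

```python
import time

# Hypothetical throttle: always consider HEAD, but only attempt a back branch
# if its last build is older than a minimum gap (here six hours).
def should_attempt(branch, last_build_ts, now, min_back_branch_gap=6 * 3600):
    if branch == "HEAD":
        return True
    return (now - last_build_ts) >= min_back_branch_gap

now = time.time()
print(should_attempt("HEAD", now, now))                  # HEAD is never throttled
print(should_attempt("REL9_0_STABLE", now - 3600, now))  # built 1h ago: skipped
```

Since the client already skips branches with no new commits, this only limits how often a freshly back-patched change can trigger simultaneous builds across many branches on one machine.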
Re: [HACKERS] Fwd: Keywords in pg_hba.conf should be field-specific
Excerpts from Pavel Stehule's message of dom jun 26 13:10:13 -0400 2011: Hello I have touched next_token() and next_token_expand() a bit more, because it seemed to me that they could be simplified further (the bit about returning the comma in the token, instead of being a boolean return, seemed strange). Please let me know what you think. I am thinking, so it is ok. Thanks, I have pushed it. Brendan: thanks for the patch. I am not sure, if load_ident is correct. In load_hba you clean a previous context after successful processing. In load_ident you clean data on start. Is it ok? Yeah, they are a bit inconsistent. I am uninclined to mess with it, though. Right now it seems to me that the only way it could fail is if you hit an out-of-memory or a similar problem. And if you hit that at postmaster startup ... well ... I guess you have A Problem. If somebody wants to submit a patch to fix that, be my guest. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix leaky-view problem, part 1
Thanks for your review, 2011/6/28 Noah Misch n...@leadboat.com: I took a look at this patch. It's incredibly simple, which is great, and it seems to achieve its goal. Suppose your query references two views owned by different roles. The quals of those views will have the same depth. Is there a way for information to leak from one view owner to another due to that? Even if multiple subqueries were pulled up and qualifiers got merged, user-given qualifiers will have a smaller depth value, so they will always be evaluated after the qualifiers originally used in the subqueries. Of course, it is another topic in the case when the view is originally defined with leaky functions. I like how you've assumed that the table owner trusts the operator class functions of indexes on his table to not leak information. That handily catches some basic and important qual pushdowns that would otherwise be lost. This makes assumptions, at a distance, about the possible scan types and how quals can be fitted to them. Specifically, this patch achieves its goals because any indexable qual is trustworthy, and any non-indexable qual cannot be pushed down further than the view's own quals. That seems to be true with current plan types, but it feels fragile. I don't have a concrete idea for improvement, though. Robert suggested another approach in http://archives.postgresql.org/message-id/aanlktimbn_6tyxreh5rc7pmizt-vjb3xgp8cuho0o...@mail.gmail.com ; might that approach avoid this hazard? The reason why we didn't adopt the idea of checking privileges of underlying tables is that PostgreSQL checks privileges in the executor phase, not the planner phase. If we try to have a flag on pg_proc, it is tough work to categorize trustworthy functions and non-trustworthy ones from the beginning, because we have more than 2000 built-in functions. So, it is a reasonable assumption that index access methods do not leak the contents of scanned tuples, because only a superuser can define them.
The part 2 patch in this series implements the planner restriction more likely to yield performance regressions, so it introduces a mechanism for identifying when to apply the restriction. This patch, however, applies its restriction unconditionally. Since we will inevitably have such a mechanism before you are done sealing the leaks in our view implementation, the restriction in this patch should also use that mechanism. Therefore, I think we should review and apply part 2 first, then update this patch to use its conditional behavior. The reason why this patch always gives the depth higher priority is that the matter is relatively minor compared to the issue the part 2 patch tries to tackle. That one affects the selection of the scan plan (IndexScan or SeqScan), so it is a significant decision that should be controllable. However, this issue is just on a particular scan. In addition, the implementation will become complex if both qualifiers pulled up from a security barrier view and qualifiers pulled up from regular views are mixed within a single qualifier list. A few minor questions/comments on the implementation: On Mon, Jun 06, 2011 at 01:37:11PM +0100, Kohei Kaigai wrote: --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c + else if (IsA(node, Query)) + { + depth += 2; Why two? This patch is groundwork for the upcoming row-level security feature. In the case when the backend appends security policy functions, they should be evaluated prior to user-given qualifiers. So, I intend to give odd depth numbers to these functions so they have higher priority than the others. Of course, 1 might work well right now. --- a/src/test/regress/expected/select_views.out +++ b/src/test/regress/expected/select_views.out +EXPLAIN SELECT * FROM your_credit WHERE f_leak(number,expired); + QUERY PLAN +-- + Seq Scan on credit_cards (cost=0.00..181.20 rows=1 width=96) Use EXPLAIN (COSTS OFF) in regression tests.
We do not put much effort into the stability of exact cost values, and they do not matter for the purpose of this test. OK, fixed. --- a/src/test/regress/sql/select_views.sql +++ b/src/test/regress/sql/select_views.sql @@ -8,3 +8,46 @@ SELECT * FROM street; SELECT name, #thepath FROM iexit ORDER BY 1, 2; SELECT * FROM toyemp WHERE name = 'sharon'; + +-- +-- Test for leaky-view problem +-- + +-- setups +SET client_min_messages TO 'warning'; + +DROP ROLE IF EXISTS alice; +DROP FUNCTION IF EXISTS f_leak(text); +DROP TABLE IF EXISTS credit_cards; + +RESET client_min_messages; No need for this. The regression tests always start on a clean database. Fixed. + +CREATE USER alice; +CREATE FUNCTION f_leak(text, text) + RETURNS bool LANGUAGE 'plpgsql' + AS 'begin raise notice ''% = %'', $1, $2; return true; end'; I ran this test case on master, and it did not
Re: [HACKERS] spinlock contention
On Tue, Jun 28, 2011 at 2:33 PM, Florian Pflug f...@phlo.org wrote: [ testing of various spinlock implementations ] I set T=30 and N=1 2 4 8 16 32 and tried this out on a 32-core loaner from Nate Boley:

100 counter increments per cycle

 workers         1                  2                  4                  8                  16                 32
               wall     user      wall     user      wall     user      wall     user      wall     user      wall     user
 none          2.8e-07  2.8e-07   1.5e-07  3.0e-07   8.0e-08  3.2e-07   4.2e-08  3.3e-07   2.1e-08  3.3e-07   1.1e-08  3.4e-07
 atomicinc     3.6e-07  3.6e-07   2.6e-07  5.1e-07   1.4e-07  5.5e-07   1.4e-07  1.1e-06   1.5e-07  2.3e-06   1.5e-07  4.9e-06
 cmpxchng      3.6e-07  3.6e-07   3.4e-07  6.9e-07   3.2e-07  1.3e-06   2.9e-07  2.3e-06   4.2e-07  6.6e-06   4.5e-07  1.4e-05
 spin          4.1e-07  4.1e-07   2.8e-07  5.7e-07   1.6e-07  6.3e-07   1.2e-06  9.4e-06   3.8e-06  6.1e-05   1.4e-05  4.3e-04
 pg_lwlock     3.8e-07  3.8e-07   2.7e-07  5.3e-07   1.5e-07  6.2e-07   3.9e-07  3.1e-06   1.6e-06  2.5e-05   6.4e-06  2.0e-04
 pg_lwlock_cas 3.7e-07  3.7e-07   2.8e-07  5.6e-07   1.4e-07  5.8e-07   1.4e-07  1.1e-06   1.9e-07  3.0e-06   2.4e-07  7.5e-06

I wrote a little script to reorganize this data in a possibly-easier-to-understand format - ordering each column from lowest to highest, and showing each algorithm as a multiple of the cheapest value for that column:

wall-1: none(1.0),atomicinc(1.3),cmpxchng(1.3),pg_lwlock_cas(1.3),pg_lwlock(1.4),spin(1.5)
user-1: none(1.0),atomicinc(1.3),cmpxchng(1.3),pg_lwlock_cas(1.3),pg_lwlock(1.4),spin(1.5)
wall-2: none(1.0),atomicinc(1.7),pg_lwlock(1.8),spin(1.9),pg_lwlock_cas(1.9),cmpxchng(2.3)
user-2: none(1.0),atomicinc(1.7),pg_lwlock(1.8),pg_lwlock_cas(1.9),spin(1.9),cmpxchng(2.3)
wall-4: none(1.0),atomicinc(1.7),pg_lwlock_cas(1.7),pg_lwlock(1.9),spin(2.0),cmpxchng(4.0)
user-4: none(1.0),atomicinc(1.7),pg_lwlock_cas(1.8),pg_lwlock(1.9),spin(2.0),cmpxchng(4.1)
wall-8: none(1.0),atomicinc(3.3),pg_lwlock_cas(3.3),cmpxchng(6.9),pg_lwlock(9.3),spin(28.6)
user-8: none(1.0),atomicinc(3.3),pg_lwlock_cas(3.3),cmpxchng(7.0),pg_lwlock(9.4),spin(28.5)
wall-16: none(1.0),atomicinc(7.1),pg_lwlock_cas(9.0),cmpxchng(20.0),pg_lwlock(76.2),spin(181.0)
user-16: none(1.0),atomicinc(7.0),pg_lwlock_cas(9.1),cmpxchng(20.0),pg_lwlock(75.8),spin(184.8)
wall-32: none(1.0),atomicinc(13.6),pg_lwlock_cas(21.8),cmpxchng(40.9),pg_lwlock(581.8),spin(1272.7)
user-32: none(1.0),atomicinc(14.4),pg_lwlock_cas(22.1),cmpxchng(41.2),pg_lwlock(588.2),spin(1264.7)

Here's the result of the same calculation applied to your second set of data:

wall-1: none(1.0),atomicinc(1.0),cmpxchng(1.0),pg_lwlock(1.0),pg_lwlock_cas(1.0),spin(1.2)
user-1: none(1.0),atomicinc(1.0),cmpxchng(1.0),pg_lwlock(1.0),pg_lwlock_cas(1.0),spin(1.2)
wall-2: none(1.0),atomicinc(1.0),cmpxchng(1.0),pg_lwlock(1.0),spin(1.2),pg_lwlock_cas(1.2)
user-2: none(1.0),cmpxchng(1.0),pg_lwlock(1.0),atomicinc(1.0),spin(1.2),pg_lwlock_cas(1.2)
wall-3: none(1.0),pg_lwlock(1.0),atomicinc(1.0),cmpxchng(1.0),spin(1.3),pg_lwlock_cas(1.3)
user-3: none(1.0),atomicinc(1.0),pg_lwlock(1.0),cmpxchng(1.0),spin(1.3),pg_lwlock_cas(1.3)
wall-4: none(1.0),atomicinc(1.0),cmpxchng(1.2),pg_lwlock_cas(1.3),pg_lwlock(1.8),spin(5.2)
user-4: none(1.0),atomicinc(1.0),cmpxchng(1.2),pg_lwlock_cas(1.3),pg_lwlock(1.8),spin(5.4)

There seems to be something a bit funky in your 3-core data, but overall I read this data to indicate that 4 cores aren't really enough to see a severe problem with spinlock contention. I'm not even sure that you can see this problem on a real workload on a 16-core system. But as you add cores the problem gets rapidly worse, because the more cores you have, the more likely it is that someone else is already holding the spinlock (or has just very recently held the spinlock) that you want. And, of course, the problem multiplies itself, because once your lock acquisitions start taking longer, they become a larger percentage of your execution time, which makes it proportionately more likely that you'll find someone already there when you go to grab the lock.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
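The reordering script Robert mentions is not attached to the message. A minimal Python sketch of the same transformation, using a subset of the wall/user timings from the 32-core run (the input layout is an assumption):

```python
# For each (metric, worker-count) column, sort algorithms from cheapest to most
# expensive and express each timing as a multiple of the cheapest one.
timings = {
    # algorithm: [(wall, user) for N = 1, 2, 4, 8, 16, 32 workers]
    "none":      [(2.8e-07, 2.8e-07), (1.5e-07, 3.0e-07), (8.0e-08, 3.2e-07),
                  (4.2e-08, 3.3e-07), (2.1e-08, 3.3e-07), (1.1e-08, 3.4e-07)],
    "atomicinc": [(3.6e-07, 3.6e-07), (2.6e-07, 5.1e-07), (1.4e-07, 5.5e-07),
                  (1.4e-07, 1.1e-06), (1.5e-07, 2.3e-06), (1.5e-07, 4.9e-06)],
    "spin":      [(4.1e-07, 4.1e-07), (2.8e-07, 5.7e-07), (1.6e-07, 6.3e-07),
                  (1.2e-06, 9.4e-06), (3.8e-06, 6.1e-05), (1.4e-05, 4.3e-04)],
}

def normalize(timings, workers=(1, 2, 4, 8, 16, 32)):
    lines = []
    for i, n in enumerate(workers):
        for metric, field in (("wall", 0), ("user", 1)):
            col = sorted((vals[i][field], name) for name, vals in timings.items())
            cheapest = col[0][0]
            ranked = ",".join(f"{name}({t / cheapest:.1f})" for t, name in col)
            lines.append(f"{metric}-{n}: {ranked}")
    return lines

for line in normalize(timings):
    print(line)
```

With all six algorithms in the dictionary this reproduces the `wall-N:`/`user-N:` listings shown in the message.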
Re: [HACKERS] Fwd: Keywords in pg_hba.conf should be field-specific
2011/6/28 Alvaro Herrera alvhe...@commandprompt.com: Excerpts from Pavel Stehule's message of dom jun 26 13:10:13 -0400 2011: Hello I have touched next_token() and next_token_expand() a bit more, because it seemed to me that they could be simplified further (the bit about returning the comma in the token, instead of being a boolean return, seemed strange). Please let me know what you think. I am thinking, so it is ok. Thanks, I have pushed it. Brendan: thanks for the patch. I am not sure, if load_ident is correct. In load_hba you clean a previous context after successful processing. In load_ident you clean data on start. Is it ok? Yeah, they are a bit inconsistent. I am uninclined to mess with it, though. Right now it seems to me that the only way it could fail is if you hit an out-of-memory or a similar problem. And if you hit that at postmaster startup ... well ... I guess you have A Problem. there should be a format (syntax) error. If somebody breaks pg_ident and then does a reload, all ident mappings are lost. This is not related to the topic and doesn't modify current behavior, so it should not be fixed now. Pavel If somebody wants to submit a patch to fix that, be my guest. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types, constructors, and the type system
On Jun 28, 2011, at 05:42 , Jeff Davis wrote: On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote: Couldn't we also do neither of these things? I mean, presumably '[1,10]'::int8range had better work. I think that if we combine this idea with Florian's PAIR suggestion here: http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org then I think we have a solution. If we add a type RANGEINPUT that is not a pseudotype, we can use that as an intermediate type that is returned by range constructors. Then, we add casts from RANGEINPUT to each range type. That would allow range(1,2)::int8range to work without changing the type system around, because range() would have the signature: range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT and then the cast would change it into an int8range. But we only need the one cast per range type, and we can also support all of the other kinds of constructors like: range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT range_linf_c(ANYELEMENT) -> RANGEINPUT without additional hassle. Hm, so RANGEINPUT would actually be what was previously discussed as the range as a pair of bounds definition, as opposed to the range as a set of values definition. So essentially we'd add a second concept of what a range is to work around the range input troubles. I believe if we go that route we should make RANGEINPUT a full-blown type, having pair-of-bounds semantics. Adding a lobotomized version just for the sake of range input feels a bit like a kludge to me. Alternatively, we could replace RANGEINPUT simply with ANYARRAY, and add functions ANYRANGE -> ANYRANGE which allow specifying the bound operator (<, <= respectively >, >=) after construction.
So you'd write (using the functions-as-fields syntax I believe we support) (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]' and ARRAY[NULL,2]::int8range for '[-inf,2]' All assuming that modifying the type system to support polymorphic type resolution based on the return type is out of the question... ;-) best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
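Florian's "pair of bounds" reading of RANGEINPUT, as distinct from the "set of values" reading a concrete range type has, can be illustrated with a toy model (Python, not PostgreSQL code; all names here are invented for illustration):

```python
from dataclasses import dataclass

# A RANGEINPUT-style value just records the endpoints plus open/closed flags;
# only a later "cast" interprets it over a concrete element type.
@dataclass(frozen=True)
class RangeInput:
    lower: object            # None models an infinite bound
    upper: object
    lower_closed: bool = True
    upper_closed: bool = True

def contains_int(r: RangeInput, x: int) -> bool:
    # Interpret the bound pair over the integers (the "cast" to an int range).
    if r.lower is not None:
        if x < r.lower or (x == r.lower and not r.lower_closed):
            return False
    if r.upper is not None:
        if x > r.upper or (x == r.upper and not r.upper_closed):
            return False
    return True

half_open = RangeInput(1, 2, lower_closed=False)   # models '(1,2]'
print(contains_int(half_open, 1), contains_int(half_open, 2))
```

The point of the model: the constructor output carries no element-type semantics of its own, which is exactly why a cast per range type suffices.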
Re: [HACKERS] Fwd: Keywords in pg_hba.conf should be field-specific
Excerpts from Pavel Stehule's message of mar jun 28 16:19:02 -0400 2011: there should be a format (syntax) error. If somebody breaks a pg_ident and will do a reload, then all ident mapping is lost. No, the file is not actually parsed until the auth verification runs. The incorrect tokens are happily stored in memory by the tokeniser. In my view that's the wrong way to do it, but changing it seems to be a lot of work (I didn't try). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fwd: Keywords in pg_hba.conf should be field-specific
2011/6/28 Alvaro Herrera alvhe...@commandprompt.com: Excerpts from Pavel Stehule's message of mar jun 28 16:19:02 -0400 2011: there should be a format (syntax) error. If somebody breaks a pg_ident and will do a reload, then all ident mapping is lost. No, the file is not actually parsed until the auth verification runs. The incorrect tokens are happily stored in memory by the tokeniser. In my view that's the wrong way to do it, but changing it seems to be a lot of work (I didn't try). ok -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On tis, 2011-06-28 at 15:37 -0400, Andrew Dunstan wrote: What I'd like to do is to allow this to read: if ($branch eq 'global') { $conf{branches_to_build} = 'ALL'; } and have it choose the right set for you. It seems to me that if you put a marker file into old branches, you'd still have to check out and poll the old branches, which could become somewhat expensive as the number of old branches grows. Couldn't you just put a text file on the build farm server with recommended branches? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI modularity questions
2011/6/28, Robert Haas robertmh...@gmail.com: You know, it just occurred to me while reading this email that you're using the term predicate lock in a way that is totally different from what I learned in school. What I was taught is that the word predicate in predicate lock is like the word tuple in tuple lock or the word relation in relation lock - that is, it describes *the thing being locked*. In other words, you are essentially doing: LOCK TABLE foo WHERE i = 1; I think that what you're calling the predicate lock manager should really be called the siread lock manager, and all of the places where you are predicate locking a tuple should really be siread locking the tuple. The predicate in the full table case is: any tuple in this table (including tuples that don't exist yet, otherwise it wouldn't be a predicate). The predicate in the index case is: any tuple that would be returned by so-and-such index scan (idem regarding tuples that don't exist yet, hence locking the gaps). The lock semantics (i.e., how conflicts between it and other locks are defined and treated) are siread. The thing that it applies to is a predicate. (I.e., PostgreSQL before SSI already supported some rather trivial kind of predicate lock: the full table lock.) Conclusion: I don't see the problem :-). Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] marking old branches as no longer maintained
On 06/28/2011 04:51 PM, Peter Eisentraut wrote: On tis, 2011-06-28 at 15:37 -0400, Andrew Dunstan wrote: What I'd like to do is to allow this to read: if ($branch eq 'global') { $conf{branches_to_build} = 'ALL'; } and have it choose the right set for you. It seems to me that if you put a marker file into old branches, you'd still have to check out and poll the old branches, which could become somewhat expensive as the number of old branches grows. No, not really. I'd use `git ls-tree $branchname $filetolookfor`. I have tested it and this works just fine, takes a second or so. Couldn't you just put a text file on the build farm server with recommended branches? As I told Magnus, that gets ugly because of limitations in MinGW's SDK perl. I suppose I could just not implement the feature for MinGW, but I've tried damn hard not to make those sorts of compromises and I'm not keen to start. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
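The `git ls-tree` trick works because the command prints one line for the path if it exists on the given branch and nothing otherwise, without needing a checkout. A sketch of that check (Python rather than the client's Perl; the repo path and marker-file name are assumptions):

```python
import subprocess

# Ask git whether a marker file exists on a branch, without checking it out.
def marker_file_present(repo, branch, marker="end-of-life-reached"):
    out = subprocess.run(
        ["git", "ls-tree", branch, marker],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return has_entry(out)

def has_entry(ls_tree_output):
    # Non-empty output means the path exists in that branch's tree.
    return ls_tree_output.strip() != ""
```

Because `git ls-tree` only reads the local object store, this stays cheap even as the number of old branches grows.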
Re: [HACKERS] marking old branches as no longer maintained
On tis, 2011-06-28 at 17:05 -0400, Andrew Dunstan wrote: Couldn't you just put a text file on the build farm server with recommended branches? As I told Magnus, that gets ugly because of limitations in MinGW's SDK perl. I suppose I could just not implement the feature for MinGW, but I've tried damn hard not to make those sorts of compromises and I'm not keen to start. The buildfarm code can upload the build result via HTTP; why can't it download a file via HTTP? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spinlock contention
On Tue, Jun 28, 2011 at 3:18 PM, Robert Haas robertmh...@gmail.com wrote: user-32: none(1.0),atomicinc(14.4),pg_lwlock_cas(22.1),cmpxchng(41.2),pg_lwlock(588.2),spin(1264.7) I may not be following all this correctly, but doesn't this suggest a huge potential upside for the CAS-based patch you posted upthread when combined with your earlier patches that were bogging down on spinlock contention? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI modularity questions
Heikki Linnakangas wrote: On 28.06.2011 20:47, Kevin Grittner wrote: Hmm, the calls in question are the ones in heapgettup() and heapgettup_pagemode(), which are subroutines of heap_getnext(). heap_getnext() is only used in sequential scans, so it seems safe to remove those calls. I haven't found anything to the contrary, if I understand correctly, Dan found the same, and all the tests pass without them. Here's a patch to remove them. This makes the recently-added rs_relpredicatelocked boolean field unnecessary, so that's removed in this patch, too. I would like to add a test involving a lossy bitmap scan. How many rows are normally needed to force a bitmap scan to be lossy? The size of bitmaps is controlled by work_mem, so you can set work_mem very small to cause them to become lossy earlier. Off the top of my head I don't have any guesstimate on how many rows you need. What's the easiest way to check whether a plan is going to use (or is using) a lossy bitmap scan? Good question. There doesn't seem to be anything in the EXPLAIN ANALYZE output to show that, so I think you'll have to resort to adding some elog()s in the right places. OK, thanks. -Kevin ssi-seqscan-cleanup.patch Description: Binary data
Re: [HACKERS] spinlock contention
On Tue, Jun 28, 2011 at 5:33 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Jun 28, 2011 at 3:18 PM, Robert Haas robertmh...@gmail.com wrote: user-32: none(1.0),atomicinc(14.4),pg_lwlock_cas(22.1),cmpxchng(41.2),pg_lwlock(588.2),spin(1264.7) I may not be following all this correctly, but doesn't this suggest a huge potential upside for the cas based patch you posted upthread when combined with your earlier patches that were bogging down on spinlock contention? Well, you'd think so, but in fact that patch makes it slower. Don't ask me why, 'cuz I dunno. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] marking old branches as no longer maintained
On 06/28/2011 05:31 PM, Peter Eisentraut wrote: On tis, 2011-06-28 at 17:05 -0400, Andrew Dunstan wrote: Couldn't you just put a text file on the build farm server with recommended branches? As I told Magnus, that gets ugly because of limitations in MinGW's SDK perl. I suppose I could just not implement the feature for MinGW, but I've tried damn hard not to make those sorts of compromises and I'm not keen to start. The buildfarm code can upload the build result via HTTP; why can't it download a file via HTTP? It has to use a separate script to do that. I don't really want to add another one just for this. (thinks a bit) I suppose I can make it do:

my $url = "http://buildfarm.postgresql.org/branches_of_interest.txt";
my $branches_of_interest = `perl -MLWP::Simple -e "getprint(q{$url})"`;

Maybe that's the best option. It's certainly going to be less code than anything else :-) cheers andrew
Re: [HACKERS] spinlock contention
On Jun 28, 2011, at 23:48 , Robert Haas wrote: On Tue, Jun 28, 2011 at 5:33 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Jun 28, 2011 at 3:18 PM, Robert Haas robertmh...@gmail.com wrote: user-32: none(1.0),atomicinc(14.4),pg_lwlock_cas(22.1),cmpxchng(41.2),pg_lwlock(588.2),spin(1264.7) I may not be following all this correctly, but doesn't this suggest a huge potential upside for the cas based patch you posted upthread when combined with your earlier patches that were bogging down on spinlock contention? Well, you'd think so, but in fact that patch makes it slower. Don't ask me why, 'cuz I dunno. :-( Huh? Where do you see your CAS patch being slower than unpatched LWLocks in these results? Or are you referring to pgbench runs you made, not to these artificial benchmarks? best regards, Florian Pflug
Re: [HACKERS] spinlock contention
On Tue, Jun 28, 2011 at 5:55 PM, Florian Pflug f...@phlo.org wrote: On Jun 28, 2011, at 23:48 , Robert Haas wrote: On Tue, Jun 28, 2011 at 5:33 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Jun 28, 2011 at 3:18 PM, Robert Haas robertmh...@gmail.com wrote: user-32: none(1.0),atomicinc(14.4),pg_lwlock_cas(22.1),cmpxchng(41.2),pg_lwlock(588.2),spin(1264.7) I may not be following all this correctly, but doesn't this suggest a huge potential upside for the cas based patch you posted upthread when combined with your earlier patches that were bogging down on spinlock contention? Well, you'd think so, but in fact that patch makes it slower. Don't ask me why, 'cuz I dunno. :-( Huh? Where do you see your CAS patch being slower than unpatched LWLocks in these results? Or are you referring to pgbench runs you made, not to these artificial benchmarks? pgbench -S -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.2] Fix leaky-view problem, part 1
On Tue, Jun 28, 2011 at 10:11:59PM +0200, Kohei KaiGai wrote: 2011/6/28 Noah Misch n...@leadboat.com: Suppose your query references two views owned by different roles. The quals of those views will have the same depth. Is there a way for information to leak from one view owner to another due to that? Even if multiple subqueries were pulled up and qualifiers got merged, user-given qualifiers shall have a smaller depth value, so they will always be launched after the qualifiers originally used in the subqueries. Of course, it is another topic in the case when the view is originally defined with leaky functions. Right. I was thinking of a pair of quals, one in each of two view definitions. The views are mutually-untrusting. Something like this:

CREATE VIEW a AS SELECT * FROM ta WHERE ac = 5;
ALTER VIEW a OWNER TO alice;
CREATE VIEW b AS SELECT * FROM tb WHERE bc = 6;
ALTER VIEW b OWNER TO bob;
SELECT * FROM a, b;

Both the ac=5 and the bc=6 quals do run at the same depth despite enforcing security for different principals. I can't think of a way that one view owner could use this situation to subvert the security of the other view owner, but I wanted to throw it out. This makes assumptions, at a distance, about the possible scan types and how quals can be fitted to them. Specifically, this patch achieves its goals because any indexable qual is trustworthy, and any non-indexable qual cannot be pushed down further than the view's own quals. That seems to be true with current plan types, but it feels fragile. I don't have a concrete idea for improvement, though. Robert suggested another approach in http://archives.postgresql.org/message-id/aanlktimbn_6tyxreh5rc7pmizt-vjb3xgp8cuho0o...@mail.gmail.com ; might that approach avoid this hazard? The reason why we didn't adopt the idea to check privileges of underlying tables is that PostgreSQL checks privileges on executor phase, not planner phase.
If we try to have a flag on pg_proc, it is tough work to categorize trustworthy functions and non-trustworthy ones from the beginning, because we have more than 2000 built-in functions. So, it is a reasonable assumption that index access methods do not leak the contents of tuples scanned, because only superusers can define them. I was referring to this paragraph: On the technical side, I am pretty doubtful that the approach of adding a nestlevel to FuncExpr and RelOptInfo is the right way to go. I believe we have existing code (to handle left joins) that prevents quals from being pushed down too far by fudging the set of relations that are supposedly needed to evaluate the qual. I suspect a similar approach would work here. The part 2 patch in this series implements the planner restriction more likely to yield performance regressions, so it introduces a mechanism for identifying when to apply the restriction. This patch, however, applies its restriction unconditionally. Since we will inevitably have such a mechanism before you are done sealing the leaks in our view implementation, the restriction in this patch should also use that mechanism. Therefore, I think we should review and apply part 2 first, then update this patch to use its conditional behavior. The reason why this patch always gives the depth higher priority is that the matter is relatively minor compared to the issue the part 2 patch tries to tackle. That affects the selection of scan plan (IndexScan or SeqScan), so it is a significant decision to be controllable. However, this issue is just on a particular scan. True. The lost optimization opportunity is relatively minor, but perhaps not absolutely minor. It would be one thing if we could otherwise get away without designing any mechanism for applying these restrictions conditionally. However, since you have implemented the conditional behavior elsewhere, it would be nice to apply it here.
In addition, the implementation will become complex if both qualifiers pulled up from a security barrier view and qualifiers pulled up from regular views are mixed within a single qualifier list. I only scanned the part 2 patch, but isn't the bookkeeping already happening for its purposes? How much more complexity would we get to apply the same strategy to the behavior of this patch? On Mon, Jun 06, 2011 at 01:37:11PM +0100, Kohei Kaigai wrote:

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
+		else if (IsA(node, Query))
+		{
+			depth += 2;

Why two? This patch is groundwork for the upcoming row-level security feature. In the case when the backend appends security policy functions, they should be launched prior to user-given qualifiers. So, I intend to give odd depth numbers to these functions so they have higher priority than the others. Of course, 1 might work well right now. I'd say it should either be 1 until such time as that's needed,
Re: [HACKERS] spinlock contention
On Jun 28, 2011, at 22:18 , Robert Haas wrote: On Tue, Jun 28, 2011 at 2:33 PM, Florian Pflug f...@phlo.org wrote: [ testing of various spinlock implementations ] I set T=30 and N=1 2 4 8 16 32 and tried this out on a 32-core loaner from Nate Boley: Cool, thanks! 100 counter increments per cycle (wall and user time per cycle, in seconds):

worker           1               2               4               8               16              32
                 wall    user    wall    user    wall    user    wall    user    wall    user    wall    user
none             2.8e-07 2.8e-07 1.5e-07 3.0e-07 8.0e-08 3.2e-07 4.2e-08 3.3e-07 2.1e-08 3.3e-07 1.1e-08 3.4e-07
atomicinc        3.6e-07 3.6e-07 2.6e-07 5.1e-07 1.4e-07 5.5e-07 1.4e-07 1.1e-06 1.5e-07 2.3e-06 1.5e-07 4.9e-06
cmpxchng         3.6e-07 3.6e-07 3.4e-07 6.9e-07 3.2e-07 1.3e-06 2.9e-07 2.3e-06 4.2e-07 6.6e-06 4.5e-07 1.4e-05
spin             4.1e-07 4.1e-07 2.8e-07 5.7e-07 1.6e-07 6.3e-07 1.2e-06 9.4e-06 3.8e-06 6.1e-05 1.4e-05 4.3e-04
pg_lwlock        3.8e-07 3.8e-07 2.7e-07 5.3e-07 1.5e-07 6.2e-07 3.9e-07 3.1e-06 1.6e-06 2.5e-05 6.4e-06 2.0e-04
pg_lwlock_cas    3.7e-07 3.7e-07 2.8e-07 5.6e-07 1.4e-07 5.8e-07 1.4e-07 1.1e-06 1.9e-07 3.0e-06 2.4e-07 7.5e-06

And here's the throughput table calculated from your results, i.e. the wall time per cycle relative to the wall time per cycle for 1 worker.
workers          2     4     8     16    32
none             1.9   3.5   6.7   13    26
atomicinc        1.4   2.6   2.6   2.4   2.4
cmpxchng         1.1   1.1   1.2   0.9   0.8
spin             1.5   2.6   0.3   0.1   0.0
pg_lwlock        1.4   2.5   1.0   0.2   0.1
pg_lwlock_cas    1.3   2.6   2.6   1.9   1.5

Hm, so in the best case we get 2.6x the throughput of a single core, and that only for 4 and 8 workers (1.4e-07 seconds/cycle vs. 3.6e-07). In that case, there also seems to be little difference between pg_lwlock{_cas} and atomicinc. atomicinc again manages to at least sustain that throughput when the worker count is increased, while for the others the throughput actually *decreases*. What totally puzzles me is that your results don't show any trace of a decreased system load for the pg_lwlock implementation, which I'd have expected due to the sleep() in the contested path. Here are the user vs. wall time ratios - I'd have expected to see values significantly below the number of workers for pg_lwlock:

workers          1     2     4     8     16    32
none             1.0   2.0   4.0   7.9   16    31
atomicinc        1.0   2.0   3.9   7.9   15    33
cmpxchng         1.0   2.0   4.1   7.9   16    31
spin             1.0   2.0   3.9   7.8   16    31
pg_lwlock        1.0   2.0   4.1   7.9   16    31
pg_lwlock_cas    1.0   2.0   4.1   7.9   16    31

I wrote a little script to reorganize this data in a possibly-easier-to-understand format - ordering each column from lowest to highest, and showing each algorithm as a multiple of the cheapest value for that column: If you're OK with that, I'd like to add that to the lockbench repo. There seems to be something a bit funky in your 3-core data, but overall I read this data to indicate that 4 cores aren't really enough to see a severe problem with spinlock contention. Hm, it starts to show if you lower the counter increment per cycle (the D constant in run.sh). But yeah, it's never as bad as the 32-core results above.. best regards, Florian Pflug
Re: [HACKERS] pg_upgrade defaulting to port 25432
Peter Eisentraut wrote: On mån, 2011-06-27 at 14:34 -0400, Bruce Momjian wrote: Bruce Momjian wrote: Robert Haas wrote: It's easier to read the patches if you do separate changes in separate patches. Anyway, I'm a bit nervous about this hunk:

+	if (old_cluster.port == DEF_PGUPORT)
+		pg_log(PG_FATAL, "When checking a live old server, "
+			   "you must specify the old server's port number.\n");

Is the implication here that I'm now going to need to specify more than 4 command-line options/environment variables for this to work? Yes, we don't inherit PGPORT anymore. Doing anything else was too complex to explain in the docs. But only if you are running --check on a live server. Otherwise, we will just default to 50432 instead of 5432/PGPORT. When checking a live server, the built-in default port number or the value of the environment variable PGPORT is used. But when performing an upgrade, a different port number is used by default, namely 50432, which can be overridden XXX [how?] Seems pretty clear to me, as long as that last bit is figured out. Not sure where you got that text --- I assume that was an old email. I decided it was too complex to have PGPORT be honored only if there is a live server, so I just always default to 50432 for both servers, and I added this error check:

	if (user_opts.check && is_server_running(old_cluster.pgdata))
	{
		*live_check = true;
+		if (old_cluster.port == DEF_PGUPORT)
+			pg_log(PG_FATAL, "When checking a live old server, "
+				   "you must specify the old server's port number.\n");

OK? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] spinlock contention
On Tue, Jun 28, 2011 at 8:11 PM, Florian Pflug f...@phlo.org wrote: I wrote a little script to reorganize this data in a possibly-easier-to-understand format - ordering each column from lowest to highest, and showing each algorithm as a multiple of the cheapest value for that column: If you're OK with that, I'd like to add that to the lockbench repo. It's a piece of crap, but you're welcome to try to fix it up. See attached. With respect to the apparent lack of impact of the sleep loop, I can only speculate that this test case doesn't really mimic real-world conditions inside the backend very well. There is a lot more going on there - multiple locks, lots of memory access, etc. But I don't really understand it either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company sortby Description: Binary data
Re: [HACKERS] time-delayed standbys
On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote: After we run pg_ctl promote, time-delayed replication should be disabled? Otherwise, failover might take a very long time when we set recovery_time_delay to a high value. PFA a patch that I believe will disable recovery_time_delay after promotion. The only change from the previous version is:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dbf792..41b3ae9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5869,7 +5869,7 @@ pg_is_xlog_replay_paused(PG_FUNCTION_ARGS)
 static void
 recoveryDelay(void)
 {
-	while (1)
+	while (!CheckForStandbyTrigger())
 	{
 		long		secs;
 		int			microsecs;

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company time-delayed-standby-v2.patch Description: Binary data
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-28 01:52 AM, Jun Ishiduka wrote: Considering everything that has been discussed on this thread so far, do you still think your patch is the best way to accomplish base backups from standby servers? If not, what changes do you think should be made? I reconsidered the way to not use pg_stop_backup(). Process of online base backup on a standby server:
1. pg_start_backup('x');
2. copy the data directory
3. copy *pg_control*
Behavior during restore:
* read the minimum recovery ending location from the copied pg_control.
* use that value for the same purposes as the end-of-backup location.
- When the value is equal to 0/0, skip this behavior; that situation means the backup was taken from the master server.
The behaviour you describe above sounds okay to me; if someone else sees issues with this then they should speak up (ideally before you go off and write a new patch). I'm going to consolidate my other comments below so this can act as a more complete review. Usability Review - We would like to be able to perform base backups from slaves without having to call pg_start_backup() on the master. We can not currently do this. The patch tries to do this. From a usability point of view it would be nice if this could be done both manually with pg_start_backup() and with pg_basebackup. The main issue I have with the first patch you submitted is that it does not work for cases where you don't want to call pg_basebackup or you don't want to include the wal segments in the output of pg_basebackup. There are a number of these use-cases (examples include the wal already being available on an archive server, or wanting to use filesystem/disk array level snapshots instead of tar). I feel your above proposal, to copy the control file as the last step in the base backup and then get the minimum recovery ending location from it, solves these issues.
It would be nice if things 'worked' when calling pg_basebackup against the slave (maybe by having perform_base_backup() resend the control file after it has sent the log?). Feature test / Performance review - Skipped, since a new patch is coming. Coding Review -- I didn't look too closely at the code, since a new patch might change a lot of it. I did like how you added comments to most of the larger code blocks that you added. Architecture Review --- There were some concerns with your original approach, but using the control file was suggested by Heikki and seems sound to me. I'm marking this 'waiting for author'; if you don't think you'll be able to get a reworked patch out during this commitfest then you should move it to 'returned with feedback'. Steve Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
Re: [HACKERS] Range Types, constructors, and the type system
On Tue, 2011-06-28 at 22:20 +0200, Florian Pflug wrote: Hm, so RANGEINPUT would actually be what was previously discussed as the range as a pair of bounds definition, as opposed to the range as a set of values definition. So essentially we'd add a second concept of what a range is to work around the range input troubles. I believe if we go that route we should make RANGEINPUT a full-blown type, having pair of bounds semantics. Adding a lobotomized version just for the sake of range input feels a bit like a kludge to me. I think David Wheeler was trying to make a similar point, but I'm still not convinced. It's not a pair, because it can be made up of 0, 1, or 2 scalar values (unless you count infinity as one of those values, in which case 0 or 2). And without ordering, it's not clear that those values are really bounds. The type needs to: * represent two values, either of which might be a special infinite value * represent the value empty * represent inclusivity/exclusivity of both values and those things seem fairly specific to ranges, so I don't really see what other use we'd have for such a type. But I'm open to suggestion. I don't think that having an extra type around is so bad. It solves a lot of problems, and doesn't seem like it would get in the way. And it's only for the construction of ranges out of scalars, which seems like the most natural place where a cast might be required (similar to casting an unknown literal, which is fairly common). Alternatively, we could replace RANGEINPUT simply with ANYARRAY, and add functions ANYRANGE->ANYRANGE which allow specifying the bound operator (<, <= respectively <, <=) after construction.
So you'd write (using the functions-as-fields syntax I believe we support) (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]' and ARRAY[NULL,2]::int8range for '[-inf,2]' I think we can rule this one out: * The functions-as-fields syntax is all but deprecated (or should be) * That's hardly a readability improvement * It still suffers similar problems as casting back and forth to text: ANYARRAY is too general, doesn't really take advantage of the type system, and not a great fit anyway. All assuming that modifying the type system to support polymorphic type resolution based on the return type is out of the question... ;-) It's still not out of the question, but I thought that the intermediate type would be a less-intrusive alternative (and Robert seemed concerned about how intrusive it was). There also might be a little more effort educating users if we selected the function based on the return type, because they might think that casting the inputs explicitly would be enough to get it to pick the right function. If it were a new syntax like RANGE[]::int8range, then I think it would be easier to understand. Regards, Jeff Davis
Re: [HACKERS] Process local hint bit cache
On Thu, Apr 7, 2011 at 2:49 PM, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Apr 7, 2011 at 1:28 PM, Merlin Moncure mmonc...@gmail.com wrote: int ByteOffset = xid / BITS_PER_BYTE; whoops, I just noticed this was wrong -- the byte offset needs to take the bucket into account. I need to clean this up some more obviously, but the issues at play remain the same. So, where is the latest version of this patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Inconsistency between postgresql.conf and docs
All, A tester correctly reported this: But in the sample file, the synchronous_standby_names parameter is the first parameter under the heading - Streaming Replication - Server Settings, while in the documentation, that parameter has its own subsection 18.5.5 after the streaming replication section 18.5.4. Since the rest of section 18.5.4 was more than a screenful in my browser, at first glance I didn't spot 18.5.5 and was confused. He is correct. So, my question is, should the docs change, or should postgresql.conf.sample change? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com