Re: [HACKERS] proposal: auxiliary functions for record type
On Dec12, 2010, at 00:19 , Pavel Stehule wrote:

I prefer a table based solution, because I don't need a one unnest, but other preferences are valid too.

That's fine with me.

I am dissatisfied with your design of explicit target type via an unused value. I think we don't have the infrastructure for it now - from my view it is better to use a common type, which is text now. It's nothing new - plpgsql uses it too.

Sorry, I can't follow you here. Where does plpgsql use text as a common type?

I see one good design of explicit target type based on polymorphic types that respects PostgreSQL fmgr practice: we have to allow polymorphic functions without polymorphic parameters. These functions should be designed to return the value in unknown type format when the function has no outer information.

I don't think unknown is the right type for that. As far as I know, unknown is still a textual type, used to have some type to assign to string literals during parsing when no better type can be inferred.

This information can be passed in a function context. When the function context isn't null, then the function has to read the target type and should return the value in the target type. Who can fill a function context? It is a task for the executor. And when a CAST contains just a function call, then we can recheck whether the function is polymorphic, and if it is, we can set the function context to the target type; then we don't need to call a conversion function, because a polymorphic function must return data in the correct format.

The main difficulty is that currently types are assigned in a bottom-up fashion, as far as I know. To make functions with a polymorphic return value, but without polymorphic arguments, work, you need to assign the return type in a top-down fashion (it depends on where the value *goes*, not where it *comes from*). That seems like a rather huge change and has the potential to complicate quite a few other parts, most notably function lookup/resolution.
Plus, the general case where type information must bubble up more than one level seems pretty much intractable, as it'd require a full-blown type inference algorithm like ML or Haskell. Not a place where we want to go, I believe.

The restricted case, on the other hand, brings very little benefit compared to the dummy-parameter approach. Yeah, polymorphic_function()::type may look a bit cleaner than polymorphic_function(NULL::type), but that's about it. It's only assignments in pl/pgsql which really benefit, since you'd be able to leave out the type completely, writing simply v_value := polymorphic_function(). Does that really warrant the effort that'd be involved?

Without the described functionality we can design a non-polymorphic function that returns the unknown type. When similar functionality is implemented later, this function can be changed to polymorphic, but from the user's perspective there isn't a change.

I don't really understand why you resist the idea of a dummy parameter so much. It might not be pretty, but is it bad enough to justify putting in all this work? Plus, the whole record-manipulation stuff isn't going to win a beauty contest anytime soon. But it's better than nothing, so as long as it's reasonably efficient I think one can live with a few warts on the API.

best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE
Hi, I've just run into a problem while upgrading from 8.4 to 9.0. pg_upgrade aborted during the step "Adding support functions to new cluster" with "ERROR: permission denied for language c". Unfortunately, the log didn't include the name of the database where the error occurred, so it took me a while to figure out that the culprit was an ALTER DATABASE SET ROLE = non-superuser I had done on one of my databases, which effectively prevented pg_upgrade from connecting with superuser privileges.

While one could argue that this behaviour is perfectly consistent, I believe most users will expect pg_upgrade (and to a lesser extent also pg_dump and pg_restore) to be unaffected by such settings. Should we provide a way (for super-users only, of course) to skip per-database/per-role settings when connecting?

best regards Florian Pflug
[HACKERS] Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
On Sat, 2010-12-11 at 22:03 +0100, Heikki Linnakangas wrote: (Moving to pgsql-hackers) On 10.12.2010 20:21, Tom Lane wrote:

Simon Riggs si...@2ndquadrant.com writes: Reduce spurious Hot Standby conflicts from never-visible records. Hot Standby conflicts only with tuples that were visible at some point. So ignore tuples from aborted transactions or for tuples updated/deleted during the inserting transaction when generating the conflict transaction ids. Following detailed analysis and test case by Noah Misch. Original report covered btree delete records; Heikki Linnakangas correctly observed that this applies to other cases also. Fix covers all sources of cleanup records via common code. Includes additional fix compared to commit on HEAD.

ISTM HeapTupleHeaderAdvanceLatestRemovedXid is still pretty broken, in that it's examining xmax without having checked that xmax is (a) valid or (b) a lock rather than a deletion xmax.

In current use, it's only called for tuples that are known to be dead, so either xmax is a valid deletion, or xmin didn't commit, in which case the function doesn't use xmax for anything. So I think it actually works as it is.

Well, I think you're both right. The function shouldn't be called in places where xmax is the wrong flavour, but there should be specific safeguards in case of mistake.

I agree it doesn't look right, though. At the very least it needs comments explaining that, but preferably it should do something sane when faced with a tuple that's not dead after all. Perhaps throw an error (though that would be bad during recovery), or an Assert, or just refrain from advancing latestRemovedXid (or advance it, that would be the conservative stance given the current use).

Yes

Also, I'm not totally convinced it's correct when xmin > xmax, despite Simon's follow-up commit to fix that. Shouldn't it advance latestRemovedXid to xmin in that case? Or maybe it's ok as it is because we know that xmax committed after xmin.
The impression I get from the comment above the function now is that it advances latestRemovedXid to the highest XID present in the tuple, but that's not what it does in the xmin > xmax case. That comment needs clarification.

Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong; what I have now is better, but your point is there may be an even better truth. I'll think on that a little more.

While we're at it, perhaps it would be better to move this function to tqual.c. And I feel that a more natural interface would be something like: TransactionId HeapTupleHeaderGetLatestRemovedXid(HeapTupleHeader tuple); IOW, instead of bumping up the passed-in latestRemovedXid value, return the highest XID on the tuple (if it was dead).

PS. it would be good to set hint bits in that function like in the HeapTupleSatisfies* functions.

I'm not that happy with refactoring inside a release, plus I'm not even sure if that is the right way. I suspect the best way would be to do this as a side-effect of HeapTupleSatisfiesVacuum(), since this processing should only ever be done in conjunction with that function. Will respond later today on those thoughts.

-- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Wildcard search support for pg_trgm
I found another problem. The GIN index suffers from the "GIN indexes do not support whole-index scans" error when no trigram can be extracted from the pattern. With best regards, Alexander Korotkov.
Re: [HACKERS] proposal: auxiliary functions for record type
2010/12/12 Florian Pflug f...@phlo.org:

On Dec12, 2010, at 00:19 , Pavel Stehule wrote: I prefer a table based solution, because I don't need a one unnest, but other preferences are valid too.

That's fine with me.

I am dissatisfied with your design of explicit target type via an unused value. I think we don't have the infrastructure for it now - from my view it is better to use a common type, which is text now. It's nothing new - plpgsql uses it too.

Sorry, I can't follow you here. Where does plpgsql use text as a common type?

plpgsql uses only IO casts. So inside an assignment statement the target type and the real type are checked. But this checking is late! I did a patch for early conversion to the target type (in the plan), but this patch was rejected. So actually there isn't information about the target type available in the expression - and probably never will be, for compatibility reasons. For example: when the target variable is int, but you used a numeric constant, then any assignment does an IO cast from numeric to int.

I see one good design of explicit target type based on polymorphic types that respects PostgreSQL fmgr practice: we have to allow polymorphic functions without polymorphic parameters. These functions should be designed to return the value in unknown type format when the function has no outer information.

I don't think unknown is the right type for that. As far as I know, unknown is still a textual type, used to have some type to assign to string literals during parsing when no better type can be inferred.

This information can be passed in a function context. When the function context isn't null, then the function has to read the target type and should return the value in the target type. Who can fill a function context? It is a task for the executor. And when a CAST contains just a function call, then we can recheck whether the function is polymorphic, and if it is, we can set the function context to the target type; then we don't need to call a conversion function, because a polymorphic function must return data in the correct format.
The main difficulty is that currently types are assigned in a bottom-up fashion as far as I know. To make functions with a polymorphic return value, but without polymorphic arguments, work, you need to assign the return type in a top-down fashion (it depends on where the value *goes*, not where it *comes from*). That seems like a rather huge change and has the potential to complicate quite a few other parts, most notably function lookup/resolution.

I don't think so: a) the place where we don't know a target type is not limited only to the first outer cast. b) I didn't define a polymorphic function without polymorphic parameters (PFWPP) as absolutely undescribed - it returns unknown or text by default. There isn't a problem looking up this function - and it isn't a problem for later work that this function returns text, because the first outer cast ensures transformation to the correct type. c) when the function is called without an outer cast then it runs too - but there will be one more IO cast. Some alchemy with the function descriptor is used now too - when default parameters are used.

Plus, the general case where type information must bubble up more than one level seems pretty much intractable, as it'd require a full-blown type inference algorithm like ML or Haskell. Not a place where we want to go, I believe. The restricted case, on the other hand, brings very little benefit compared to the dummy-parameter approach. Yeah, polymorphic_function()::type may look a bit cleaner than polymorphic_function(NULL::type), but that's about it. It's only assignments in pl/pgsql which really benefit, since you'd be able to leave out the type completely, writing simply v_value := polymorphic_function(). Does that really warrant the effort that'd be involved?

Without the described functionality we can design a non-polymorphic function that returns the unknown type. When similar functionality is implemented, this function can be changed to polymorphic, but from the user's perspective there isn't a change.
I don't really understand why you resist the idea of a dummy parameter so much. It might not be pretty, but is it bad enough to justify putting in all this work? Plus, the whole record-manipulation stuff isn't going to win a beauty contest anytime soon. But it's better than nothing, so as long as it's reasonably efficient I think one can live with a few warts on the API.

I wrote it. In this case, you don't need to know a value, you have to work with the type. So using a typed null isn't intuitive and it isn't nice - for me it is too ugly for a general module. I know that PFWPP functions need a lot of coding without a sure result, and that's the reason why I didn't use it and why I use the text type. And I have another reason: I expect there is a bigger probability of iterating over fields of different types, so coercion to one target type isn't available in one path. Using more paths (like you showed in the code) is
Re: [HACKERS] proposal : cross-column stats
On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:

Hi everyone, one of the sessions I attended at PgDay last week was Heikki's session about statistics in PostgreSQL. One of the issues he mentioned (and one I regularly run into) is the absence of cross-column stats. When the columns are not independent, this usually results in poor estimates (and then in suboptimal plans).

Very cool that you're working on this.

Lets talk about one special case - I'll explain how the proposed solution works, and then I'll explain how to make it more general, what improvements are possible, what issues are there. Anyway this is by no means a perfect or complete solution - it's just a starting point.

It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably anything with a b-tree operator class. What you've coded seems to rely on calculations on the values. Have you thought about how it could work for, for example, strings? The classic failure case has always been: postcodes and city names. Strongly correlated, but in a way that the computer can't easily see. Not that I suggest you fix this, but it's food for thought. Though strictly speaking this is a different kind of correlation than what you're looking at.

2) I really don't think we should collect stats for all combinations of columns of a table - I do like the Oracle-like approach where a DBA has to enable cross-column stats using an ALTER TABLE (for a particular list of columns). The only exception might be columns from a multi-column index. It might be quite efficient I guess?

In the past it has been suggested to only do it for multi-column indexes, but I find these days that in some situations I prefer to make individual indexes and let the bitmap scan code combine them. So perhaps it would be best to let it be configured by the DBA.

3) There are independence tests for contingency tables (e.g.
Pearson's Chi-squared test), so that it's easy to find out whether the columns are independent. In that case we can just throw away these stats and use the simple estimation. http://mathworld.wolfram.com/Chi-SquaredTest.html

I think this would be good to include, if possible. Actually, I wonder if the existing stats collection code could be altered to attempt to calculate the correlation between columns as part of its other work.

4) Or we could store just those cells where expected and observed values differ significantly (may help if most of the values are independent, but there's a small glitch somewhere).

Compressing that grid would be useful, given that for many dimensions most of the grid will not be interesting. In fact, storing the 20 largest values may be enough. Worth an experiment.

Hope this helps, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle
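The contingency-table idea discussed above is easy to sketch outside PostgreSQL. The following toy (all names are illustrative, nothing here comes from the actual patch) bins two columns of a sample, estimates the joint selectivity from the cell count, and compares that with the independence assumption the planner uses today:

```python
from collections import Counter

def contingency_estimate(rows, a_val, b_val):
    """Selectivity of (a = a_val AND b = b_val) from the joint sample."""
    joint = Counter(rows)               # (a, b) -> observed count
    return joint[(a_val, b_val)] / len(rows)

def independence_estimate(rows, a_val, b_val):
    """What the planner assumes today: P(a) * P(b)."""
    n = len(rows)
    p_a = sum(1 for a, _ in rows if a == a_val) / n
    p_b = sum(1 for _, b in rows if b == b_val) / n
    return p_a * p_b

# perfectly correlated columns: b is always a + 1
rows = [(i % 4, i % 4 + 1) for i in range(1000)]
print(contingency_estimate(rows, 2, 3))   # 0.25, the true selectivity
print(independence_estimate(rows, 2, 3))  # 0.0625, a 4x underestimate
```

With strings the same scheme works, since binning only needs a sort order, not arithmetic on the values.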
Re: [HACKERS] Extensions, patch v16
Hi there, it's clear we need versions; probably major.minor would be enough. The problem I see is how to keep the .so in sync with the .sql? Should we store the .sql in the database? Also, we need permissions for extensions, since we have open/closed extensions. Oleg

On Sat, 11 Dec 2010, David E. Wheeler wrote: On Dec 11, 2010, at 1:09 PM, David Fetter wrote:

Why is it in the makefile at all? If the makefile does need to know it, why don't we have it scrape the number out of the control file? Or even more to the point, since when do we need version numbers in extensions?

We *absolutely* need version numbers in extensions. People will want to have a certain version, or a certain minimum version, etc., etc., etc., just as they do for any other software. Seriously, are you OK? One of the biggest mistakes in the creation of CPAN was allowing modules without version numbers. It makes figuring out what to upgrade extremely difficult. Learning from that, PGXN requires version numbers for all extensions. Best, David

Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] proposal : cross-column stats
On 12.12.2010 15:17, Martijn van Oosterhout wrote: On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:

Very cool that you're working on this.

+1

Lets talk about one special case - I'll explain how the proposed solution works, and then I'll explain how to make it more general, what improvements are possible, what issues are there. Anyway this is by no means a perfect or complete solution - it's just a starting point.

It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably anything with a b-tree operator class. What you've coded seems to rely on calculations on the values. Have you thought about how it could work for, for example, strings? The classic failure case has always been: postcodes and city names. Strongly correlated, but in a way that the computer can't easily see.

Yeah, and that's actually analogous to the example I used in my presentation. The way I think of that problem is that once you know the postcode, knowing the city name doesn't add any information. The postcode implies the city name. So the selectivity for postcode = ? AND city = ? should be the selectivity of postcode = ? alone. The measurement we need is implicativeness: How strongly does column A imply a certain value for column B? Perhaps that could be measured by counting the number of distinct values of column B for each value of column A, or something like that. I don't know what the statisticians call that property, or if there's some existing theory on how to measure that from a sample.

That's assuming the combination has any matches. It's possible that the user chooses a postcode and city combination that doesn't exist, but that's no different from a user doing city = 'fsdfsdfsd' on a single column, returning no matches. We should assume that the combination makes sense.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
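The "implicativeness" measurement Heikki describes - counting distinct values of column B per value of column A - can be sketched like this (a toy illustration only; the function and data names are invented, not from any PostgreSQL source):

```python
from collections import defaultdict

def avg_distinct_b_per_a(rows):
    """Average number of distinct B values seen per value of A.
    A result near 1.0 means A (nearly) determines B."""
    seen = defaultdict(set)
    for a, b in rows:
        seen[a].add(b)
    return sum(len(s) for s in seen.values()) / len(seen)

# postcode -> city style data: every postcode belongs to exactly one city
postcodes = [(10000 + i, "city%d" % (i // 100)) for i in range(1000)]
print(avg_distinct_b_per_a(postcodes))  # 1.0: postcode fully implies city

# weaker dependency: each value of a co-occurs with two values of b
noisy = [(i % 4, (i // 4) % 2) for i in range(1000)]
print(avg_distinct_b_per_a(noisy))  # 2.0
```

Estimating this from a sample rather than the full table raises the same distinct-value estimation difficulties that ndistinct already has, so the sketch is only the easy half of the problem.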
Re: [HACKERS] function attributes
Andrew Dunstan andrew.duns...@pgexperts.com writes:

Yesterday I did a bit of work on allowing bytea values to be passed into and out of plperl in binary format, effectively removing the need to escape and de-escape them. (The work can be seen on the plperlargs branch of my development repo at https://github.com/adunstan/postgresql-dev/commits/plperlargs/). At the moment the behaviour is triggered by a custom setting (plperl.pass_binary_bytea), but this isn't really satisfactory. We could turn it on permanently, but that would break a lot of legacy code. What we really need is a way of marking a function with some attributes. Of course, we could put it in the program text like plpgsql's #variable_conflict, but that's really rather ugly. The grammar already has an attribute mechanism for functions, and ISTM we just need to extend that a bit to allow setting of function attributes reasonably flexibly, much as we can now specify format options on EXPLAIN or we'll soon be able to specify options for foreign tables.

I do not want to go there. What you're proposing will soon turn into a real mess, with arbitrary language-specific junk tagged onto pg_proc entries. And what's worse, it'll be mixed with non-language-specific junk, because of the existing legacy WITH entries.

Tim Bunce seemed to think that this particular problem might be solvable in a completely transparent way, by having byteas convert into Perl objects that have a hook for producing a backwards-compatible text translation. Have you looked into that idea?

regards, tom lane
Re: [HACKERS] proposal : cross-column stats
Hi! Dne 12.12.2010 15:17, Martijn van Oosterhout napsal(a):

Lets talk about one special case - I'll explain how the proposed solution works, and then I'll explain how to make it more general, what improvements are possible, what issues are there. Anyway this is by no means a perfect or complete solution - it's just a starting point.

It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably anything with a b-tree operator class. What you've coded seems to rely on calculations on the values. Have you thought about how it could work for, for example, strings?

Yes, I know, I just forgot to address this in my previous e-mail. The contingency tables have a really nice feature - they are based on splitting the sets into groups (~ bins of the histograms for each column). This can be done whenever you can sort the values; you really don't need any calculations. So it should work with strings.

And another thing I somehow forgot is handling the case when there is no histogram, just MCV. That's mostly the same - each of the values might be a separate group, or the values might be grouped to form fewer groups, etc.

The classic failure case has always been: postcodes and city names. Strongly correlated, but in a way that the computer can't easily see. Not that I suggest you fix this, but it's food for thought. Though strictly speaking this is a different kind of correlation than what you're looking at.

Hmmm, I see. I think the proposal does not fix this particular case, although it might improve the situation a little bit (limit the error between the expected and observed number of rows). The problem is that once we get to the cell level of the contingency table, there is no additional (more detailed) information. So we're stuck with the multiplication estimate, or something like that.
I was thinking about it actually, and I think we could collect some more info - a correlation coefficient for each bin, or something like that. But that was not part of my proposal, and I'm not sure how to do that.

2) I really don't think we should collect stats for all combinations of columns of a table - I do like the Oracle-like approach where a DBA has to enable cross-column stats using an ALTER TABLE (for a particular list of columns). The only exception might be columns from a multi-column index. It might be quite efficient I guess?

In the past it has been suggested to only do it for multi-column indexes, but I find these days that in some situations I prefer to make individual indexes and let the bitmap scan code combine them. So perhaps it would be best to let it be configured by the DBA.

Yes, I prefer individual indexes too. The idea behind collecting cross-column stats for multi-column indexes was that maybe we could 'append' this to the current functionality (building the index or something like that) so that it does not introduce significant performance problems.

3) There are independence tests for contingency tables (e.g. Pearson's Chi-squared test), so that it's easy to find out whether the columns are independent. In that case we can just throw away these stats and use the simple estimation. http://mathworld.wolfram.com/Chi-SquaredTest.html

I think this would be good to include, if possible. Actually, I wonder if the existing stats collection code could be altered to attempt to calculate the correlation between columns as part of its other work.

I guess that would be rather expensive - to compute correlation you need two passes, and you need to do that for each pair of columns. So I'd be surprised if it were possible (and effective). Another thing is that you can compute correlation only for numeric columns, so it's not possible to do that for the city/ZIP code case mentioned above.
More precisely - it's possible to do that (if you map strings to numbers somehow), but I doubt you'll get useful results, as the assignment is rather random. Well, you could ask the governments to assign the ZIP codes to cities in strictly alphabetical order, but I guess they'll say no.

4) Or we could store just those cells where expected and observed values differ significantly (may help if most of the values are independent, but there's a small glitch somewhere).

Compressing that grid would be useful, given that for many dimensions most of the grid will not be interesting. In fact, storing the 20 largest values may be enough. Worth an experiment.

Not exactly just the largest values - rather values that are significantly different from the expected values. Generally there are two interesting cases:

expected < observed - The optimizer may choose an index scan, although a seq scan would be better.

expected > observed - The optimizer may choose a seq scan, although an index scan would be better.

regards Tomas
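The Pearson chi-squared test mentioned in point 3 is straightforward to compute over such a contingency table. A minimal sketch (illustrative code, not from any patch): if the statistic is near zero the columns look independent and the cross-column stats could simply be discarded.

```python
from collections import Counter

def chi_squared(rows):
    """Pearson chi-squared statistic for a two-column sample.
    Compare against the chi2 critical value for (r-1)*(c-1) d.o.f."""
    n = len(rows)
    joint = Counter(rows)                    # (a, b) -> observed count
    a_tot = Counter(a for a, _ in rows)      # marginal counts for a
    b_tot = Counter(b for _, b in rows)      # marginal counts for b
    stat = 0.0
    for a in a_tot:
        for b in b_tot:
            expected = a_tot[a] * b_tot[b] / n   # under independence
            observed = joint[(a, b)]
            stat += (observed - expected) ** 2 / expected
    return stat

independent = [(i % 2, (i // 2) % 2) for i in range(1000)]
correlated = [(i % 2, i % 2) for i in range(1000)]
print(chi_squared(independent))  # 0.0: cells match the expected counts
print(chi_squared(correlated))   # 1000.0: strong dependence
```

Note this is a single extra pass over data ANALYZE already samples, so the cost concern raised above applies mainly to computing it for every column pair rather than to the test itself.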
Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE
Florian Pflug f...@phlo.org writes:

pg_upgrade aborted during the step "Adding support functions to new cluster" with "ERROR: permission denied for language c". Unfortunately, the log didn't include the name of the database where the error occurred, so it took me a while to figure out that the culprit was an ALTER DATABASE SET ROLE = non-superuser I had done on one of my databases, which effectively prevented pg_upgrade from connecting with superuser privileges.

That seems like a pretty stupid thing to have done; it would prevent *any* connection to that database with superuser privileges, no?

While one could argue that this behaviour is perfectly consistent, I believe most users will expect pg_upgrade (and to a lesser extent also pg_dump and pg_restore) to be unaffected by such settings.

This is about like arguing that pg_dump and pg_upgrade should still work after you've done "delete from pg_proc;". Superusers are assumed to know what they're doing and not break fundamental operations. I'm thinking that if there's anything we should forbid here, it's the ALTER ... SET itself. In particular, some experimentation suggests that a non-superuser database owner can do it:

regression=# create user joe;
CREATE ROLE
regression=# create database joe with owner joe;
CREATE DATABASE
regression=# \c joe joe
You are now connected to database "joe" as user "joe".
joe=> alter database joe set role joe;
ALTER DATABASE

which seems to me at least a bad idea and arguably a security hazard.

regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags
I wrote: Magnus Hagander mag...@hagander.net writes: On Sat, Dec 11, 2010 at 18:46, Tom Lane t...@sss.pgh.pa.us wrote:

I think we can just #define the other cases as zeroes. I'm not sure why you think that's an issue for open --- the privileges don't exist.

Hmm. I was/am worried about any case that specifies *just* one of the permissions that don't exist. That'll leave it at zero, whereas the correct one might be the user-only version of whatever (read/write) was given.

If we didn't specify the user read or write privilege, we shouldn't get it. I put in #define's for these, and it seems to have fixed the MSVC buildfarm members, but cygwin is still broken. How come ... doesn't that port use port/win32.h? What are the values of _S_IREAD and _S_IWRITE, anyway? I'm still wondering how come the previous coding with hardwired constants behaved correctly.

Still curious about this. regards, tom lane
Re: [HACKERS] proposal : cross-column stats
Dne 12.12.2010 15:43, Heikki Linnakangas napsal(a):

The classic failure case has always been: postcodes and city names. Strongly correlated, but in a way that the computer can't easily see.

Yeah, and that's actually analogous to the example I used in my presentation. The way I think of that problem is that once you know the postcode, knowing the city name doesn't add any information. The postcode implies the city name. So the selectivity for postcode = ? AND city = ? should be the selectivity of postcode = ? alone. The measurement we need is implicativeness: How strongly does column A imply a certain value for column B? Perhaps that could be measured by counting the number of distinct values of column B for each value of column A, or something like that. I don't know what the statisticians call that property, or if there's some existing theory on how to measure that from a sample.

Yes, those issues are a righteous punishment for breaking BCNF rules ;-)

I'm not sure it's solvable using the contingency tables, as it requires knowledge about dependencies between individual values (working with cells is not enough, although it might improve the estimates). Well, maybe we could collect these stats (number of cities for a given ZIP code and number of ZIP codes for a given city). Collecting good stats about this is a bit tricky, but possible. What about collecting this for the MCVs from both columns?

Tomas
Re: [HACKERS] function attributes
On 12/12/2010 10:43 AM, Tom Lane wrote:

Tim Bunce seemed to think that this particular problem might be solvable in a completely transparent way, by having byteas convert into Perl objects that have a hook for producing a backwards-compatible text translation. Have you looked into that idea?

No. If you're referring to this sentence, which was referring to arrays, not to byteas: "It's possible a blessed ref with string overloading would avoid backwards compatibility issues." then it won't work (or at least it would be far more complex than what I've done, and I can't see how it would work) in the case of a bytea, since a bytea becomes a scalar, not a ref, and you can only bless refs.

cheers andrew
Re: [HACKERS] proposal : cross-column stats
On Dec12, 2010, at 15:43 , Heikki Linnakangas wrote:

The way I think of that problem is that once you know the postcode, knowing the city name doesn't add any information. The postcode implies the city name. So the selectivity for postcode = ? AND city = ? should be the selectivity of postcode = ? alone. The measurement we need is implicativeness: How strongly does column A imply a certain value for column B? Perhaps that could be measured by counting the number of distinct values of column B for each value of column A, or something like that. I don't know what the statisticians call that property, or if there's some existing theory on how to measure that from a sample.

The statistical term for this is conditional probability, written P(A|B), meaning the probability of A under the assumption or knowledge of B. The basic tool for working with conditional probabilities is Bayes' theorem, which states that P(A|B) = P(A and B) / P(B). Currently, we assume that P(A|B) = P(A), meaning the probability (or selectivity as we call it) of an event (like a=3) does not change under additional assumptions like b=4. Bayes' theorem thus becomes P(A) = P(A and B) / P(B), i.e. P(A and B) = P(A)*P(B), which is how we currently compute the selectivity of a clause such as WHERE a=3 AND b=4.

I believe that measuring this by counting the number of distinct values of column B for each A is basically the right idea. Maybe we could count the number of distinct values of b for every one of the most common values of a, and compare that to the overall number of distinct values of b...

A (very) quick search on scholar.google.com for "estimate conditional probability" didn't turn up anything useful, but it's hard to believe that there isn't at least some literature on the subject.

best regards, Florian Pflug
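The Bayes' theorem argument above can be made concrete with a tiny sample-based sketch (illustrative names, not planner code): estimate P(A and B) and P(B) from the sample and divide, instead of assuming P(A|B) = P(A).

```python
def selectivity(rows, pred):
    """Fraction of sampled rows satisfying a predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)

# a always equals b: the columns are fully dependent
rows = [(i % 10, i % 10) for i in range(1000)]

p_b = selectivity(rows, lambda r: r[1] == 4)      # P(b=4) = 0.1
p_ab = selectivity(rows, lambda r: r == (4, 4))   # P(a=4 and b=4) = 0.1
p_a = selectivity(rows, lambda r: r[0] == 4)      # P(a=4) = 0.1

p_a_given_b = p_ab / p_b   # Bayes: P(A|B) = P(A and B) / P(B)
print(p_a_given_b)         # 1.0: knowing b=4 makes a=4 certain
print(p_a * p_b)           # ~0.01: the current independence estimate
```

The hard part, of course, is that the planner cannot afford to store P(A and B) for every value pair, which is exactly what the contingency-table binning elsewhere in the thread tries to approximate.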
Re: [HACKERS] function attributes
On 12/12/2010 10:43 AM, Tom Lane wrote: At the moment the behaviour is triggered by a custom setting (plperl.pass_binary_bytea), but this isn't really satisfactory. We could turn it on permanently, but that would break a lot of legacy code. What we really need is a way of marking a function with some attributes. Of course, we could put it in the program text like plpgsql's #variable_conflict, but that's really rather ugly. The grammar already has an attribute mechanism for functions, and ISTM we just need to extend that a bit to allow setting of function attributes reasonably flexibly, much as we can now specify format options on EXPLAIN or we'll soon be able to specify options for foreign tables. I do not want to go there. What you're proposing will soon turn into a real mess, with arbitrary language-specific junk tagged onto pg_proc entries. And what's worse, it'll be mixed with non-language-specific junk, because of the existing legacy WITH entries. Arguably we should deprecate those legacy entries. One, isCachable, is stated in the docs to be obsolete, and has been for many releases now. The other, isStrict, is a non-preferred way of specifying that the function is strict. But the real issue is that we have no way of specifying properties for a function at creation time other than those provided for in the grammar. We've already made a couple of fairly ugly hacks to do stuff like this in plpgsql. Is that really the road we want to go down? Is it less messy than providing some catalog support for language-specific function properties, where they might be visible outside the function source? In the present case, Robert's suggestion of using create function ... set plperl.pass_binary_bytea = true seems to work well enough, although I haven't tried very hard yet to break it. cheers andrew
Re: [HACKERS] proposal : cross-column stats
On 12.12.2010 17:33, Florian Pflug wrote: On Dec12, 2010, at 15:43 , Heikki Linnakangas wrote: The way I think of that problem is that once you know the postcode, knowing the city name doesn't add any information. The postcode implies the city name. So the selectivity for postcode = ? AND city = ? should be the selectivity of postcode = ? alone. The measurement we need is implicativeness: How strongly does column A imply a certain value for column B. Perhaps that could be measured by counting the number of distinct values of column B for each value of column A, or something like that. I don't know what the statisticians call that property, or if there's some existing theory on how to measure that from a sample. The statistical term for this is conditional probability, written P(A|B), meaning the probability of A under the assumption or knowledge of B. The basic tool for working with conditional probabilities is Bayes' theorem, which states that P(A|B) = P(A and B) / P(B). Currently, we assume that P(A|B) = P(A), meaning the probability (or selectivity, as we call it) of an event (like a=3) does not change under additional assumptions like b=4. Bayes' theorem thus becomes P(A) = P(A and B) / P(B), i.e. P(A and B) = P(A)*P(B), which is how we currently compute the selectivity of a clause such as WHERE a=3 AND b=4. I believe that measuring this by counting the number of distinct values of column B for each A is basically the right idea. Maybe we could count the number of distinct values of b for every one of the most common values of a, and compare that to the overall number of distinct values of b... Good point! Well, I was thinking about this too - generally this means creating a contingency table with the MCV as bins. Then you can compute these interesting probabilities P(A and B). (OK, now I definitely look like some contingency table weirdo, who tries to solve everything with a contingency table. OMG!)
The question is - what are we going to do when the values in the query are not in the MCV list? Are there heuristics to estimate the probability from the MCVs, or something like that? Could we use some average probability or what? Tomas
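One way to make the contingency-table idea concrete is sketched below (hypothetical Python, not a patch; `build_stats` and `estimate` are invented names): joint frequencies are collected only for pairs of per-column MCVs, and any value outside the MCV lists falls back to the existing independence estimate, which is one possible answer to the question above.

```python
# Hypothetical sketch of MCV-based joint selectivity estimation:
# a contingency table over the most common values (MCVs) of two columns,
# with independence as the fallback for non-MCV values.

from collections import Counter

def build_stats(rows, k=3):
    """Collect per-column MCV lists and a joint contingency table over MCV pairs."""
    n = len(rows)
    mcv_a = dict(Counter(a for a, _ in rows).most_common(k))
    mcv_b = dict(Counter(b for _, b in rows).most_common(k))
    joint = Counter((a, b) for a, b in rows if a in mcv_a and b in mcv_b)
    return n, mcv_a, mcv_b, joint

def estimate(n, mcv_a, mcv_b, joint, a, b, freq_a, freq_b):
    """Estimate P(A=a and B=b).

    freq_a / freq_b are the per-column selectivity estimates the planner
    already produces; they are only used for the fallback path.
    """
    if a in mcv_a and b in mcv_b:
        return joint[(a, b)] / n       # P(A and B) read off the table
    return freq_a * freq_b             # non-MCV: fall back to independence
```

For example, with rows = [(1, 'x')] * 5 + [(2, 'y')] * 3 + [(3, 'z')] * 2, the pair (1, 'x') is estimated directly from the table as 0.5, while an unseen pair like (4, 'w') falls back to multiplying the single-column estimates.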
Re: [HACKERS] function attributes
Andrew Dunstan and...@dunslane.net writes: On 12/12/2010 10:43 AM, Tom Lane wrote: At the moment the behaviour is triggered by a custom setting (plperl.pass_binary_bytea), but this isn't really satisfactory. I do not want to go there. But the real issue is that we have no way of specifying properties for a function at creation time other than those provided for in the grammar. We've already made a couple of fairly ugly hacks to do stuff like this in plpgsql. Is that really the road we want to go down? Is it less messy than providing some catalog support for language specific function properties, where they might be visible outside the function source? There might be an argument in the abstract for that, but I can't see expending the work until we have a more pressing concrete requirement than this one. I don't believe that defining this particular behavior as a function property is a good long-term solution, because it seems practically certain that everybody will want to migrate to the new behavior. A GUC works well for that, because you can flip over the default once you reach the point of having converted or marked all your functions. A function property doesn't work at all, unless it's just a means of locally overriding the GUC ... and the SET clause exists for that already. I could be talked into function properties given a few examples of properties that could be expected to remain in use for a long time (like volatile/immutable for instance). But this example is no sale. regards, tom lane
Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE
On Dec12, 2010, at 17:01 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: pg_upgrade aborted during the step Adding support functions to new cluster with an ERROR: permission denied for language c error. Unfortunately, the log didn't include the name of the database where the error occurred, so it took me a while to figure out that the culprit was an ALTER DATABASE SET ROLE = non-superuser I had done on one of my databases, which effectively prevented pg_upgrade from connecting with superuser privileges. That seems like a pretty stupid thing to have done; it would prevent *any* connection to that database with superuser privileges, no? I had two developers working with that database who regularly modify the schema, often creating new objects (it's a development machine). They both were annoyed that if one of them created a table, he'd be the owner and some operations on that table would be restricted to him and superusers. The ALTER DATABASE SET ROLE fixes that nicely for me. When I needed to work as a superuser with that database, I simply did SET ROLE superuser role to restore my superuser powers. Nowadays, I could probably do the SET ROLE just for some specific combination of user and database. That option, however, wasn't there at the time I did the ALTER DATABASE SET ROLE. While one could argue that this behaviour is perfectly consistent, I believe most users will expect pg_upgrade (and to a lesser extent also pg_dump and pg_restore) to be unaffected by such settings. This is about like arguing that pg_dump and pg_upgrade should still work after you've done delete from pg_proc;. Superusers are assumed to know what they're doing and not break fundamental operations. Sure. If you believe in proof by exaggeration, which I don't. The way I see it, how is a DBA supposed to know that setting a per-database ROLE is a bad idea, while per-database settings for other GUCs are fine?
For example, what about synchronous_commit=off, vacuum_freeze_min_age, datestyle, sql_inheritance, standard_conforming_strings, array_nulls, default_with_oids, ... Without checking the code, all of these have about the same chance of breaking pg_upgrade. But then, by your line of reasoning, ALTER DATABASE SET ROLE shouldn't have been invented in the first place. Which may even be true, but it's too late for that. So the next best thing, IMHO, is to give superusers a way to avoid the hazard it poses. I'm thinking that if there's anything we should forbid here, it's the ALTER ... SET itself. In particular, some experimentation suggests that a non-superuser database owner can do it:

regression=# create user joe;
CREATE ROLE
regression=# create database joe with owner joe;
CREATE DATABASE
regression=# \c joe joe
You are now connected to database joe as user joe.
joe=> alter database joe set role joe;
ALTER DATABASE

which seems to me at least a bad idea and arguably a security hazard. I'm sorry, I don't see that security hazard there. Care to explain? best regards, Florian Pflug
Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags
On 12/12/2010 11:16 AM, Tom Lane wrote: I put in #define's for these, and it seems to have fixed the MSVC buildfarm members, but cygwin is still broken. How come ... doesn't that port use port/win32.h? ITYM Mingw. And yes, it does use port/win32.h; narwhal's log says: config.status: linking src/include/port/win32.h to src/include/pg_config_os.h cheers andrew
Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags
Andrew Dunstan and...@dunslane.net writes: On 12/12/2010 11:16 AM, Tom Lane wrote: I put in #define's for these, and it seems to have fixed the MSVC buildfarm members, but cygwin is still broken. How come ... doesn't that port use port/win32.h? ITYM Mingw. And yes, it does use port/win32.h; narwhal's log says: config.status: linking src/include/port/win32.h to src/include/pg_config_os.h Oh, I guess the point is that WIN32_ONLY_COMPILER doesn't get defined, and that block of file-permission-bit #defines is nested inside #ifdef WIN32_ONLY_COMPILER. So apparently the issue is that the mingw headers provide some but not all of those symbols. Which have they got, and how are they defined? regards, tom lane
[HACKERS] Re: [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.
On 12/12/2010 01:43 PM, Tom Lane wrote: Make S_IRGRP etc available in mingw builds as well as MSVC. (Hm, I wonder whether BCC defines them either...) Is anyone building the client stuff with BCC any more? I don't recall having heard of anyone doing so for quite some years. cheers andrew
Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.
Andrew Dunstan and...@dunslane.net writes: On 12/12/2010 01:43 PM, Tom Lane wrote: (Hm, I wonder whether BCC defines them either...) Is anyone building the client stuff with BCC any more? I don't recall having heard of anyone doing so for quite some years. It's a fair question. We could clean up some of these messy ifdefs if we dropped support for that combination. I assume that an MSVC-built libpq.dll would still work for Borland users, no? regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.
On Sun, Dec 12, 2010 at 19:54, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 12/12/2010 01:43 PM, Tom Lane wrote: (Hm, I wonder whether BCC defines them either...) Is anyone building the client stuff with BCC any more? I don't recall having heard of anyone doing so for quite some years. It's a fair question. We could clean up some of these messy ifdefs if we dropped support for that combination. I assume that an MSVC-built libpq.dll would still work for Borland users, no? The dynamic one, yes. Static linked one, no. IIRC I suggest desupporting it every now and then and get voted down ;) And quite often we have someone showing up around the x.y.2 release to clean it up so it works again... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.
Magnus Hagander mag...@hagander.net writes: On Sun, Dec 12, 2010 at 19:54, Tom Lane t...@sss.pgh.pa.us wrote: It's a fair question. We could clean up some of these messy ifdefs if we dropped support for that combination. I assume that an MSVC-built libpq.dll would still work for Borland users, no? The dynamic one, yes. Static linked one, no. IIRC I suggest desupporting it every now and then and get voted down ;) And quite often we have someone showing up around the x.y.2 release to clean it up so it works again... Yeah, a look in the commit logs shows that happening about once a year. It'd sure be nice if we had a less haphazard process for it though. regards, tom lane
Re: [HACKERS] unlogged tables
Cédric Villemain cedric.villemain.deb...@gmail.com writes: 2010/12/8 Kineticode Billing da...@kineticode.com: On Dec 8, 2010, at 10:37 AM, Chris Browne wrote: Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS. I kind of like TRANSIENT, but that's only because it's a property I've been working with in some other systems http://www.erlang.org/doc/design_principles/sup_princ.html Restart = permanent | transient | temporary Restart defines when a terminated child process should be restarted. A permanent child process is always restarted. A temporary child process is never restarted. A transient child process is restarted only if it terminates abnormally, i.e. with another exit reason than normal. EVANESCENT. UNSAFE ? What about NOT PERSISTENT ? Then we would have two flavours of them, that's NOT PERSISTENT ON RESTART TRUNCATE or ON RESTART FLUSH, I guess? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Wildcard search support for pg_trgm
Alexander Korotkov aekorot...@gmail.com writes: Here is the first version of the patch, which enables index support for wildcard search in the pg_trgm contrib module. How different (and better) is it from wildspeed? http://www.sai.msu.su/~megera/wiki/wildspeed Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] SQL/MED - core functionality
On ons, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote: I've tried SQL/MED with postgresql_fdw last night, and found myself confused by the long setup procedure. The simplest way to use it AFAIK is: 1. CREATE FOREIGN DATA WRAPPER ... (or run the install sql script) 2. CREATE SERVER ... FOREIGN DATA WRAPPER ... 3. CREATE USER MAPPING FOR ... 4. CREATE FOREIGN TABLE ( ... ) From a user's view, this is a very long way to get even the simplest foreign table. I know it is based on the standard, but I really want a shortcut. Especially, I don't understand why CREATE USER MAPPING FOR current_user SERVER server is needed for the default use case. If you forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an error. User mapping is useful if the local user and remote user should be mapped, but I imagine in most cases they are the same. postgresql_fdw can tell the remote user by conninfo string, in addition. I reviewed the standard about this, and a lot of things are implementation-defined. I think user mappings could be made optional. This is another topic, but it would be useful if CREATE FOREIGN TABLE could omit column definitions, since the fdw usually knows what should be there in the definitions. I sometimes mistyped the column names between remote and local, which resulted in failures at execution. Also, according to the standard, the column list in CREATE FOREIGN TABLE is optional (if you can get it in some automatic way, of course).
Re: [HACKERS] Extensions, patch v17
On Dec 12, 2010, at 12:50 PM, Dimitri Fontaine wrote: The only item with still some work to be done on it is the regression tests support: we're not aiming at full coverage, is my understanding, and installing contribs goes a long way towards testing extensions. Do we want more? If so, please detail what exactly. At least those things not exercised by the contrib modules. David
Re: [HACKERS] Wildcard search support for pg_trgm
On Mon, Dec 13, 2010 at 12:14 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: How different (and better) is it from wildspeed? The general advantage is the possibility of using wildcard search and trigram similarity search with the same index. I expect that a GIN trigram index is slightly less space-demanding, but slightly slower on search, than wildspeed. Also, I expect a GiST trigram index to be slower on search, but faster on updates. I haven't checked these assumptions in detail yet, though. I lack test datasets for sufficient testing, and I would like to ask the community to help me with that, because testing on dictionaries is good, but obviously not enough. With best regards, Alexander Korotkov.
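To illustrate the idea behind serving wildcard search from a trigram index, here is a toy sketch (plain Python, deliberately simplified relative to pg_trgm, which also pads word boundaries with blanks; the function names are invented): trigrams extracted from the fixed parts of a LIKE pattern prefilter the rows, and the actual pattern is rechecked afterwards, much as an index recheck would do.

```python
# Toy model of wildcard search over a trigram index: candidate rows must
# contain every trigram found in the fixed substrings of the pattern,
# then the real pattern is rechecked on the candidates.

import re
from fnmatch import fnmatchcase

def trigrams(s):
    """All contiguous 3-grams of a string (no boundary padding here)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def pattern_trigrams(pattern):
    """Trigrams that any string matching the LIKE pattern must contain:
    the interior trigrams of the fixed parts between % and _ wildcards."""
    required = set()
    for part in re.split(r"[%_]", pattern):
        required |= trigrams(part)   # parts shorter than 3 contribute nothing
    return required

def wildcard_search(strings, pattern):
    required = pattern_trigrams(pattern)
    glob = pattern.replace("%", "*").replace("_", "?")
    # Stand-in for the index scan: keep rows having all required trigrams...
    candidates = [s for s in strings if required <= trigrams(s)]
    # ...then recheck the actual pattern, as the real operator would.
    return [s for s in candidates if fnmatchcase(s, glob)]
```

For example, wildcard_search(["abcdefghij", "xyz"], "%efgh%") prefilters on the trigrams {"efg", "fgh"} and returns only "abcdefghij". A similarity-search index over the same trigram sets is what lets one index serve both kinds of query.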
Re: [HACKERS] create tablespace fails silently, or succeeds improperly
On mån, 2010-10-18 at 15:50 -0400, Tom Lane wrote: Yeah. We have gotten complaints in the past from people who tried to specify a mount point as a tablespace, and it failed because of lost+found or the mount dir being root-owned. We've told them to make a subdirectory, but that always seemed like a workaround. With the new layout there's no longer any strong reason to prevent this case from working. Basically, I'm thinking that given CREATE TABLESPACE LOCATION '/foo/bar' the creation and properties of /foo/bar/PG_9.0_201004261 ought to be handled *exactly* the way that the -D target directory of initdb is. We have more than ten years' experience behind the assertion that we're dealing with that case in a good way. We should transfer that behavior over to tablespace directories rather than inventing something that works a shade differently. I'm still struggling with the above argument. In one case you are applying a behavior to the argument given to initdb, in the other case you are applying the behavior to a subdirectory of the argument given to CREATE TABLESPACE. I'm not saying the solution is necessarily wrong, but it doesn't seem that this will make things easier or more consistent. An idle thought: How about creating a version-subdirectory also in the PGDATA path? The point about mountpoint annoyance applies here just as well. And it could also make the directory juggling during in-place upgrade more normalized and robust.
Re: [HACKERS] Per-column collation
On tis, 2010-12-07 at 11:46 +0900, Itagaki Takahiro wrote: On Sun, Dec 5, 2010 at 01:04, Peter Eisentraut pete...@gmx.net wrote: Here is an updated patch to address the issues discussed during this commitfest. I found another issue in the patch; ILIKE in the WHERE clause doesn't work. It was surprising because LIKE in the WHERE clause and ILIKE in the SELECT list work as expected. - SELECT * FROM pg_class WHERE relname LIKE 'pg%' - SELECT relname ILIKE 'pg%' FROM pg_class;

postgres=# SELECT name, setting FROM pg_settings WHERE name IN ('lc_ctype', 'lc_collate', 'server_encoding');
      name       | setting
-----------------+---------
 lc_collate      | C
 lc_ctype        | C
 server_encoding | UTF8
(3 rows)

postgres=# SELECT * FROM pg_class WHERE relname ILIKE 'pg%';
ERROR: no collation was derived

This is fixed in the 20101213 patch I'm about to send out.
Re: [HACKERS] Per-column collation
On mån, 2010-12-06 at 21:26 +0200, Peter Eisentraut wrote: * contrib/citext raises an encoding error when COLLATE is specified even if the collation is the same as the database default. We might need some special treatment for the C locale.

=# SHOW lc_collate; == C
=# SELECT ('A'::citext) = ('a'::citext); == false
=# SELECT ('A'::citext) = ('a'::citext) COLLATE C;
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.

OK, I can reproduce that. That's fallout from the lc_ctype_is_c() optimization that I removed, as explained in another email. I'll have to think about that again. This is fixed in the 20101213 patch I'm about to send out.
[HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
The new SQL Standard (SQL:2011) contains this: Table constraints are either enforced or not enforced. Domain constraints and assertions are always enforced. (4.17.2) The SQL Standard allows you to turn the checking on and off for CHECK constraints, UNIQUE constraints and FOREIGN KEYS. Which of those make sense for us, if any? The ability to create FKs without checking all the data has been frequently requested of me over many years. OTOH, I can't really see any point in turning on/off all of the other aspects mentioned by the SQL Standard, especially indexes. It's lots of work and seems likely to end with poorer data quality. And the obvious thing is: if you don't want a CHECK constraint, just drop it... My proposal is that we add a short and simple clause NOT ENFORCED onto the ADD constraint syntax. So we have ALTER TABLE foo ADD FOREIGN KEY NOT ENFORCED; The enforced state is not persisted - once added, the FK is checked every time. So there is no additional column on pg_constraint. The benefit here is that we implement a capability that allows skipping very long-running SQL statements when required, and it doesn't require too much code. It has been discussed before on hackers, but that was before it was part of the SQL Standard. Oracle has had this for years and it is a popular feature. We can expect other RDBMSs to implement this feature, now that it is part of the standard. If you want more than my good-bits-only proposal, it really isn't going to happen for 9.1, and seems pretty pointless anyway. A very short hack to implement this is attached for discussion. No tests, not even a compile - just showing how quick a patch this can be. Thoughts? Alternative syntax?
-- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 784feae..ecadcbd 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -42,7 +42,7 @@ ALTER TABLE <replaceable class="PARAMETER">name</replaceable>
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
-    ADD <replaceable class="PARAMETER">table_constraint</replaceable>
+    ADD <replaceable class="PARAMETER">table_constraint</replaceable> [ ENFORCED | NOT ENFORCED ]
     DROP CONSTRAINT [ IF EXISTS ] <replaceable class="PARAMETER">constraint_name</replaceable> [ RESTRICT | CASCADE ]
     DISABLE TRIGGER [ <replaceable class="PARAMETER">trigger_name</replaceable> | ALL | USER ]
     ENABLE TRIGGER [ <replaceable class="PARAMETER">trigger_name</replaceable> | ALL | USER ]
@@ -220,11 +220,13 @@ ALTER TABLE <replaceable class="PARAMETER">name</replaceable>
    </varlistentry>

    <varlistentry>
-    <term><literal>ADD <replaceable class="PARAMETER">table_constraint</replaceable></literal></term>
+    <term><literal>ADD <replaceable class="PARAMETER">table_constraint</replaceable> [ ENFORCED | NOT ENFORCED ]</literal></term>
     <listitem>
      <para>
       This form adds a new constraint to a table using the same syntax as
-      <xref linkend="SQL-CREATETABLE">.
+      <xref linkend="SQL-CREATETABLE">. Newly added constraints can be defined
+      as <literal>NOT ENFORCED</literal>, rather than the default setting
+      <literal>ENFORCED</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 937992b..3cacad0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -304,13 +304,13 @@ static void ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 			   IndexStmt *stmt, bool is_rebuild, LOCKMODE lockmode);
 static void ATExecAddConstraint(List **wqueue,
 				AlteredTableInfo *tab, Relation rel,
-				Constraint *newConstraint, bool recurse, LOCKMODE lockmode);
+				Constraint *newConstraint, bool recurse, LOCKMODE lockmode, bool enforced);
 static void ATAddCheckConstraint(List **wqueue,
 				 AlteredTableInfo *tab, Relation rel,
 				 Constraint *constr,
 				 bool recurse, bool recursing, LOCKMODE lockmode);
 static void ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
-						  Constraint *fkconstraint, LOCKMODE lockmode);
+						  Constraint *fkconstraint, LOCKMODE lockmode, bool enforced);
 static void ATExecDropConstraint(Relation rel, const char *constrName,
 				 DropBehavior behavior,
 				 bool recurse, bool recursing,
@@ -2970,11 +2970,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
-								false,
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
Simon Riggs si...@2ndquadrant.com writes: The new SQL Standard (SQL:2011) contains this: Table constraints are either enforced or not enforced. Domain constraints and assertions are always enforced. (4.17.2) The SQL Standard allows you to turn the checking on and off for CHECK constraints, UNIQUE constraints and FOREIGN KEYS. Huh? It allows you to postpone the check until commit. That's far from not enforcing it. regards, tom lane
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: The new SQL Standard (SQL:2011) contains this: Table constraints are either enforced or not enforced. Domain constraints and assertions are always enforced. (4.17.2) The SQL Standard allows you to turn the checking on and off for CHECK constraints, UNIQUE constraints and FOREIGN KEYS. Huh? It allows you to postpone the check until commit. That's far from not enforcing it. When a commit statement is executed, all enforced constraints are effectively checked and, if any enforced constraint is not satisfied, then an exception condition is raised and the SQL-transaction is terminated by an implicit rollback statement. This clearly implies that un-enforced constraints are not checked at commit. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] create tablespace fails silently, or succeeds improperly
Peter Eisentraut pete...@gmx.net writes: On mån, 2010-10-18 at 15:50 -0400, Tom Lane wrote: Basically, I'm thinking that given CREATE TABLESPACE LOCATION '/foo/bar' the creation and properties of /foo/bar/PG_9.0_201004261 ought to be handled *exactly* the way that the -D target directory of initdb is. We have more than ten years experience behind the assertion that we're dealing with that case in a good way. We should transfer that behavior over to tablespace directories rather than inventing something that works a shade differently. I'm still struggling with the above argument. In one case you are applying a behavior to the argument given to initdb, in the other case you are applying the behavior to a subdirectory of the argument given to CREATE TABLESPACE. I'm not saying the solution is necessarily wrong, but it doesn't seem that this will make things easier or more consistent. Well, it is only an argument by analogy, but the proposal does fix the IMO-clear misbehavior complained of way back at the start of this thread. An idle thought: How about creating a version-subdirectory also in the PGDATA path. The point about mountpoint annoyance applies here just as well. And it could also make the directory juggling during in-place upgrade more normalized and robust. I can't get excited about it. That would break every existing tool that looks into PGDATA, for a fairly marginal simplification during version upgrades. To give just one example of the pain we'd be letting ourselves in for, pg_ctl would now become extremely version-specific. You couldn't even get away with using the wrong copy of pg_ctl during a reinstall after a catversion bump during development. regards, tom lane
Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE
On Sun, Dec 12, 2010 at 11:01 AM, Tom Lane t...@sss.pgh.pa.us wrote: This is about like arguing that pg_dump and pg_upgrade should still work after you've done delete from pg_proc;. Superusers are assumed to know what they're doing and not break fundamental operations. No, it isn't like that at all. You've made that argument in the past, and it carries no water with me at all. There's no help for the fact that direct modification of the system catalog contents can fundamentally break things, but DDL commands should not. I'm willing to reserve judgment on whether ALTER DATABASE .. SET ROLE should be disallowed, or whether it should be made to not break things, but blaming the DBA for shooting himself with the loaded foot-gun we thoughtfully provided is unreasonable. And in fact it strikes me that we might not have much choice about how to fix this. I think we are not going to retroactively change the behavior of ALTER DATABASE .. SET ROLE in a released version, but yet we do, I think, want to make pg_upgrade work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
Simon Riggs si...@2ndquadrant.com writes: On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote: Huh? It allows you to postpone the check until commit. That's far from not enforcing it. This clearly implies that un-enforced constraints are not checked at commit. [ shrug... ] I can't argue with you about what may or may not be in an unpublished draft of an unratified version of the standard, since I don't have a copy. But allow me to harbor doubts that they really intend to allow someone to force a constraint to be considered valid without any verification. This proposal strikes me as something mysql would do, not the standards committee. (In particular, can a constraint go from not-enforced to enforced state without getting checked at that time?) Even if you're reading the draft correctly, and the wording makes it into a released standard, the implementation you propose would break our code. The incremental FK checks are designed on the assumption that the constraint condition held before; they aren't likely to behave very sanely if the data is bad. I'd want to see a whole lot more analysis of the resulting behavior before even considering an idea like this. regards, tom lane
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote: Huh? It allows you to postpone the check until commit. That's far from not enforcing it. This clearly implies that un-enforced constraints are not checked at commit. [ shrug... ] I can't argue with you about what may or may not be in an unpublished draft of an unratified version of the standard, since I don't have a copy. But allow me to harbor doubts that they really intend to allow someone to force a constraint to be considered valid without any verification. This proposal strikes me as something mysql would do, not the standards committee. (In particular, can a constraint go from not-enforced to enforced state without getting checked at that time?) Even if you're reading the draft correctly, and the wording makes it into a released standard, the implementation you propose would break our code. The incremental FK checks are designed on the assumption that the constraint condition held before; they aren't likely to behave very sanely if the data is bad. I'd want to see a whole lot more analysis of the resulting behavior before even considering an idea like this. Wow, you've managed to bash Simon, MySQL, and the SQL standards committee all in one email. I'm not going to argue that careful analysis isn't needed before doing something like this - and, in particular, if we ever get inner-join removal, which I'm still hoping to do at some point, a foreign key that isn't actually guaranteed to be valid might result in queries returning different answers depending on whether or not a join is removed. I guess we'd have to define that as the user's problem for alleging a foreign-key relationship that doesn't truly exist. On the other hand, there's clearly also a use case for this behavior. 
If a bulk load of prevalidated data forces an expensive revalidation of constraints that are already known to hold, there's a real chance the DBA will be backed into a corner where he simply has no choice but to not use foreign keys, even though he might really want to validate the foreign-key relationships on a going-forward basis. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal : cross-column stats
On Sun, Dec 12, 2010 at 9:43 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The way I think of that problem is that once you know the postcode, knowing the city name doesn't add any information. The postcode implies the city name. So the selectivity for postcode = ? AND city = ? should be the selectivity of postcode = ? alone. The measurement we need is implicativeness: How strongly does column A imply a certain value for column B. Perhaps that could be measured by counting the number of distinct values of column B for each value of column A, or something like that. I don't know what the statisticians call that property, or if there's some existing theory on how to measure that from a sample. This is a good idea, but I guess the question is what you do next. If you know that the applicability is 100%, you can disregard the restriction clause on the implied column. And if it has no implicatory power, then you just do what we do now. But what if it has some intermediate degree of implicability? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
Robert Haas robertmh...@gmail.com writes: ... On the other hand, there's clearly also a use case for this behavior. If a bulk load of prevalidated data forces an expensive revalidation of constraints that are already known to hold, there's a real chance the DBA will be backed into a corner where he simply has no choice but to not use foreign keys, even though he might really want to validate the foreign-key relationships on a going-forward basis. There may well be a case to be made for doing this on grounds of practical usefulness. I'm just voicing extreme skepticism that it can be supported by reference to the standard. Personally I'd prefer to see us look into whether we couldn't arrange for low-impact establishment of a verified FK relationship, analogous to CREATE INDEX CONCURRENTLY. We don't let people just arbitrarily claim that a uniqueness condition exists, and ISTM that if we can handle that case we probably ought to be able to handle FK checking similarly. regards, tom lane
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
On Sun, Dec 12, 2010 at 7:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ... On the other hand, there's clearly also a use case for this behavior. If a bulk load of prevalidated data forces an expensive revalidation of constraints that are already known to hold, there's a real chance the DBA will be backed into a corner where he simply has no choice but to not use foreign keys, even though he might really want to validate the foreign-key relationships on a going-forward basis. There may well be a case to be made for doing this on grounds of practical usefulness. I'm just voicing extreme skepticism that it can be supported by reference to the standard. Dunno, I haven't read it either. But it does seem like the natural interpretation of NOT ENFORCED. Personally I'd prefer to see us look into whether we couldn't arrange for low-impact establishment of a verified FK relationship, analogous to CREATE INDEX CONCURRENTLY. We don't let people just arbitrarily claim that a uniqueness condition exists, and ISTM that if we can handle that case we probably ought to be able to handle FK checking similarly. That'd be useful, too, but I don't think it would remove the use case for skipping the check altogether. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE
On Dec13, 2010, at 00:16 , Robert Haas wrote: And in fact it strikes me that we might not have much choice about how to fix this. I think we are not going to retroactively change the behavior of ALTER DATABASE .. SET ROLE in a released version, but yet we do, I think, want to make pg_upgrade work. A simple fix is to teach pg_upgrade to issue RESET SESSION AUTHORIZATION immediately after connecting to a database. I don't see any downside of this currently - it seems that the only case where this wouldn't be a NO-OP is if someone set ROLE to something else either per-database, per-user or both. Actually, I'd like to provide an option for pg_dump and pg_restore to do that too (not by default, though). If people think this is a good idea, I could come up with a patch. best regards, Florian Pflug
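Florian's workaround can be sketched in SQL; the database and role names here are placeholders, not anything from the thread:

```sql
-- The problematic per-database setting: every new connection to mydb
-- silently starts out running as some_role.
ALTER DATABASE mydb SET ROLE some_role;

-- The proposed fix: pg_upgrade would issue this immediately after
-- connecting, restoring the authenticated user (and clearing any
-- role set via SET ROLE) before touching the catalogs.
RESET SESSION AUTHORIZATION;
```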
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
On Sun, Dec 12, 2010 at 4:49 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote: Huh? It allows you to postpone the check until commit. That's far from not enforcing it. This clearly implies that un-enforced constraints are not checked at commit. [ shrug... ] I can't argue with you about what may or may not be in an unpublished draft of an unratified version of the standard, since I don't have a copy. But allow me to harbor doubts that they really intend to allow someone to force a constraint to be considered valid without any verification. This proposal strikes me as something mysql would do, not the standards committee. (In particular, can a constraint go from not-enforced to enforced state without getting checked at that time?) Even if you're reading the draft correctly, and the wording makes it into a released standard, the implementation you propose would break our code. The incremental FK checks are designed on the assumption that the constraint condition held before; they aren't likely to behave very sanely if the data is bad. I'd want to see a whole lot more analysis of the resulting behavior before even considering an idea like this. Wow, you've managed to bash Simon, MySQL, and the SQL standards committee all in one email. I'm not going to argue that careful analysis isn't needed before doing something like this - and, in particular, if we ever get inner-join removal, which I'm still hoping to do at some point, a foreign key that isn't actually guaranteed to be valid might result in queries returning different answers depending on whether or not a join is removed. I guess we'd have to define that as the user's problem for alleging a foreign-key relationship that doesn't truly exist. On the other hand, there's clearly also a use case for this behavior. 
If a bulk load of prevalidated data forces an expensive revalidation of constraints that are already known to hold, there's a real chance the DBA will be backed into a corner where he simply has no choice but to not use foreign keys, even though he might really want to validate the foreign-key relationships on a going-forward basis. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company MySQL does in fact have this feature and it is used by mysqldump. This feature is very useful. -- Rob Wultsch wult...@gmail.com
Re: [HACKERS] proposal : cross-column stats
On 13.12.2010 01:05, Robert Haas wrote: This is a good idea, but I guess the question is what you do next. If you know that the applicability is 100%, you can disregard the restriction clause on the implied column. And if it has no implicatory power, then you just do what we do now. But what if it has some intermediate degree of implicability? Well, I think you've missed the e-mail from Florian Pflug - he actually pointed out that the 'implicativeness' Heikki mentioned is called conditional probability. And conditional probability can be used to express the AND probability we are looking for (selectivity). For two columns, this is actually pretty straightforward - as Florian wrote, the equation is P(A and B) = P(A|B) * P(B) = P(B|A) * P(A) where P(B) may be estimated from the current histogram, and P(A|B) may be estimated from the contingency table (see the previous mails). And P(A and B) is actually the value we're looking for. Anyway there really is no intermediate degree of applicability, it just gives you the right estimate. And AFAIR this is easily extensible to more than two columns, as P(A and B and C) = P(A and (B and C)) = P(A|(B and C)) * P(B and C) so it's basically a recursion. Well, I hope my statements are really correct - it's been a few years since I gained my degree in statistics ;-) regards Tomas
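The estimate Tomas describes can be sketched with a toy contingency table; the zip/city sample below is made up purely for illustration:

```python
# Sketch of P(A and B) = P(A|B) * P(B), with both factors read off a
# contingency table built from a sample.
from collections import Counter

sample = [("95101", "San Jose"), ("95101", "San Jose"),
          ("94105", "San Francisco"), ("94105", "San Francisco"),
          ("94110", "San Francisco"), ("60601", "Chicago")]

n = len(sample)
joint = Counter(sample)               # contingency table: (zip, city) -> count
city = Counter(c for _, c in sample)  # marginal distribution of city

def selectivity(zip_code, city_name):
    p_b = city[city_name] / n         # P(city = B), from the marginal
    if p_b == 0:
        return 0.0
    # P(zip = A | city = B), from the contingency table
    p_a_given_b = joint[(zip_code, city_name)] / city[city_name]
    return p_a_given_b * p_b          # = P(zip = A AND city = B)

selectivity("94105", "San Francisco")  # -> 2/6, the zip's own selectivity
```

Note that for the fully implicative case (a zip code appearing in only one city) the AND selectivity collapses to the selectivity of the zip code alone, which is exactly Heikki's postcode/city observation.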
Re: [HACKERS] proposal : cross-column stats
On Sun, Dec 12, 2010 at 8:46 PM, Tomas Vondra t...@fuzzy.cz wrote: On 13.12.2010 01:05, Robert Haas wrote: This is a good idea, but I guess the question is what you do next. If you know that the applicability is 100%, you can disregard the restriction clause on the implied column. And if it has no implicatory power, then you just do what we do now. But what if it has some intermediate degree of implicability? Well, I think you've missed the e-mail from Florian Pflug - he actually pointed out that the 'implicativeness' Heikki mentioned is called conditional probability. And conditional probability can be used to express the AND probability we are looking for (selectivity). For two columns, this is actually pretty straightforward - as Florian wrote, the equation is P(A and B) = P(A|B) * P(B) = P(B|A) * P(A) Well, the question is what data you are actually storing. It's appealing to store a measure of the extent to which a constraint on column X constrains column Y, because you'd only need to store O(ncolumns^2) values, which would be reasonably compact and would potentially handle the zip code problem - a classic hard case rather neatly. But that wouldn't be sufficient to use the above equation, because there A and B need to be things like column X has value x, and it's not going to be practical to store a complete set of MCVs for column X for each possible value that could appear in column Y. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal : cross-column stats
P(A|B) = P(A and B) / P(B). Well, until this point we've discussed failure cases involving 'AND' conditions. What about 'OR' conditions? I think the current optimizer computes the selectivity as 's1+s2 - s1*s2' (at least that's what I found in backend/optimizer/path/clausesel.c:630). Sometimes that may return nearly 2x the actual selectivity, but in general it's a reasonable estimate. Are there any severe failure cases that produce much worse estimates? regards Tomas
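Correlated columns are one concrete failure case for the independence-based OR formula; a quick sketch (with made-up two-column data) makes it visible in both directions:

```python
# The planner's independence assumption for OR (cf. clausesel.c):
def or_selectivity(s1, s2):
    return s1 + s2 - s1 * s2

# Perfectly correlated columns: WHERE a = 1 OR b = 1 matches half the
# table, but the formula says 0.75 -- a 1.5x overestimate.
corr = [(1, 1)] * 50 + [(0, 0)] * 50
s1 = sum(a == 1 for a, b in corr) / len(corr)                     # 0.5
s2 = sum(b == 1 for a, b in corr) / len(corr)                     # 0.5
actual_corr = sum(a == 1 or b == 1 for a, b in corr) / len(corr)  # 0.5
est = or_selectivity(s1, s2)                                      # 0.75

# Perfectly anti-correlated columns: every row matches the OR (actual
# selectivity 1.0), yet the estimate is still 0.75 -- an underestimate.
anti = [(1, 0)] * 50 + [(0, 1)] * 50
actual_anti = sum(a == 1 or b == 1 for a, b in anti) / len(anti)  # 1.0
```

The worst case for the overestimate direction is bounded (the formula never exceeds s1 + s2), which is presumably why the errors here are milder than the AND cases discussed earlier in the thread.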
Re: [HACKERS] proposal : cross-column stats
On 13.12.2010 03:00, Robert Haas wrote: Well, the question is what data you are actually storing. It's appealing to store a measure of the extent to which a constraint on column X constrains column Y, because you'd only need to store O(ncolumns^2) values, which would be reasonably compact and would potentially handle the zip code problem - a classic hard case rather neatly. But that wouldn't be sufficient to use the above equation, because there A and B need to be things like column X has value x, and it's not going to be practical to store a complete set of MCVs for column X for each possible value that could appear in column Y. O(ncolumns^2) values? You mean collecting such stats for each possible pair of columns? Well, I meant something different. The proposed solution is based on contingency tables, built for selected groups of columns (not for each possible group). And the contingency table gives you the ability to estimate the probabilities needed to compute the selectivity. Or am I missing something? regards Tomas
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
On 12/12/2010 08:27 PM, Rob Wultsch wrote: MySQL does in fact have this feature and it is used by mysqldump. This feature is very useful. The trouble is that FK's have more than one use. In particular, they have a documentary use that's used by tools that analyze databases, as well as by tools like htsql. They also have a role as an enforced constraint. In fact it's possible now to disable FK enforcement, by disabling the triggers. It's definitely a footgun though. Just the other day I was asked how data violating the constraint could have got into the table, and caused some surprise by demonstrating how easy this was to produce. So what would actually be an advance in my view would be a mechanism that allowed explicit disabling of a constraint but ensured that it was not violated when re-enabling it. cheers andrew
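The footgun Andrew describes can be reproduced along these lines (disabling the internal FK triggers requires superuser; the table and column names are placeholders):

```sql
ALTER TABLE child DISABLE TRIGGER ALL;      -- FK still in the catalog,
                                            -- but no longer checked
INSERT INTO child (parent_id) VALUES (42);  -- may now violate the FK
ALTER TABLE child ENABLE TRIGGER ALL;       -- re-enabling does NOT
                                            -- re-verify existing rows
```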
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
I wouldn't like to comment on whether or not Simon has correctly interpreted the words of the SQL standards committee, because standards committees sometimes word things in an intentionally ambiguous way to placate different interests, and because it seems fairly inconsequential in this case. IMHO this is a useful feature that should be pursued. There is another precedent that no one mentioned - DB2. From their docs: You can add a foreign key with the NOT ENFORCED option to create an informational referential constraint. This action does not leave the table space in CHECK-pending status, and you do not need to execute CHECK DATA. I understand that DB2's informational referential constraints won't ever be enforced (they just show intent, which is useful to their planner), so this isn't really the same thing. However, DB2 apparently doesn't initially enforce referential integrity when an FK is created on a table with existing data, without any special syntax on the CREATE: DB2 does not validate the data when you add the foreign key. Instead, if the table is populated, the table space that contains the table is placed in CHECK-pending status, just as if it had been loaded with ENFORCE NO. In this case, you need to execute the CHECK DATA utility to clear the CHECK-pending status. If I am not mistaken, this is almost exactly the behaviour described by Simon, because referential integrity is, presumably, enforced after the FK is created, but before the CHECK DATA utility is optionally run to ensure that we actually have referential integrity at a later time. I believe that Simon's proposal is essentially sound. I don't know why CHECK DATA operates at the tablespace granularity rather than the FK granularity - IANADB2U.
If we followed this behaviour, we wouldn't let people just arbitrarily claim that a referential condition exists - rather, we'd let them assert that it /ought/ to exist, and that it will be maintained going forward, and give them the option of verifying that assertion at a later time, after which it actually exists. Unfortunately, this refinement of Simon's proposal would probably entail adding an additional column to pg_constraint. -- Regards, Peter Geoghegan
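Under the refinement Peter outlines, the workflow might look something like this; the syntax is purely hypothetical, mirroring DB2's behaviour rather than anything PostgreSQL accepts:

```sql
-- Assert the relationship without scanning existing rows; new writes
-- would be checked from here on.
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customer (id) NOT ENFORCED;

-- Later, a CHECK DATA analogue verifies the assertion and flips the
-- constraint to the fully verified state.
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;
```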
Re: [HACKERS] proposal : cross-column stats
On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra t...@fuzzy.cz wrote: On 13.12.2010 03:00, Robert Haas wrote: Well, the question is what data you are actually storing. It's appealing to store a measure of the extent to which a constraint on column X constrains column Y, because you'd only need to store O(ncolumns^2) values, which would be reasonably compact and would potentially handle the zip code problem - a classic hard case rather neatly. But that wouldn't be sufficient to use the above equation, because there A and B need to be things like column X has value x, and it's not going to be practical to store a complete set of MCVs for column X for each possible value that could appear in column Y. O(ncolumns^2) values? You mean collecting such stats for each possible pair of columns? Well, I meant something different. The proposed solution is based on contingency tables, built for selected groups of columns (not for each possible group). And the contingency table gives you the ability to estimate the probabilities needed to compute the selectivity. Or am I missing something? Well, I'm not real familiar with contingency tables, but it seems like you could end up needing to store a huge amount of data to get any benefit out of it, in some cases. For example, in the United States, there are over 40,000 postal codes, and some even larger number of city names, and doesn't the number of entries go as O(m*n)? Now maybe this is useful enough anyway that we should Just Do It, but it'd be a lot cooler if we could find a way to give the planner a meaningful clue out of some more compact representation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] unlogged tables
On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/12/8 Kineticode Billing da...@kineticode.com: On Dec 8, 2010, at 10:37 AM, Chris Browne wrote: Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS. EVANESCENT. UNSAFE ? troll MyISAM /troll -- Rob Wultsch wult...@gmail.com
Re: [HACKERS] unlogged tables
On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch wult...@gmail.com wrote: On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/12/8 Kineticode Billing da...@kineticode.com: On Dec 8, 2010, at 10:37 AM, Chris Browne wrote: Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS. EVANESCENT. UNSAFE ? troll MyISAM /troll Heh. But that would be corrupt-on-crash, not truncate-on-crash, no? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] libpq changes for synchronous replication
On Sat, Dec 11, 2010 at 11:37 PM, Robert Haas robertmh...@gmail.com wrote: Committed with just a few changes to the documentation. Thanks a lot! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] BufFreelistLock
On Dec 10, 2010, at 10:49 AM, Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Jeff Janes's message of Fri Dec 10 12:24:34 -0300 2010: As far as I can tell, bgwriter never adds things to the freelist. That is only done at start up, and when a relation or a database is dropped. The clock sweep does the vast majority of the work. AFAIU bgwriter runs the clock sweep most of the time (BgBufferSync). I think bgwriter just tries to write out dirty buffers so they'll be clean when the clock sweep reaches them. It doesn't try to move them to the freelist. Yeah, it calls SyncOneBuffer which does nothing for the clock sweep. There might be some advantage in having it move buffers to a freelist that's just protected by a simple spinlock (or at least, a lock different from the one that protects the clock sweep). The idea would be that most of the time, backends just need to lock the freelist for long enough to take a buffer off it, and don't run clock sweep at all. Yeah, the clock sweep code is very intensive compared to pulling a buffer from the freelist, yet AFAICT nothing will run the clock sweep except backends. Unless I'm missing something, the free list is practically useless because buffers are only put there by InvalidateBuffer, which is only called by DropRelFileNodeBuffers and DropDatabaseBuffers. So we make backends queue up behind the freelist lock with very little odds of getting a buffer, then we make them queue up for the clock sweep lock and make them actually run the clock sweep. BTW, when we moved from 96G to 192G servers I tried increasing shared buffers from 8G to 28G and performance went down enough to be noticeable (we don't have any good benchmarks, so I can't really quantify the degradation). Going back to 8G brought performance back up, so it seems like it was the change in shared buffers that caused the issue (the larger servers also have 24 cores vs 16).
My immediate thought was that we needed more lock partitions, but I haven't had the chance to see if that helps. ISTM the issue could just as well be due to clock sweep suddenly taking over 3x longer than before. We're working on getting a performance test environment setup, so hopefully in a month or two we'd be able to actually run some testing on this. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
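For readers unfamiliar with the algorithm under discussion, here is a simplified sketch of clock-sweep victim selection; PostgreSQL's StrategyGetBuffer does roughly this, with the locking, pin races, and the all-pinned error path omitted:

```python
# Simplified clock-sweep buffer replacement ("second chance" policy).
class Buffer:
    def __init__(self, usage_count=0, pinned=False):
        self.usage_count = usage_count
        self.pinned = pinned

def clock_sweep(buffers, hand):
    """Return (victim_index, new_hand).  Assumes at least one
    unpinned buffer exists."""
    n = len(buffers)
    while True:
        i = hand
        hand = (hand + 1) % n
        buf = buffers[i]
        if buf.pinned:
            continue                # skip buffers in active use
        if buf.usage_count == 0:
            return i, hand          # found a victim
        buf.usage_count -= 1        # second chance: decrement, move on

# A freelist pop would be O(1); the sweep may touch (and write to) many
# buffer headers before finding a victim, which is the contention and
# cost difference being discussed above.
```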
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
Peter Geoghegan wrote: If we followed this behaviour, we wouldn't let people just arbitrarily claim that a referential condition exists - rather, we'd let them assert that it /ought/ to exist, and that it will be maintained going forward, and give them the option of verifying that assertion at a later time, after which it actually exists. What you outline would be quite valuable to our shop. Under the law, the custodians of the data are the elected clerks of circuit court, and under state law and rules of the state supreme court we can't clean up even the most glaring apparent data problems without the OK of the elected official or his or her designee. We have a very complex schema (although no more complex than necessary to model the reality of the data) with hundreds of foreign key relationships. For various reasons (conversions from old systems, etc.), these relationships don't hold on all tables in all county databases. It would be desirable to have foreign key definitions define the intended relationships anyway, and very useful for them to prevent further data degradation. For those situations where we get a business analyst to work with clerk of court staff to clean up orphaned rows, it would be very slick to be able to run some statement (like CHECK DATA) to see if the cleanup is complete and successful and to flag that the constraint is now enforced. So +1 on what Peter outlined as current DB2 features in this regard. -Kevin
Re: [HACKERS] PS display and standby query conflict
On Sat, Dec 11, 2010 at 11:28 PM, Simon Riggs si...@2ndquadrant.com wrote: This problem happens because ResolveRecoveryConflictWithVirtualXIDs resets the PS display for each read-only transaction that recovery waits for. Why do we need to reset that each time even though the conflict has not been resolved yet? The attached patch suppresses such a needless reset. Comments? The reset occurs at most each 500ms, so not much problem there. But if it annoys you, it seems OK to change it. Don't see a reason to backpatch though? I think it's worth backpatching, to prevent users who carefully observe query conflicts while testing 9.0 from getting confused. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] proposal : cross-column stats
The proposed solution is based on contingency tables, built for selected groups of columns (not for each possible group). And the contingency table gives you the ability to estimate the probabilities needed to compute the selectivity. Or am I missing something? Well, I'm not real familiar with contingency tables, but it seems like you could end up needing to store a huge amount of data to get any benefit out of it, in some cases. For example, in the United States, there are over 40,000 postal codes, and some even larger number of city names, and doesn't the number of entries go as O(m*n)? Not to mention that the model parameters will be impossible to estimate well. ( I've only scanned the thread, so sorry if I'm repeating something that's already been said ) My intuition is that storing the covariance structure will be unworkable both technically and statistically for all but the simplest of cases. That being said, I don't think the problem is a lack of ways to parameterize the covariance structure ( there are several papers on multi-dim histogram estimators, at least one of which talks about databases explicitly, not to mention approaches like CART[1] ), but a complete lack of infrastructure to do anything with the estimates. So keep working on it ;-) ( but try to make the framework general enough to allow better estimators ). I wonder if a good first step might be to focus on the AND and OR operators since they seem like the most common cases, and union and intersection commute ( although it's important to remember that the estimate variances do NOT ). That is, what about estimating the selectivity of the condition WHERE X=A and Y=B by f(A,B) = x_1*(selectivity(X = A) + selectivity( Y = B )) + x_2*selectivity(X = A)*selectivity( Y = B ) + x_3 where x_{1,2,3} are parameters to be estimated from the data. Another quick note: I think that storing the full contingency table is wasteful since the marginals are already stored in the single column statistics.
Look at copulas [2] ( FWIW I think that Josh Tolley was looking at this a couple years back ). Best, Nathan [1] http://en.wikipedia.org/wiki/Classification_and_regression_tree [2] http://en.wikipedia.org/wiki/Copula_%28statistics%29
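Nathan's parameterized estimator can be prototyped in a few lines; the fitting code and data below are illustrative only (stdlib-only ordinary least squares via the normal equations):

```python
# Fit the parameters x1..x3 of
#   f(sA, sB) = x1*(sA + sB) + x2*sA*sB + x3
# to observed AND-selectivities.

def solve3(m, v):
    # Gaussian elimination with partial pivoting for a 3x3 system.
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x

def fit(observations):
    # observations: list of (sel_A, sel_B, observed AND selectivity)
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for sa, sb, y in observations:
        phi = (sa + sb, sa * sb, 1.0)   # the three basis terms
        for i in range(3):
            atb[i] += phi[i] * y
            for j in range(3):
                ata[i][j] += phi[i] * phi[j]
    return solve3(ata, atb)

# If the columns happen to be independent, the fit should recover
# x = (0, 1, 0), i.e. f = sA * sB, the planner's current assumption:
obs = [(sa / 10, sb / 10, (sa / 10) * (sb / 10))
       for sa in range(1, 6) for sb in range(1, 6)]
x1, x2, x3 = fit(obs)
```

Correlated training data would instead push weight onto the x_1 and x_3 terms, which is the point of fitting rather than assuming the coefficients.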
Re: [HACKERS] Instrument checkpoint sync calls
On Sun, Dec 5, 2010 at 4:23 PM, Greg Smith g...@2ndquadrant.com wrote: Jeff Janes wrote: I've attached a tiny patch to apply over yours, to deal with this and with the case where no files are synced. Thanks for that. That obvious error eluded me because in most of the patch update testing I was doing (on ext3), the longest sync was always about the same length as the total sync time. Attached patch (in correct diff form this time!) collects up all changes. That includes elimination of a potential race condition if someone changes log_checkpoints while a long sync phase is executing. I don't know whether that can happen, and it obviously won't give accurate stats going back to the beginning of the checkpoint in that case, but it tries to defend against producing complete garbage if that value changes out from under it. This is the first version of this patch I feel fairly good about; no open concerns left on my side. Jeff, since you're now the de-facto credited reviewer of this one by virtue of suggesting bug fixes, could you take this update out for a spin too? I took a look at this and it looks generally good, but I'm wondering why md.c is converting the results from an exact value to a floating point, only to have xlog.c turn around and convert back to an integer. I think it could just return milliseconds directly, or if you're worried about a checkpoint that takes more than 24 days to complete, seconds and microseconds. Or am I missing something? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] rest of works for security providers in v9.1
I'd like to hear opinions on which facilities should be developed in the current v9.1 development cycle. We have already integrated some facilities to support a starter version of SE-PostgreSQL. It allows hooking controls onto DML permission checks and assigning security labels to clients and database objects, either by hand or automatically. On the other hand, hooks on DDL permission checks are still future work. I believe object_access_hook can be applied to various kinds of DDL permission checks, but we cannot put all of these hooks in place at once because of the patch's scale. So, I plan to integrate only the following four facilities in the last commit-fest of v9.1, although that still does not cover comprehensive DDL access. * Expand object_access_hook to deliver arguments Some DDL hooks will need to deliver several arguments in addition to the OID of the object being modified; for example, a flag showing whether a deletion is cascaded or not. So the prototype of object_access_hook needs to be revised. My idea is to add two arguments: an integer for the number of arguments and an array for the additional information. Macros will then wrap invocation of this hook to keep the code simple. * Permission checks on object creation It was not settled in the previous discussion whether two hooks are needed here, or one. I think dividing the creation hooks into two phases by role will eventually reduce the burden of code management. We currently have OAT_POST_CREATE hooks just after registration of dependencies, basically. That is a simple enough basis, and quite a natural place to assign new security labels. However, several cases would have to be exceptions to that basis if we also try to check permissions in the post-creation hooks, in addition to default labeling. For example, heap_create_with_catalog() is called from five places, but only two need permission checks: DefineRelation() and OpenIntoRel().
A few cases are not obvious about whether we need permission checks in a given invocation, such as the code path from make_new_heap(). It defines a new pg_class entry, but an external module cannot determine from the catalog whether it was invoked on a code path that needs permission checks or not. So, I want OAT_CREATE hooks placed just after the existing permission checks, for the purpose of access control rather than default labeling. * Permission checks on object deletion The existing code puts permission checks for object deletion in command handlers like RemoveRelations(), which then invoke functions in dependency.c to drop the specified and dependent objects (if necessary). I think it is straightforward to put object-deletion hooks next to the existing permission checks. But cascaded objects to be removed cannot be checked there. So it seems to me that findDependentObjects() should be exposed to external modules to report which objects will be dropped. Unlike the old SE-PostgreSQL implementation, I don't consider it a good idea to put the hook within dependency.c, because we would need to tell dependency.c whether a deletion was requested by hand or internally (such as cleanup of temporary objects). * Permission checks on miscellaneous easy cases Apart from development priority, it seems we can hook controls into the following commands quite easily, with little impact on the existing code: - GRANT/REVOKE - COMMENT ON - SECURITY LABEL Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com
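The proposed hook shape — an access type plus an argument count and array, with a macro-like wrapper — can be modelled as follows. This is an illustrative Python sketch; the real interface is C, and every name here is hypothetical, not taken from the actual patch.

```python
# Hypothetical model of the proposed object_access_hook extension:
# the hook receives an access type, the object's identity, plus a count
# and array of extra arguments -- e.g. a "cascaded" flag for deletions.

OAT_POST_CREATE = "post-create"
OAT_DROP = "drop"

object_access_hook = None  # a loadable module may install a callback here

def invoke_object_access_hook(access, class_id, object_id, *extra):
    """Macro-like wrapper: call the hook only if one is installed."""
    if object_access_hook is not None:
        object_access_hook(access, class_id, object_id, len(extra), tuple(extra))

# Example module-side hook that records what it was asked to check.
audit_log = []

def security_hook(access, class_id, object_id, nargs, args):
    cascaded = args[0] if nargs >= 1 else False
    audit_log.append((access, object_id, cascaded))

object_access_hook = security_hook
invoke_object_access_hook(OAT_DROP, 1259, 41234, True)  # a cascaded drop
```

The count-plus-array shape keeps the hook's prototype stable as individual call sites grow new pieces of context to pass along.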
Re: [HACKERS] SQL/MED - core functionality
On Sun, 12 Dec 2010 23:47:53 +0200 Peter Eisentraut pete...@gmx.net wrote: On ons, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote: From a user's view, this is a very long way to see even the simplest foreign table. I know it is based on the standard, but I really want a shortcut. Especially, I don't understand why CREATE USER MAPPING FOR current_user SERVER server is needed for the default use case. If you forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an error. User mapping is useful if the local user and remote user should be mapped, but I imagine in most cases they are the same. postgresql_fdw can tell the remote user by conninfo string, in addition. I reviewed the standard about this, and a lot of things are implementation-defined. I think user mappings could be made optional. Simple FDWs such as File FDW might not have a concept of a user on the remote side. In such a case, it would be enough to control access privileges per local user with GRANT/REVOKE SELECT statements. This is another topic, but it would be useful if CREATE FOREIGN TABLE could omit column definitions, since the fdw usually knows what should be there in the definitions. I sometimes mistyped column names between remote and local definitions, which resulted in failures at execution time. Also, according to the standard, the column list in CREATE FOREIGN TABLE is optional (if you can get it in some automatic way, of course). To allow omitting column definitions for that purpose, a way to create zero-column tables would have to be provided, and new syntax which allows FDWs to determine the column definition would be necessary. ex) -- Create foo from the remote table foo on the server bar CREATE FOREIGN TABLE foo SERVER bar; -- Create zero-column table foo CREATE FOREIGN TABLE foo () SERVER bar; To support this feature, another hook function needs to be added to the FDW API. ISTM that this feature should be considered together with the IMPORT SCHEMA statement.
Regards, -- Shigeru Hanada
[HACKERS] pg_is_in_recovery=1
Hi All, When does pg_is_in_recovery() change to zero (a status change)? At the time recovery stops? If a switchover has to be done, then after receiving the signal and telling postgres to run the startup process (got_SIGHUP = true), do we have to stop replication, or wait for recovery to stop? Can I change pg_is_in_recovery to 0 myself? -- Thanks Regards, Aaliya Zarrin (+91)-9160665888
Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags
On Sun, Dec 12, 2010 at 8:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: What are the values of _S_IREAD and _S_IWRITE, anyway? I'm still wondering how come the previous coding with hardwired constants behaved correctly. Still curious about this. FWIW, _S_IREAD and _S_IWRITE are defined by Visual Studio C++ 2008 in sys/stat.h as 0x0100 and 0x0080 respectively. Glen
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
On Sun, Dec 12, 2010 at 7:24 PM, Andrew Dunstan and...@dunslane.net wrote: In fact it's possible now to disable FK enforcement, by disabling the triggers. It's definitely a footgun though. Just the other day I was asked how data violating the constraint could have got into the table, and caused some surprise by demonstrating how easy this was to produce. Ugh. I have read the entire pg manual and I did not recall that footgun. At least in MySQL disabling fk's is explicit. There is something to be said for being able to tell the database: Hey, hold my beer and watch this, it might be stupid but it is what we are going to do. The database telling its user that is a much larger issue (and yes, MySQL is generally worse). The user at least gets to talk to the db through SQL, while the database only really gets to talk to the user through errors and the manual. The fact that fk checks are implemented by the trigger system somehow seems surprising. -- Rob Wultsch wult...@gmail.com
Re: [HACKERS] unlogged tables
On Sun, Dec 12, 2010 at 7:33 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch wult...@gmail.com wrote: On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/12/8 Kineticode Billing da...@kineticode.com: On Dec 8, 2010, at 10:37 AM, Chris Browne wrote: Other possibilities include TRANSIENT, EPHEMERAL, TENUOUS. EVANESCENT. UNSAFE ? troll MyISAM /troll Heh. But that would be corrupt-on-crash, not truncate-on-crash, no? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company troll Yep. The MySQL storage engines that truncate on shutdown are MEMORY and PBXT (using its memory-resident option). /troll I like TRANSIENT but wonder if MEMORY might be more easily understood by users. -- Rob Wultsch wult...@gmail.com
Re: [HACKERS] SQL/MED - file_fdw
On Sat, Dec 11, 2010 at 05:30, Andrew Dunstan and...@dunslane.net wrote: On 12/04/2010 11:11 PM, Itagaki Takahiro wrote: One exports the copy functions from the core, and another implements file_fdw using the infrastructure. Who is actually going to do this split? I'm working on it :-) I extracted these functions from copy.c: - CopyState BeginCopyFrom(Relation rel, const char *filename, List *attnamelist, List *options); - void EndCopyFrom(CopyState cstate); - bool NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *oid); There was a Reset() in file_fdw, but it is not contained in this patch. It will be added again if required, but I wonder whether we might need not only reset but also mark/restore of a position in the file. -- Itagaki Takahiro Attachment: copy_export-20101213.diff
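The three exported functions form a begin/next/end iteration protocol. A rough Python analogue of how a file_fdw-style scan would drive them — illustrative only, the real API is the C one listed above, and csv stands in for the COPY parser:

```python
# Python analogue of the proposed copy.c API shape (the real code is C):
# BeginCopyFrom opens a scan state, NextCopyFrom returns one row per call,
# EndCopyFrom releases the state.

import csv
import io

def begin_copy_from(fileobj, options=None):
    return csv.reader(fileobj, **(options or {}))

def next_copy_from(cstate):
    """Return the next row, or None at end of input."""
    return next(cstate, None)

def end_copy_from(cstate):
    pass  # csv.reader holds no resources; the C version frees CopyState

# file_fdw-style scan loop: begin once, pull rows until exhausted, end.
def scan(fileobj):
    cstate = begin_copy_from(fileobj)
    rows = []
    while (row := next_copy_from(cstate)) is not None:
        rows.append(row)
    end_copy_from(cstate)
    return rows
```

The missing mark/restore question maps onto this shape naturally: it would add a pair of calls that save and reposition the scan state between Next calls.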
[HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?
Hi, pg_archivecleanup removes unnecessary WAL files from the archive, but not from the pg_xlog directory. So, after failover, those WAL files might still exist in pg_xlog and be archived again later. Re-archiving unnecessary WAL files seems odd to me. To avoid this problem, how about changing pg_archivecleanup so that it also removes WAL files from pg_xlog, or creates a .done file in archive_status when removing them from the archive? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?
On Mon, Dec 13, 2010 at 3:44 PM, Fujii Masao masao.fu...@gmail.com wrote: pg_archivecleanup removes unnecessary WAL files from the archive, but not from pg_xlog directory. So, after failover, those WAL files might exist in pg_xlog and be archived again later. Re-archiving of unnecessary WAL files seems odd to me. To avoid this problem, how about changing pg_archivecleanup so that it removes WAL files also in pg_xlog or creates .done file in archive_status when removing them from the archive? Well, we can avoid this problem by specifying the pg_xlog directory instead of the archive in recovery_end_command: recovery_end_command = 'pg_archivecleanup pg_xlog %r' Though this seems like a bit of a kludge.. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] proposal: auxiliary functions for record type
2010/12/12 Florian Pflug f...@phlo.org: On Dec12, 2010, at 00:19 , Pavel Stehule wrote: I prefer a table-based solution, because then I don't need an unnest, but other preferences are valid too. That's fine with me. I am dissatisfied with your design of passing the explicit target type via an unused value. I think we don't have the infrastructure for it now - in my view it is better to use a common type, which is text for now. That's nothing new - plpgsql uses it too. Sorry, I can't follow you here. Where does plpgsql use text as a common type? I see one good design for an explicit target type, based on polymorphic types, that respects PostgreSQL fmgr practice: we have to allow polymorphic functions without polymorphic parameters. These functions should be designed to return the value in unknown-type format when the function has no outer type information. I don't think unknown is the right type for that. As far as I know, unknown is still a textual type, used to have some type to assign to string literals during parsing when no better type can be inferred. This information can be passed in the function context. When the function context isn't null, the function has to read the target type and should return the value in that type. Who can fill the function context? That is a task for the executor. And when a CAST contains just a function call, we can recheck whether the function is polymorphic, and if it is, we can set the function context to the target type; then we don't need to call a conversion function, because the polymorphic function must return data in the correct format. The main difficulty is that currently types are assigned in a bottom-up fashion, as far as I know. To make functions with a polymorphic return value but without polymorphic arguments work, you need to assign the return type in a top-down fashion (it depends on where the value *goes*, not where it *comes from*). That seems like a rather huge change and has the potential to complicate quite a few other parts, most notably function lookup/resolution.
Plus, the general case where type information must bubble up more than one level seems pretty much intractable, as it'd require a full-blown type inference algorithm like ML or Haskell. Not a place where we want to go, I believe. The restricted case, on the other hand, brings very little benefit compared to the dummy-parameter approach. Yeah, polymorphic_function()::type may look a bit cleaner than polymorphic_function(NULL::type), but that's about it. It's only assignments in pl/pgsql which really benefit, since you'd be able to leave out the type completely, writing simply v_value := polymorphic_function(). Does that really warrant the effort that'd be involved? There is a second possibility - a rather simple one. We can use a specialised statement with its own parser/executor node. Then the implementation would be really simple. Syntax: EXTRACT_VALUE(expr1 FROM expr2 AS typename) ... RETURNS typename; expr1 ... an expression convertible to text .. the field name; expr2 ... an expression of a composite type. Disadvantage - EXTRACT_VALUE must be a keyword. Advantage - simple implementation, available in all environments, readable: var := EXTRACT_VALUE('f1' FROM myrec AS int); Note: the name of this statement isn't important now; it could be EXTRACT_FIELD, ... Comments, ideas? Regards Pavel Stehule
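The semantics Pavel proposes can be modelled roughly as follows — an illustrative Python sketch of what EXTRACT_VALUE(field FROM record AS type) would compute, not an implementation (the record and type handling here are stand-ins):

```python
# Model of the proposed EXTRACT_VALUE(expr1 FROM expr2 AS typename):
# expr1 names a field, expr2 is a composite value, and the result is
# the field's value cast to the target type. Purely illustrative.

def extract_value(field, record, target_type):
    """var := EXTRACT_VALUE('f1' FROM myrec AS int) -- modelled in Python."""
    if field not in record:
        raise KeyError(f"record has no field {field!r}")
    return target_type(record[field])

myrec = {"f1": "42", "f2": "hello"}
v = extract_value("f1", myrec, int)
```

The key property being debated is visible here: the target type is supplied explicitly at the call site, so no top-down type inference is needed.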
Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?
On 13.12.2010 08:44, Fujii Masao wrote: pg_archivecleanup removes unnecessary WAL files from the archive, but not from pg_xlog directory. So, after failover, those WAL files might exist in pg_xlog and be archived again later. A file that has already been archived successfully should not be archived again. The server keeps track of which files it has already archived with the .ready/.done files. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] proposal : cross-column stats
On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra t...@fuzzy.cz wrote: On 13.12.2010 03:00, Robert Haas wrote: Well, the question is what data you are actually storing. It's appealing to store a measure of the extent to which a constraint on column X constrains column Y, because you'd only need to store O(ncolumns^2) values, which would be reasonably compact and would potentially handle the zip code problem - a classic hard case rather neatly. But that wouldn't be sufficient to use the above equation, because there A and B need to be things like column X has value x, and it's not going to be practical to store a complete set of MCVs for column X for each possible value that could appear in column Y. O(ncolumns^2) values? You mean collecting such stats for each possible pair of columns? Well, I meant something different. The proposed solution is based on contingency tables, built for selected groups of columns (not for each possible group). And the contingency table gives you the ability to estimate the probabilities needed to compute the selectivity. Or am I missing something? Well, I'm not real familiar with contingency tables, but it seems like you could end up needing to store a huge amount of data to get any benefit out of it, in some cases. For example, in the United States, there are over 40,000 postal codes, and some even larger number of city names, and doesn't the number of entries go as O(m*n)? Now maybe this is useful enough anyway that we should Just Do It, but it'd be a lot cooler if we could find a way to give the planner a meaningful clue out of some more compact representation. Yes, storing a complete contingency table is not feasible in such cases. My original proposal actually did not address this particular issue (cities and ZIP codes) as it was based on simplified contingency tables (with bins corresponding to current histograms, not to individual values). So the number of values to store would grow much slower.
On the other hand, this generalization really makes it unusable in some cases, and the issue we're discussing here (cities and ZIP codes) is one of them. I think in such cases we could build a contingency table for MCV and then use it to estimate those conditional probabilities we need, but I expect it to be very tricky. Thanks for the comments. Tomas
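The trade-off under discussion can be sketched numerically. This toy Python contingency table over made-up city/ZIP data compares the joint estimate with the independence assumption the planner effectively uses today:

```python
# Sketch: joint selectivity from a contingency table versus the
# independence assumption. All data here is invented for illustration.

from collections import Counter

rows = [("Springfield", "62701")] * 90 + [("Springfield", "97477")] * 10 \
     + [("Eugene", "97401")] * 100

table = Counter(rows)  # contingency table: (city, zip) -> row count
n = len(rows)

def joint_selectivity(city, zipc):
    """Estimate using the full contingency table."""
    return table[(city, zipc)] / n

def independent_estimate(city, zipc):
    """Estimate assuming city and zip are independent columns."""
    p_city = sum(c for (ct, _), c in table.items() if ct == city) / n
    p_zip = sum(c for (_, z), c in table.items() if z == zipc) / n
    return p_city * p_zip
```

For the correlated pair the independence estimate is badly off (0.225 versus a true 0.45 here), which is the motivation for the stats; the cost is that the table needs one entry per observed (city, zip) pair, which is the O(m*n) storage concern Robert raises.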
Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED
On Sun, 2010-12-12 at 19:07 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: ... On the other hand, there's clearly also a use case for this behavior. If a bulk load of prevalidated data forces an expensive revalidation of constraints that are already known to hold, there's a real chance the DBA will be backed into a corner where he simply has no choice but to not use foreign keys, even though he might really want to validate the foreign-key relationships on a going-forward basis. There may well be a case to be made for doing this on grounds of practical usefulness. I'm just voicing extreme skepticism that it can be supported by reference to the standard. Personally I'd prefer to see us look into whether we couldn't arrange for low-impact establishment of a verified FK relationship, analogous to CREATE INDEX CONCURRENTLY. We don't let people just arbitrarily claim that a uniqueness condition exists, and ISTM that if we can handle that case we probably ought to be able to handle FK checking similarly. I think we should do *both* things. Sometimes you already know the check will pass, sometimes you don't. In particular, reloading data from another source where you knew the checks passed. Enforcing re-checking in that case reduces data availability. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] hstores in pl/python
It would be cool to be able to transparently use hstores as Python dictionaries and vice versa. It would be easy enough with hstore as a core type, but with hstore as an addon it's not that easy. There was talk about including hstore in core; is there still a chance for that to happen in 9.1? I'd like to include hstore&lt;-&gt;dict handling, but with hstore out of core the only half-sane way I see is: * hack PL/Python's makefile to add -Icontrib/hstore (yuck!) * create an extension module for Python that knows how to handle hstores, living next to plpython.so * install it in $libdir on make install * when PL/Python receives or is asked to create an hstore, load the extension module and use it to parse the value (ugly, probably slow) * the module would also have to make sure hstore.so is loaded in the database, which in itself is not pretty, as it would refer to hstore_in/out symbols I wrote a module that can be used with current PL/Python to simplify hstore handling (https://github.com/wulczer/pyhstore), but it suffers from most of the aforementioned problems, and on top of that you get hstore-&gt;text-&gt;dict instead of just hstore-&gt;dict, which sucks. Cheers, Jan
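The hstore-text round trip that a pyhstore-style module performs can be sketched as follows — a deliberately simplified Python parser for hstore's text representation (illustrative: it ignores escaped quotes and NULL values, which a real converter must handle):

```python
# Minimal sketch of hstore-text <-> dict conversion. Simplified: the real
# hstore text format also allows escaped quotes, unquoted keys, and NULLs.

import re

PAIR_RE = re.compile(r'"([^"]*)"\s*=>\s*"([^"]*)"')

def hstore_to_dict(text):
    """Parse '"k"=>"v", ...' into a Python dict."""
    return dict(PAIR_RE.findall(text))

def dict_to_hstore(d):
    """Serialize a dict back into hstore text form."""
    return ", ".join(f'"{k}"=>"{v}"' for k, v in d.items())
```

This text-level round trip is exactly the overhead Jan complains about: with hstore in core, PL/Python could convert directly between the binary hstore representation and a dict, skipping the parse/serialize step.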
Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?
On Mon, Dec 13, 2010 at 4:28 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13.12.2010 08:44, Fujii Masao wrote: pg_archivecleanup removes unnecessary WAL files from the archive, but not from pg_xlog directory. So, after failover, those WAL files might exist in pg_xlog and be archived again later. A file that has already been archived successfully should not be archived again. The server keeps track of which files it has already archived with the .ready/.done files. This seems to require: * the archiver to save the last archived WAL file name in shmem * walsender to send it to walreceiver * walreceiver to create the .done file when it arrives * bgwriter not to remove WAL files which don't have a .done file in the standby Right? One good side effect of this is that we can prevent WAL files from being removed from the standby before the master archives them. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center