Re: [HACKERS] proposal: auxiliary functions for record type

2010-12-12 Thread Florian Pflug
On Dec12, 2010, at 00:19 , Pavel Stehule wrote:
 I prefer a table-based
 solution, because then I don't need an unnest, but other preferences
 are valid too.
That's fine with me.

 I'm dissatisfied with your design of specifying the explicit target
 type via an unused value. I think we don't have the infrastructure for
 it now - from my view it's better to use a common type, which is text
 now. It's nothing new - plpgsql uses it too.
Sorry, I can't follow you here. Where does plpgsql use text as a common type?

 I see one well design of explicit target type based on polymorphic
 types that respect a PostgreSQL fmgr practice:
 
 We have to allow polymorphic functions without polymorphic
 parameters. These functions should be designed to return a value in the
 unknown type format when the function has no outer type information.
I don't think unknown is the right type for that. As far as I know,
unknown is still a textual type, used to have some type to assign to string
literals during parsing when no better type can be inferred.

 This information can be passed in the function context. When the
 function context isn't null, the function has to read the target type
 and return a value of that type. Who can fill the function context? It
 is a task for the executor. And when a CAST contains just a function
 call, we can check whether the function is polymorphic, and if it is,
 we can set the function context to the target type; then we don't need
 to call a conversion function, because the polymorphic function must
 return data in the correct format.
The main difficulty is that currently types are assigned in a bottom-up fashion
as far as I know. To make functions with a polymorphic return value, but
without polymorphic arguments, work, you need to assign the return type in a
top-down fashion (it depends on where the value *goes*, not where it *comes
from*). That seems like a rather huge change and has the potential to
complicate quite a few other parts, most notably function lookup/resolution.

Plus, the general case where type information must bubble up more than one 
level seems pretty much intractable, as it'd require a full-blown type 
inference algorithm like ML or Haskell. Not a place where we want to go, I 
believe.

The restricted case, on the other hand, brings very little benefit compared to
the dummy-parameter approach. Yeah, polymorphic_function()::type may look a
bit cleaner than polymorphic_function(NULL::type), but that's about it. It's
only assignments in pl/pgsql which really benefit, since you'd be able to leave
out the type completely, writing simply v_value := polymorphic_function().
Does that really warrant the effort that'd be involved?

 Without the described functionality we can design a non-polymorphic
 function that returns the unknown type. When similar functionality
 is implemented later, this function can be changed to a polymorphic
 one, but from the user's perspective there isn't a change.
I don't really understand why you resist the idea of a dummy parameter so much.
It might not be pretty, but is it bad enough to justify putting in all this
work? Plus, the whole record-manipulation stuff isn't going to win a beauty
contest anytime soon. But it's better than nothing, so as long as it's
reasonably efficient I think one can live with a few warts on the API.

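For what it's worth, a sketch of what the dummy-parameter approach could look
like (function and module names are invented for illustration):

-- Hypothetical declaration; the dummy argument exists only so the parser
-- can resolve the polymorphic result type:
CREATE FUNCTION record_get(rec record, fname text, dummy anyelement)
RETURNS anyelement AS 'record_utils', 'record_get' LANGUAGE C;

-- The caller selects the result type with a typed NULL:
SELECT record_get(t, 'id', NULL::int) FROM mytable t;
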
best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Florian Pflug
Hi

I've just run into a problem while upgrading from 8.4 to 9.0.

pg_upgrade aborted during the step "Adding support functions to new cluster"
with an "ERROR:  permission denied for language c" error. Unfortunately, the log
didn't include the name of the database where the error occurred, so it took me
a while to figure out that the culprit was an "ALTER DATABASE SET ROLE =
non-superuser" I had done on one of my databases, which effectively prevented
pg_upgrade from connecting with superuser privileges.

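For anyone else hitting this, the offending setting and its antidote look
roughly like this (database and role names are mine):

ALTER DATABASE devdb SET ROLE = app_owner;   -- what I had done
ALTER DATABASE devdb RESET ROLE;             -- clears it before upgrading
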
While one could argue that this behaviour is perfectly consistent, I believe 
most users will expect pg_upgrade (and to a lesser extent also pg_dump and 
pg_restore) to be unaffected by such settings.

Should we provide a way (for super-users only, of course) to skip 
per-database/per-role settings when connecting?

best regards
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records

2010-12-12 Thread Simon Riggs
On Sat, 2010-12-11 at 22:03 +0100, Heikki Linnakangas wrote:
 (Moving to pgsql-hackers)
 
 On 10.12.2010 20:21, Tom Lane wrote:
Simon Riggs <si...@2ndquadrant.com> writes:
  Reduce spurious Hot Standby conflicts from never-visible records.
  Hot Standby conflicts only with tuples that were visible at
  some point. So ignore tuples from aborted transactions or for
  tuples updated/deleted during the inserting transaction when
  generating the conflict transaction ids.
 
  Following detailed analysis and test case by Noah Misch.
  Original report covered btree delete records, correctly observed
  by Heikki Linnakangas that this applies to other cases also.
  Fix covers all sources of cleanup records via common code.
  Includes additional fix compared to commit on HEAD
 
  ISTM HeapTupleHeaderAdvanceLatestRemovedXid is still pretty broken,
  in that it's examining xmax without having checked that xmax is (a)
  valid or (b) a lock rather than a deletion xmax.
 
 In current use, it's only called for tuples that are known to be dead, 
 so either xmax is a valid deletion, or xmin didn't commit in which case 
 the function doesn't use xmax for anything. So I think it actually works 
 as it is.

Well, I think you're both right.

The function shouldn't be called in places where xmax is the wrong
flavour, but there should be specific safeguards in case of mistakes.

 I agree it doesn't look right, though. At the very least it needs 
 comments explaining that, but preferably it should do something sane 
 when faced with a tuple that's not dead after all. Perhaps throw an 
 error (though that would be bad during recovery), or an Assert, or just 
 refrain from advancing latestRemovedXid (or advance it, that would be 
 the conservative stance given the current use).

Yes

 Also, I'm not totally convinced it's correct when xmin > xmax, despite 
 Simon's follow-up commit to fix that. Shouldn't it advance 
 latestRemovedXid to xmin in that case? Or maybe it's ok as it is because 
 we know that xmax committed after xmin. The impression I get from the 
 comment above the function now is that it advances latestRemovedXid to 
 the highest XID present in the tuple, but that's not what it does in the 
 xmin > xmax case. That comment needs clarification.

Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
what I have now is better, but your point is there may be an even better
truth. I'll think on that a little more.

 While we're at it, perhaps it would be better to move this function to 
 tqual.c. And I feel that a more natural interface would be something like:
 
 TransactionId
 HeapTupleHeaderGetLatestRemovedXid(HeapTupleHeader tuple);
 
 IOW, instead of bumping up the passed-in latestRemovedXid value, return the 
 highest XID on the tuple (if it was dead).
 
 PS. it would be good to set hint bits in that function like in 
 HeapTupleSatisfies* functions.

I'm not that happy with refactoring inside a release, plus I'm not even
sure if that is the right way.

I suspect the best way would be to do this as a side-effect of
HeapTupleSatisfiesVacuum(), since this processing should only ever be done in
conjunction with that function.

Will respond later today on those thoughts.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wildcard search support for pg_trgm

2010-12-12 Thread Alexander Korotkov
I found another problem. The GIN index suffers from "GIN indexes do not support
whole-index scans" when no trigram can be extracted from the pattern.


With best regards,
Alexander Korotkov.


Re: [HACKERS] proposal: auxiliary functions for record type

2010-12-12 Thread Pavel Stehule
2010/12/12 Florian Pflug <f...@phlo.org>:
 On Dec12, 2010, at 00:19 , Pavel Stehule wrote:
 I prefer a table-based
 solution, because then I don't need an unnest, but other preferences
 are valid too.
 That's fine with me.

 I'm dissatisfied with your design of specifying the explicit target type
 via an unused value. I think we don't have the infrastructure for it now
 - from my view it's better to use a common type, which is text now. It's
 nothing new - plpgsql uses it too.
 Sorry, I can't follow you here. Where does plpgsql use text as a common type?

plpgsql uses only IO casts. So inside an assignment statement the
target type and the real type are checked. But this checking happens
late! I did a patch for early conversion to the target type (in the
plan), but this patch was rejected. So currently there is no information
about the target type available in the expression - and probably never
will be, for compatibility reasons. For example: when the target
variable is int but you used a numeric constant, the assignment does an
IO cast from numeric to int.

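To illustrate (a minimal sketch, assuming the 8.4/9.0 behaviour):

DO $$
DECLARE v int;
BEGIN
  v := 3.5;  -- the expression is numeric; plpgsql converts it by calling
             -- numeric's output function and then int's input function,
             -- so this fails with "invalid input syntax for integer"
END $$;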

 I see one well design of explicit target type based on polymorphic
 types that respect a PostgreSQL fmgr practice:

 We have to allow polymorphic functions without polymorphic
 parameters. These functions should be designed to return a value in the
 unknown type format when the function has no outer type information.
 I don't think unknown is the right type for that. As far as I know,
 unknown is still a textual type, used to have some type to assign to string
 literals during parsing when no better type can be inferred.

 This information can be passed in the function context. When the
 function context isn't null, the function has to read the target type
 and return a value of that type. Who can fill the function context? It
 is a task for the executor. And when a CAST contains just a function
 call, we can check whether the function is polymorphic, and if it is,
 we can set the function context to the target type; then we don't need
 to call a conversion function, because the polymorphic function must
 return data in the correct format.
 The main difficulty is that currently types are assigned in a bottom-up 
 fashion as far as I know. To make functions with a polymorphic return value, 
 but without polymorphic arguments, work, you need to assign the return type in 
 a top-down fashion (it depends on where the value *goes*, not where it *comes 
 from*). That seems like a rather huge change and has the potential to 
 complicate quite a few other parts, most notably function lookup/resolution.

I don't think so:
a) the place where we don't know a target type is limited only to the
first outer cast
b) I didn't define a polymorphic function without polymorphic
parameters (PFWPP) as absolutely undescribed - it returns unknown
or text by default. There is no problem searching for this function -
and no problem for later work if this function returns text,
because the first outer cast ensures transformation to the correct type.
c) when the function is called without an outer cast it runs too - but
there will be one more IO cast.

Some alchemy with the function descriptor is used now too - when default
parameters are used.


 Plus, the general case where type information must bubble up more than one 
 level seems pretty much intractable, as it'd require a full-blown type 
 inference algorithm like ML or Haskell. Not a place where we want to go, I 
 believe.

 The restricted case, on the other hand, brings very little benefit compared 
 to the dummy-parameter approach. Yeah, polymorphic_function()::type may 
 look a bit cleaner than polymorphic_function(NULL::type), but that's about 
 it. It's only assignments in pl/pgsql which really benefit, since you'd be 
 able to leave out the type completely, writing simply v_value := 
 polymorphic_function(). Does that really warrant the effort that'd be 
 involved?

 Without the described functionality we can design a non-polymorphic
 function that returns the unknown type. When similar functionality
 is implemented later, this function can be changed to a polymorphic
 one, but from the user's perspective there isn't a change.

 I don't really understand why you resist the idea of a dummy parameter so 
 much. It might not be pretty, but is it bad enough to justify putting in all 
 this work? Plus, the whole record-manipulation stuff isn't going to win a 
 beauty contest anytime soon. But it's better than nothing, so as long as it's 
 reasonably efficient I think one can live with a few warts on the API.

I wrote it already. In this case you don't need to know a value, you
have to work with a type. So using a typed null isn't intuitive and it
isn't nice - for me it's too ugly for a general module. I know that
PFWPP functions need a lot of coding without a sure result, and that's
the reason why I didn't use them and why I use the text type. And I
have another reason - I expect there is a bigger probability of
iterating over fields of different types, so coercion to one target
type isn't available in one path. Using more paths (like you showed in
the code) is 

Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Martijn van Oosterhout
On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:
 Hi everyone,
 
 one of the sessions I attended at PgDay last week was Heikki's session
 about statistics in PostgreSQL. One of the issues he mentioned (and one
 I regularly run into) is the absence of cross-column stats. When the
 columns are not independent, this usually results in poor estimates (and
 then in suboptimal plans).

Very cool that you're working on this.

 Let's talk about one special case - I'll explain how the proposed
 solution works, and then I'll explain how to make it more general, what
 improvements are possible, and what issues there are. Anyway this is by
 no means a perfect or complete solution - it's just a starting point.

It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably
  anything with a b-tree operator class. What you've coded seems to rely
  on calculations on the values. Have you thought about how it could
  work for, for example, strings?

The classic failure case has always been: postcodes and city names.
Strongly correlated, but in a way that the computer can't easily see.
Not that I suggest you fix this, but it's food for thought. Though
strictly speaking this is a different kind of correlation than what
you're looking at.

 2) I really don't think we should collect stats for all combinations of
columns of a table - I do like the Oracle-like approach where a DBA
has to enable cross-column stats using an ALTER TABLE (for a
particular list of columns).
 
The only exception might be columns from a multi-column index. It
might be quite efficient I guess?

In the past it has been suggested to only do it for multi-column
indexes, but these days I find that in some situations I prefer to
make individual indexes and let the bitmap scan code combine them. So
perhaps it would be best to let it be configured by the DBA.

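To make the idea concrete, enabling such stats might look like this - the
syntax is purely invented for illustration:

ALTER TABLE orders ADD STATISTICS (customer_id, shipping_region);
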
 3) There are independence tests for contingency tables (e.g. Pearson's
Chi-squared test), so that it's easy to find out whether the columns
are independent. In that case we can just throw away these stats and
use the simple estimation.
 
http://mathworld.wolfram.com/Chi-SquaredTest.html

I think this would be good to include, if possible. 

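For reference, Pearson's statistic over the cells of the contingency table
is (in LaTeX notation):

\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i C_j}{N}

where O_{ij} is the observed count in cell (i,j), R_i and C_j are the row
and column totals, and N is the sample size.
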
Actually, I wonder if the existing stats collection code could be
altered to attempt to calculate the correlation between columns as part
of its other work.

 4) Or we could store just those cells where expected and observed values
    differ significantly (may help if most of the values are independent,
    but there's a small glitch somewhere).

Compressing that grid would be useful, given that for many dimensions
most of the grid will not be interesting. In fact, storing the 20
largest values may be enough. Worth an experiment.

Hope this helps,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] Extensions, patch v16

2010-12-12 Thread Oleg Bartunov

Hi there,

it's clear we need versions; probably major.minor would be enough. The problem
I see is how to keep the .so in sync with the .sql. Should we store the .sql
in the database?

Also, we need permissions for extensions, since we have open/closed
extensions.



Oleg

On Sat, 11 Dec 2010, David E. Wheeler wrote:


On Dec 11, 2010, at 1:09 PM, David Fetter wrote:


Why is it in the makefile at all?  If the makefile does need to know it,
why don't we have it scrape the number out of the control file?  Or even
more to the point, since when do we need version numbers in extensions?


We *absolutely* need version numbers in extensions.  People will want
to have a certain version, or a certain minimum version, etc., etc.,
etc., just as they do for any other software.

Seriously, are you OK?


One of the biggest mistakes in the creation of CPAN was allowing modules
without versions. It makes figuring out what to upgrade extremely difficult.
Learning from that, PGXN requires version numbers for all extensions.

Best,

David





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Heikki Linnakangas

On 12.12.2010 15:17, Martijn van Oosterhout wrote:

On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:
Very cool that you're working on this.


+1


Let's talk about one special case - I'll explain how the proposed
solution works, and then I'll explain how to make it more general, what
improvements are possible, and what issues there are. Anyway this is by no
means a perfect or complete solution - it's just a starting point.


It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably
   anything with a b-tree operator class. What you've coded seems to rely
   on calculations on the values. Have you thought about how it could
   work for, for example, strings?

The classic failure case has always been: postcodes and city names.
Strongly correlated, but in a way that the computer can't easily see.


Yeah, and that's actually analogous to the example I used in my 
presentation.


The way I think of that problem is that once you know the postcode, 
knowing the city name doesn't add any information. The postcode implies 
the city name. So the selectivity for postcode = ? AND city = ? should 
be the selectivity of postcode = ? alone. The measurement we need is 
"implicativeness": how strongly does column A imply a certain value for 
column B. Perhaps that could be measured by counting the number of 
distinct values of column B for each value of column A, or something 
like that. I don't know what the statisticians call that property, or if 
there's some existing theory on how to measure that from a sample.


That's assuming the combination has any matches. It's possible that the 
user chooses a postcode and city combination that doesn't exist, but 
that's no different from a user doing city = 'fsdfsdfsd' on a single 
column, returning no matches. We should assume that the combination 
makes sense.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] function attributes

2010-12-12 Thread Tom Lane
Andrew Dunstan andrew.duns...@pgexperts.com writes:
 Yesterday I did a bit of work on allowing bytea values to be passed into 
 and out of plperl in binary format, effectively removing the need to 
 escape and de-escape them. (The work can be seen on the plperlargs branch 
 of my development repo at 
 https://github.com/adunstan/postgresql-dev/commits/plperlargs/).

 At the moment the behaviour is triggered by a custom setting 
 (plperl.pass_binary_bytea), but this isn't really satisfactory. We could 
 turn it on permanently, but that would break a lot of legacy code. What 
 we really need is a way of marking a function with some attributes. Of 
 course, we could put it in the program text like plpgsql's 
 #variable_conflict, but that's really rather ugly. The grammar already 
 has an attribute mechanism for functions, and ISTM we just need to 
 extend that a bit to allow setting of function attributes reasonably 
 flexibly, much as we can now specify format options on EXPLAIN or we'll 
 soon be able to specify options for foreign tables.

I do not want to go there.  What you're proposing will soon turn into a
real mess, with arbitrary language-specific junk tagged onto pg_proc
entries.  And what's worse, it'll be mixed with non-language-specific
junk, because of the existing legacy WITH entries.

Tim Bunce seemed to think that this particular problem might be solvable
in a completely transparent way, by having byteas convert into Perl
objects that have a hook for producing a backwards-compatible text
translation.  Have you looked into that idea?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
Hi!

On 12.12.2010 15:17, Martijn van Oosterhout wrote:
 Let's talk about one special case - I'll explain how the proposed
 solution works, and then I'll explain how to make it more general, what
 improvements are possible, and what issues there are. Anyway this is by
 no means a perfect or complete solution - it's just a starting point.
 
 It looks like you handled most of the issues. Just a few points:
 
 - This is obviously applicable to more than just integers, probably
   anything with a b-tree operator class. What you've coded seems to rely
   on calculations on the values. Have you thought about how it could
   work for, for example, strings?

Yes, I know - I just forgot to address this in my previous e-mail. The
contingency tables have a really nice feature - they are based on
splitting the sets into groups (~ bins of the histograms for each
column). And this can be done whenever you can sort the values; you
really don't need any calculations. So it should work with strings.

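As a sketch of that order-based grouping, assuming a table t(a, b) - only
sorting is needed, so it works for text just as well as for integers:

SELECT bin_a, bin_b, count(*)
FROM (SELECT ntile(10) OVER (ORDER BY a) AS bin_a,
             ntile(10) OVER (ORDER BY b) AS bin_b
      FROM t) s
GROUP BY bin_a, bin_b;
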
And another thing I somehow forgot is handling the case when there is no
histogram, just an MCV list. That's mostly the same - each of the values
might be a separate group, or the values might be grouped to form fewer
groups, etc.

 The classic failure case has always been: postcodes and city names.
 Strongly correlated, but in a way that the computer can't easily see.
 Not that I suggest you fix this, but it's food for thought. Though
 strictly speaking this is a different kind of correlation than what
 you're looking at.

Hmmm, I see. I think the proposal does not fix this particular case,
although it might improve the situation a little bit (limit the error
between expected and observed number of rows).

The problem is that once we get to a cell-level of the contingency
table, there is no additional (more detailed) information. So we're
stuck with the multiplication estimate, or something like that.

I was thinking about it actually, and I think we could collect some more
info - a correlation coefficient for each bin, or something like that.
But that was not part of my proposal, and I'm not sure how to do that.

 2) I really don't think we should collect stats for all combinations of
columns of a table - I do like the Oracle-like approach where a DBA
has to enable cross-column stats using an ALTER TABLE (for a
particular list of columns).

The only exception might be columns from a multi-column index. It
might be quite efficient I guess?
 
 In the past it has been suggested to only do it for multi-column
 indexes, but these days I find that in some situations I prefer to
 make individual indexes and let the bitmap scan code combine them. So
 perhaps it would be best to let it be configured by the DBA.

Yes, I prefer individual indexes too.

The idea behind collecting cross-column stats for multi-column indexes
was that maybe we could 'append' this to the current functionality
(building the index or something like that) so that it does not
introduce significant performance problems.

 3) There are independence tests for contingency tables (e.g. Pearson's
Chi-squared test), so that it's easy to find out whether the columns
are independent. In that case we can just throw away these stats and
use the simple estimation.

http://mathworld.wolfram.com/Chi-SquaredTest.html
 
 I think this would be good to include, if possible. 
 
 Actually, I wonder if the existing stats collection code could be
 altered to attempt to calculate the correlation between columns as part
 of its other work.

I guess that would be rather expensive - to compute correlation you need
two passes, and you need to do that for each pair of columns. So I'd be
surprised if it is possible (and efficient).

Another thing is that you can compute correlation only for numeric
columns, so it's not possible to do that for the city/ZIP code case
mentioned above. More precisely - it's possible to do that (if you map
strings to numbers somehow), but I doubt you'll get useful results as
the assignment is rather random.

Well, you could ask the governments to assign the ZIP codes to cities in
strictly alphabetical order, but I guess they'll say no.

 4) Or we could store just those cells where expected and observed values
    differ significantly (may help if most of the values are independent,
    but there's a small glitch somewhere).
 
 Compressing that grid would be useful, given that for many dimensions
 most of the grid will not be interesting. In fact, storing the 20
 largest values may be enough. Worth an experiment.

Not exactly just the largest values - rather values that are
significantly different from the expected values. Generally there are
two interesting cases:

expected < observed - The optimizer may choose an index scan, although
   the seq scan would be better.

expected > observed - The optimizer may choose a seq scan, although
   the index scan would be better.

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Tom Lane
Florian Pflug <f...@phlo.org> writes:
 pg_upgrade aborted during the step "Adding support functions to new cluster"
 with an "ERROR:  permission denied for language c" error. Unfortunately, the
 log didn't include the name of the database where the error occurred, so it
 took me a while to figure out that the culprit was an "ALTER DATABASE SET
 ROLE = non-superuser" I had done on one of my databases, which effectively
 prevented pg_upgrade from connecting with superuser privileges.

That seems like a pretty stupid thing to have done; it would prevent
*any* connection to that database with superuser privileges, no?

 While one could argue that this behaviour is perfectly consistent, I believe 
 most users will expect pg_upgrade (and to a lesser extent also pg_dump and 
 pg_restore) to be unaffected by such settings.

This is about like arguing that pg_dump and pg_upgrade should still work
after you've done "DELETE FROM pg_proc;".  Superusers are assumed to
know what they're doing and not break fundamental operations.

I'm thinking that if there's anything we should forbid here, it's the
ALTER ... SET itself.  In particular, some experimentation suggests that
a non-superuser database owner can do it:

regression=# create user joe;
CREATE ROLE
regression=# create database joe with owner joe;
CREATE DATABASE
regression=# \c joe joe
You are now connected to database "joe" as user "joe".
joe=> alter database joe set role joe;
ALTER DATABASE

which seems to me at least a bad idea and arguably a security hazard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Tom Lane
I wrote:
 Magnus Hagander <mag...@hagander.net> writes:
 On Sat, Dec 11, 2010 at 18:46, Tom Lane <t...@sss.pgh.pa.us> wrote:
 I think we can just #define the other cases as zeroes.  I'm not sure why
 you think that's an issue for open --- the privileges don't exist.

 Hmm. I was/am worried about any case that specifies *just* one of the
 permissions that don't exist. That'll leave it at zero, whereas the
 correct one might be the user-only version of whatever (read/write)
 was given.

 If we didn't specify the user read or write privilege, we shouldn't
 get it.

I put in #define's for these, and it seems to have fixed the MSVC
buildfarm members, but cygwin is still broken.  How come ... doesn't
that port use port/win32.h?

 What are the values of _S_IREAD and _S_IWRITE, anyway?  I'm still
 wondering how come the previous coding with hardwired constants
 behaved correctly.

Still curious about this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
On 12.12.2010 15:43, Heikki Linnakangas wrote:
 The classic failure case has always been: postcodes and city names.
 Strongly correlated, but in a way that the computer can't easily see.
 
 Yeah, and that's actually analogous to the example I used in my
 presentation.
 
 The way I think of that problem is that once you know the postcode,
 knowing the city name doesn't add any information. The postcode implies
 the city name. So the selectivity for postcode = ? AND city = ? should
 be the selectivity of postcode = ? alone. The measurement we need is
 "implicativeness": how strongly does column A imply a certain value for 
 column B. Perhaps that could be measured by counting the number of
 distinct values of column B for each value of column A, or something
 like that. I don't know what the statisticians call that property, or if
 there's some existing theory on how to measure that from a sample.

Yes, those issues are a righteous punishment for breaking BCNF rules ;-)

I'm not sure it's solvable using the contingency tables, as it requires
knowledge about dependencies between individual values (working with
cells is not enough, although it might improve the estimates).

Well, maybe we could collect these stats (number of cities for a given
ZIP code and number of ZIP codes for a given city). Collecting good
stats about this is a bit tricky, but possible. What about collecting
this for the MCVs from both columns?

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] function attributes

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 10:43 AM, Tom Lane wrote:


Tim Bunce seemed to think that this particular problem might be solvable
in a completely transparent way, by having byteas convert into Perl
objects that have a hook for producing a backwards-compatible text
translation.  Have you looked into that idea?


No. If you're referring to this sentence, which was referring to arrays, 
not to byteas:



It's possible a blessed ref with string overloading would avoid
backwards compatibility issues.



then it won't work (or at least it would be far more complex than what 
I've done, and I can't see how it would work) in the case of a bytea, 
since a bytea becomes a scalar, not a ref, and you can only bless refs.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Florian Pflug
On Dec12, 2010, at 15:43 , Heikki Linnakangas wrote:
 The way I think of that problem is that once you know the postcode, knowing 
 the city name doesn't add any information. The postcode implies the city 
 name. So the selectivity for postcode = ? AND city = ? should be the 
 selectivity of postcode = ? alone. The measurement we need is 
 "implicativeness": how strongly does column A imply a certain value for 
 column B. Perhaps that could be measured by counting the number of distinct 
 values of column B for each value of column A, or something like that. I 
 don't know what the statisticians call that property, or if there's some 
 existing theory on how to measure that from a sample.

The statistical term for this is conditional probability, written P(A|B),
meaning the probability of A under the assumption or knowledge of B. The basic
tool for working with conditional probabilities is Bayes' theorem, which
states that

P(A|B) = P(A and B) / P(B).

Currently, we assume that P(A|B) = P(A), meaning the probability (or 
selectivity as we call it) of an event (like a=3) does not change under 
additional assumptions like b=4. Bayes' theorem thus becomes

P(A) = P(A and B) / P(B)  <=>  P(A and B) = P(A) * P(B)

which is how we currently compute the selectivity of a clause such as WHERE 
a=3 AND b=4.

I believe that measuring this by counting the number of distinct values of 
column B for each A is basically the right idea. Maybe we could count the 
number of distinct values of b for every one of the most common values of 
a, and compare that to the overall number of distinct values of b...

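A rough way to measure that from the raw data, assuming a table t(a, b) -
few distinct b's per common a means a strongly implies b:

SELECT a, count(*) AS freq, count(DISTINCT b) AS ndistinct_b
FROM t
GROUP BY a
ORDER BY freq DESC
LIMIT 100;
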
A (very) quick search on scholar.google.com for "estimate conditional
probability" didn't turn up anything useful, but it's hard to believe that
there isn't at least some literature on the subject.

best regards,
Florian Pflug
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] function attributes

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 10:43 AM, Tom Lane wrote:



At the moment the behaviour is triggered by a custom setting
(plperl.pass_binary_bytea), but this isn't really satisfactory. We could
turn it on permanently, but that would break a lot of legacy code. What
we really need is a way of marking a function with some attributes. Of
course, we could put it in the program text like plpgsql's
#variable_conflict, but that's really rather ugly. The grammar already
has an attribute mechanism for functions, and ISTM we just need to
extend that a bit to allow setting of function attributes reasonably
flexibly, much as we can now specify format options on EXPLAIN or we'll
soon be able to specify options for foreign tables.

I do not want to go there.  What you're proposing will soon turn into a
real mess, with arbitrary language-specific junk tagged onto pg_proc
entries.  And what's worse, it'll be mixed with non-language-specific
junk, because of the existing legacy WITH entries.



Arguably we should deprecate those legacy entries. One, isCachable, is
stated in the docs to be obsolete, and has been for many releases
now. The other, isStrict, is a non-preferred way of specifying that the
function is strict.


But the real issue is that we have no way of specifying properties for a 
function at creation time other than those provided for in the grammar. 
We've already made a couple of fairly ugly hacks to do stuff like this 
in plpgsql. Is that really the road we want to go down? Is it less messy 
than providing some catalog support for language specific function 
properties, where they might be visible outside the function source?


In the present case, Robert's suggestion of using "create function ...
set plperl.pass_binary_bytea = true" seems to work well enough, although
I haven't tried very hard yet to break it.

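Roughly what that looks like; note that plperl.pass_binary_bytea is from my
development branch, not from core:

CREATE FUNCTION bytea_length(bytea) RETURNS integer
LANGUAGE plperl
SET plperl.pass_binary_bytea = true
AS $$ return length($_[0]); $$;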

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
On 12.12.2010 17:33, Florian Pflug wrote:
 On Dec12, 2010, at 15:43 , Heikki Linnakangas wrote:
 The way I think of that problem is that once you know the postcode, knowing 
 the city name doesn't add any information. The postcode implies the city 
 name. So the selectivity for postcode = ? AND city = ? should be the 
 selectivity of postcode = ? alone. The measurement we need is 
 "implicativeness": how strongly does column A imply a certain value for 
 column B. Perhaps that could be measured by counting the number of distinct 
 values of column B for each value of column A, or something like that. I 
 don't know what the statisticians call that property, or if there's some 
 existing theory on how to measure that from a sample.
 
 The statistical term for this is conditional probability, written P(A|B),
 meaning the probability of A under the assumption or knowledge of B. The
 basic tool for working with conditional probabilities is Bayes' theorem,
 which states that
 
 P(A|B) = P(A and B) / P(B).
 
 Currently, we assume that P(A|B) = P(A), meaning the probability (or 
 selectivity as we call it) of an event (like a=3) does not change under 
 additional assumptions like b=4. Bayes' theorem thus becomes
 
 P(A) = P(A and B) / P(B)  <=>  P(A and B) = P(A) * P(B)
 
 which is how we currently compute the selectivity of a clause such as WHERE 
 a=3 AND b=4.
 
 I believe that measuring this by counting the number of distinct values of 
 column B for each A is basically the right idea. Maybe we could count the 
 number of distinct values of b for every one of the most common values of 
 a, and compare that to the overall number of distinct values of b...

Good point!

Well, I was thinking about this too - generally this means creating a
contingency table with the MCVs as bins. Then you can compute these
interesting probabilities P(A and B). (OK, now I definitely look like
some contingency-table weirdo who tries to solve everything with a
contingency table. OMG!)

The question is - what are we going to do when the values in the query
are not in the MCV list? Are there some heuristics to estimate the
probability from the MCVs, or something like that? Could we use some
average probability or what?

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] function attributes

2010-12-12 Thread Tom Lane
Andrew Dunstan <and...@dunslane.net> writes:
 On 12/12/2010 10:43 AM, Tom Lane wrote:
 At the moment the behaviour is triggered by a custom setting
 (plperl.pass_binary_bytea), but this isn't really satisfactory.

 I do not want to go there.

 But the real issue is that we have no way of specifying properties for a 
 function at creation time other than those provided for in the grammar. 
 We've already made a couple of fairly ugly hacks to do stuff like this 
 in plpgsql. Is that really the road we want to go down? Is it less messy 
 than providing some catalog support for language specific function 
 properties, where they might be visible outside the function source?

There might be an argument in the abstract for that, but I can't see
expending the work until we have a more pressing concrete requirement
than this one.  I don't believe that defining this particular behavior
as a function property is a good long-term solution, because it seems
practically certain that everybody will want to migrate to the new
behavior.  A GUC works well for that, because you can flip over the
default once you reach the point of having converted or marked all your
functions.  A function property doesn't work at all, unless it's just a
means of locally overriding the GUC ... and the SET clause exists for
that already.

I could be talked into function properties given a few examples of
properties that could be expected to remain in use for a long time
(like volatile/immutable for instance).  But this example is no sale.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Florian Pflug
On Dec12, 2010, at 17:01 , Tom Lane wrote:
 Florian Pflug <f...@phlo.org> writes:
 pg_upgrade aborted during the step "Adding support functions to new cluster"
 with an "ERROR:  permission denied for language c" error. Unfortunately, the
 log didn't include the name of the database where the error occurred, so it
 took me a while to figure out that the culprit was an "ALTER DATABASE SET
 ROLE = non-superuser" I had done on one of my databases, which effectively
 prevented pg_upgrade from connecting with superuser privileges.
 
 That seems like a pretty stupid thing to have done; it would prevent
 *any* connection to that database with superuser privileges, no?

I had two developers working with that database who regularly modify the
schema, often creating new objects (it's a development machine). They both were
annoyed that if one of them created a table, he'd be the owner and some
operations on that table would be restricted to him and superusers. The ALTER
DATABASE SET ROLE fixes that nicely for me. When I needed to work as a
superuser with that database, I simply did SET ROLE <superuser role> to
restore my superuser powers.

Nowadays, I could probably do the SET ROLE just for some specific combination 
of user and database. That option, however, wasn't there at the time I did the 
ALTER DATABASE SET ROLE.

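The finer-grained form available since 9.0 would be something like (names
invented):

ALTER ROLE dev1 IN DATABASE devdb SET ROLE = shared_owner;
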
 While one could argue that this behaviour is perfectly consistent, I believe 
 most users will expect pg_upgrade (and to a lesser extent also pg_dump and 
 pg_restore) to be unaffected by such settings.
 
 This is about like arguing that pg_dump and pg_upgrade should still work
 after you've done "DELETE FROM pg_proc;".  Superusers are assumed to
 know what they're doing and not break fundamental operations.

Sure. If you believe in proof by exaggeration, which I don't.

The way I see it, how is a DBA supposed to know that setting a per-database
ROLE is a bad idea, while per-database settings for other GUCs are fine?
For example, what about
  synchronous_commit=off
  vacuum_freeze_min_age
  datestyle
  sql_inheritance
  standard_conforming_strings
  array_nulls
  default_with_oids
  ...

Without checking the code, all of these have about the same chance of breaking
pg_upgrade. But then, by your line of reasoning, ALTER DATABASE SET ROLE
shouldn't have been invented in the first place. Which may even be true, but
it's too late for that. So the next best thing, IMHO, is to give superusers a
way to avoid the hazard it poses.
 
 I'm thinking that if there's anything we should forbid here, it's the
 ALTER ... SET itself.  In particular, some experimentation suggests that
 a non-superuser database owner can do it:
 
 regression=# create user joe;
 CREATE ROLE
 regression=# create database joe with owner joe;
 CREATE DATABASE
 regression=# \c joe joe
 You are now connected to database "joe" as user "joe".
 joe=> alter database joe set role joe;
 ALTER DATABASE
 
 which seems to me at least a bad idea and arguably a security hazard.
I'm sorry, I don't see that security hazard there. Care to explain?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 11:16 AM, Tom Lane wrote:


I put in #define's for these, and it seems to have fixed the MSVC
buildfarm members, but cygwin is still broken.  How come ... doesn't
that port use port/win32.h?



ITYM Mingw. And yes, it does use port/win32.h; narwhal's log says:

   config.status: linking src/include/port/win32.h to src/include/pg_config_os.h


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Tom Lane
Andrew Dunstan <and...@dunslane.net> writes:
 On 12/12/2010 11:16 AM, Tom Lane wrote:
 I put in #define's for these, and it seems to have fixed the MSVC
 buildfarm members, but cygwin is still broken.  How come ... doesn't
 that port use port/win32.h?

 ITYM Mingw. And yes, it does use port/win32.h; narwhal's log says:
 config.status: linking src/include/port/win32.h to 
 src/include/pg_config_os.h

Oh, I guess the point is that WIN32_ONLY_COMPILER doesn't get defined,
and that block of file-permission-bit #defines is nested inside
#ifdef WIN32_ONLY_COMPILER.

So apparently the issue is that the mingw headers provide some but not
all of those symbols.  Which have they got, and how are they defined?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 01:43 PM, Tom Lane wrote:

Make S_IRGRP etc available in mingw builds as well as MSVC.

(Hm, I wonder whether BCC defines them either...)




Is anyone building the client stuff with BCC any more? I don't recall 
having heard of anyone doing so for quite some years.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Tom Lane
Andrew Dunstan <and...@dunslane.net> writes:
 On 12/12/2010 01:43 PM, Tom Lane wrote:
 (Hm, I wonder whether BCC defines them either...)

 Is anyone building the client stuff with BCC any more? I don't recall 
 having heard of anyone doing so for quite some years.

It's a fair question.  We could clean up some of these messy ifdefs
if we dropped support for that combination.  I assume that an MSVC-built
libpq.dll would still work for Borland users, no?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Magnus Hagander
On Sun, Dec 12, 2010 at 19:54, Tom Lane <t...@sss.pgh.pa.us> wrote:
 Andrew Dunstan <and...@dunslane.net> writes:
 On 12/12/2010 01:43 PM, Tom Lane wrote:
 (Hm, I wonder whether BCC defines them either...)

 Is anyone building the client stuff with BCC any more? I don't recall
 having heard of anyone doing so for quite some years.

 It's a fair question.  We could clean up some of these messy ifdefs
 if we dropped support for that combination.  I assume that an MSVC-built
 libpq.dll would still work for Borland users, no?

The dynamic one, yes; the statically linked one, no.

IIRC I suggest desupporting it every now and then and get voted down
;) And quite often we have someone showing up around the x.y.2 release
to clean it up so it works again...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Make S_IRGRP etc available in mingw builds as well as MSVC.

2010-12-12 Thread Tom Lane
Magnus Hagander <mag...@hagander.net> writes:
 On Sun, Dec 12, 2010 at 19:54, Tom Lane <t...@sss.pgh.pa.us> wrote:
 It's a fair question.  We could clean up some of these messy ifdefs
 if we dropped support for that combination.  I assume that an MSVC-built
 libpq.dll would still work for Borland users, no?

 The dynamic one, yes. Static linked one, no.

 IIRC I suggest desupporting it every now and then and get voted down
 ;) And quite often we have someone showing up around the x.y.2 release
 to clean it up so it works again...

Yeah, a look in the commit logs shows that happening about once a year.
It'd sure be nice if we had a less haphazard process for it though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-12-12 Thread Dimitri Fontaine
Cédric Villemain <cedric.villemain.deb...@gmail.com> writes:
 2010/12/8 Kineticode Billing <da...@kineticode.com>:
 On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:

 Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

I kind of like TRANSIENT, but that's only because it's a property I've
been working with in some other systems:

  http://www.erlang.org/doc/design_principles/sup_princ.html

Restart = permanent | transient | temporary

Restart defines when a terminated child process should be restarted.

A permanent child process is always restarted.
A temporary child process is never restarted.
A transient child process is restarted only if it terminates abnormally, 
i.e. with another exit reason than normal.
 
 EVANESCENT.

 UNSAFE ?

What about NOT PERSISTENT? Then we would have two flavours of them,
that's NOT PERSISTENT ON RESTART TRUNCATE or ON RESTART FLUSH, I guess?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wildcard search support for pg_trgm

2010-12-12 Thread Dimitri Fontaine
Alexander Korotkov <aekorot...@gmail.com> writes:
 Here is first version of patch, which enable index support of wildcard
 search in pg_trgm contrib module.

How different (and better) is it from wildspeed?

  http://www.sai.msu.su/~megera/wiki/wildspeed

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-12 Thread Peter Eisentraut
On Wed, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote:
 I've tried SQL/MED with postgresql_fdw last night, and found myself
 confused by the long setup procedure. The simplest way to use it AFAIK is:
 
 1. CREATE FOREIGN DATA WRAPPER ... (or run the install sql script)
 2. CREATE SERVER ... FOREIGN DATA WRAPPER ...
 3. CREATE USER MAPPING FOR ...
 4. CREATE FOREIGN TABLE ( ... )
 
 From a user's view, this is a very long way to get to the simplest foreign
 table. I know it is based on the standard, but I really want a
 shortcut. Especially, I don't understand why CREATE USER MAPPING FOR
 current_user SERVER server is needed for the default use case. If you
 forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an
 error. A user mapping is useful if the local user and remote user should
 be mapped, but I imagine in most cases they are the same.
 postgresql_fdw can tell the remote user by the conninfo string, in
 addition.

I reviewed the standard about this, and a lot of things are
implementation-defined.  I think user mappings could be made optional.

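For reference, the whole ceremony looks roughly like this (the option names
are illustrative and may not match postgresql_fdw exactly):

CREATE FOREIGN DATA WRAPPER pgsql_fdw;                -- step 1
CREATE SERVER remote_pg FOREIGN DATA WRAPPER pgsql_fdw
    OPTIONS (host 'remotehost', dbname 'remotedb');   -- step 2
CREATE USER MAPPING FOR current_user SERVER remote_pg
    OPTIONS (user 'remote_user');                     -- step 3
CREATE FOREIGN TABLE remote_items (id int, name text)
    SERVER remote_pg;                                 -- step 4
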
 This is another topic, but it would be useful if CREATE FOREIGN TABLE
 could omit column definitions, since the FDW usually knows what should
 be in the definitions. I sometimes mistyped column names between remote
 and local, which resulted in failures at execution.

Also, according to the standard, the column list in CREATE FOREIGN TABLE
is optional (if you can get it in some automatic way, of course).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, patch v17

2010-12-12 Thread David E. Wheeler
On Dec 12, 2010, at 12:50 PM, Dimitri Fontaine wrote:

 The only item with still some work to be done on it is the regression
 tests support: we're not aiming at full coverage, is my understanding,
 and installing contribs goes a long way towards testing extensions. Do
 we want more? If so, please detail what exactly.

At least those things not exercised by the contrib modules.

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wildcard search support for pg_trgm

2010-12-12 Thread Alexander Korotkov
On Mon, Dec 13, 2010 at 12:14 AM, Dimitri Fontaine
<dimi...@2ndquadrant.fr> wrote:

 How different (and better) is it from wildspeed?


The general advantage is the possibility of using wildcard search and trigram
similarity search with the same index.
I expect that the GIN trigram index is slightly less space-demanding, but
slightly slower on search than wildspeed. Also, I expect the GiST trigram
index to be slower on search, but faster on updates. Though I didn't check
these assumptions in detail.
I lack test datasets for sufficient testing, and I would like to ask the
community to help me with this, because testing on dictionaries is good
but obviously not enough.

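To give an idea, with the patch a query like the following should be able to
use the trigram index for the wildcard (index and table names are examples):

CREATE INDEX words_trgm_idx ON words USING gin (w gin_trgm_ops);
SELECT * FROM words WHERE w LIKE '%ething%';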

With best regards,
Alexander Korotkov.


Re: [HACKERS] create tablespace fails silently, or succeeds improperly

2010-12-12 Thread Peter Eisentraut
On Mon, 2010-10-18 at 15:50 -0400, Tom Lane wrote:
 Yeah.  We have gotten complaints in the past from people who tried to
 specify a mount point as a tablespace, and it failed because of
 lost+found or the mount dir being root-owned.  We've told them to make
 a subdirectory, but that always seemed like a workaround.  With the
 new layout there's no longer any strong reason to prevent this case
 from working.
 
 Basically, I'm thinking that given CREATE TABLESPACE LOCATION
 '/foo/bar'
 the creation and properties of /foo/bar/PG_9.0_201004261 ought to be
 handled *exactly* the way that the -D target directory of initdb is.
 We have more than ten years experience behind the assertion that we're
 dealing with that case in a good way.  We should transfer that
 behavior over to tablespace directories rather than inventing
 something that works a shade differently.

I'm still struggling with the above argument.  In one case you are
applying a behavior to the argument given to initdb, in the other case
you are applying the behavior to a subdirectory of the argument given to
CREATE TABLESPACE.  I'm not saying the solution is necessarily wrong,
but it doesn't seem that this will make things easier or more
consistent.

An idle thought: How about creating a version subdirectory also in the
PGDATA path?  The point about mountpoint annoyance applies here just as
well.  And it could also make the directory juggling during in-place
upgrade more normalized and robust.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation

2010-12-12 Thread Peter Eisentraut
On Tue, 2010-12-07 at 11:46 +0900, Itagaki Takahiro wrote:
 On Sun, Dec 5, 2010 at 01:04, Peter Eisentraut <pete...@gmx.net> wrote:
  Here is an updated patch to address the issues discussed during this
  commitfest.
 
 I found another issue in the patch; ILIKE in a WHERE clause doesn't work.
 It was surprising because LIKE in a WHERE clause and ILIKE in a SELECT list
 work as expected.
  - SELECT * FROM pg_class WHERE relname LIKE 'pg%'
  - SELECT relname ILIKE 'pg%' FROM pg_class;
 
 
 postgres=# SELECT name, setting FROM pg_settings
  WHERE name IN ('lc_ctype', 'lc_collate', 'server_encoding');
       name       | setting
 -----------------+---------
  lc_collate      | C
  lc_ctype        | C
  server_encoding | UTF8
 (3 rows)
 
 postgres=# SELECT * FROM pg_class WHERE relname ILIKE 'pg%';
 ERROR:  no collation was derived

This is fixed in the 20101213 patch I'm about to send out.




Re: [HACKERS] Per-column collation

2010-12-12 Thread Peter Eisentraut
On Mon, 2010-12-06 at 21:26 +0200, Peter Eisentraut wrote:
 
  * contrib/citext raises an encoding error when COLLATE is specified
  even if the collation is the same as the database default.
  We might need some special treatment for C locale.
  =# SHOW lc_collate;  == C
  =# SELECT ('A'::citext) = ('a'::citext);  == false
  =# SELECT ('A'::citext) = ('a'::citext) COLLATE C;
  ERROR:  invalid multibyte character for locale
  HINT:  The server's LC_CTYPE locale is probably incompatible with the
  database encoding.
 
 OK, I can reproduce that.  That's fallout from the lc_ctype_is_c()
 optimization that I removed, as explained in another email.  I'll have
 to think about that again.

This is fixed in the 20101213 patch I'm about to send out.




[HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Simon Riggs

The new SQL Standard (SQL:2011) contains this:
"Table constraints are either enforced or not enforced. Domain
constraints and assertions are always enforced." (4.17.2)

The SQL Standard allows you to turn the checking on and off for CHECK
constraints, UNIQUE constraints and FOREIGN KEYS.

Which of those make sense for us, if any? The ability to create FKs
without checking all the data has been frequently requested of me over
many years. OTOH, I can't really see any point in turning on/off all of
the other aspects mentioned by the SQL Standard, especially indexes.
It's lots of work and seems likely to end with poorer data quality. And
the obvious thing is if you don't want a CHECK constraint, just drop
it...

My proposal is that we add a short and simple clause NOT ENFORCED onto
the ADD constraint syntax. So we have

ALTER TABLE foo
ADD FOREIGN KEY  NOT ENFORCED;

The enforced state is not persisted - once added the FK is checked
every time. So there is no additional column on pg_constraint.

The benefit here is that we implement a capability that allows skipping
very long running SQL statements when required, and doesn't require too
much code. It has been discussed before on hackers, but that was before
it was part of the SQL Standard. Oracle has had this for years and it is
a popular feature. We can expect other RDBMSes to implement this feature
now that it is part of the standard.
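
To illustrate the intended use case (hypothetical tables; NOT ENFORCED is
the syntax proposed here):

  -- the loaded data is already known to be consistent, so the
  -- long-running initial validation scan is skipped:
  ALTER TABLE orders
    ADD FOREIGN KEY (customer_id) REFERENCES customers (id) NOT ENFORCED;
  -- as described above, later inserts and updates are still checked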

If you want more than my good-bits-only proposal, it really isn't going
to happen for 9.1, and seems pretty pointless anyway.

Very short hack to implement this attached for discussion. No tests, not
even a compile - just showing how quick a patch this can be.

Thoughts? Alternative syntax?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 784feae..ecadcbd 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -42,7 +42,7 @@ ALTER TABLE <replaceable class="PARAMETER">name</replaceable>
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
     ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
-    ADD <replaceable class="PARAMETER">table_constraint</replaceable>
+    ADD <replaceable class="PARAMETER">table_constraint</replaceable> [ ENFORCED | NOT ENFORCED ]
     DROP CONSTRAINT [ IF EXISTS ]  <replaceable class="PARAMETER">constraint_name</replaceable> [ RESTRICT | CASCADE ]
     DISABLE TRIGGER [ <replaceable class="PARAMETER">trigger_name</replaceable> | ALL | USER ]
     ENABLE TRIGGER [ <replaceable class="PARAMETER">trigger_name</replaceable> | ALL | USER ]
@@ -220,11 +220,13 @@ ALTER TABLE <replaceable class="PARAMETER">name</replaceable>
    </varlistentry>
 
    <varlistentry>
-    <term><literal>ADD <replaceable class="PARAMETER">table_constraint</replaceable></literal></term>
+    <term><literal>ADD <replaceable class="PARAMETER">table_constraint</replaceable> [ ENFORCED | NOT ENFORCED ]</literal></term>
     <listitem>
      <para>
       This form adds a new constraint to a table using the same syntax as
-      <xref linkend="SQL-CREATETABLE">.
+      <xref linkend="SQL-CREATETABLE">. Newly added constraints can be defined
+      as <literal>NOT ENFORCED</literal>, rather than the default setting
+      <literal>ENFORCED</literal>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 937992b..3cacad0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -304,13 +304,13 @@ static void ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 			   IndexStmt *stmt, bool is_rebuild, LOCKMODE lockmode);
 static void ATExecAddConstraint(List **wqueue,
 	AlteredTableInfo *tab, Relation rel,
-	Constraint *newConstraint, bool recurse, LOCKMODE lockmode);
+	Constraint *newConstraint, bool recurse, LOCKMODE lockmode, bool enforced);
 static void ATAddCheckConstraint(List **wqueue,
 	 AlteredTableInfo *tab, Relation rel,
 	 Constraint *constr,
 	 bool recurse, bool recursing, LOCKMODE lockmode);
 static void ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
-		  Constraint *fkconstraint, LOCKMODE lockmode);
+		  Constraint *fkconstraint, LOCKMODE lockmode, bool enforced);
 static void ATExecDropConstraint(Relation rel, const char *constrName,
 	 DropBehavior behavior,
 	 bool recurse, bool recursing,
@@ -2970,11 +2970,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
-false, 

Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 The new SQL Standard (SQL:2011) contains this:
 "Table constraints are either enforced or not enforced. Domain
 constraints and assertions are always enforced." (4.17.2)

 The SQL Standard allows you to turn the checking on and off for CHECK
 constraints, UNIQUE constraints and FOREIGN KEYS.

Huh?  It allows you to postpone the check until commit.  That's far from
not enforcing it.

regards, tom lane



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Simon Riggs
On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  The new SQL Standard (SQL:2011) contains this:
  "Table constraints are either enforced or not enforced. Domain
  constraints and assertions are always enforced." (4.17.2)
 
  The SQL Standard allows you to turn the checking on and off for CHECK
  constraints, UNIQUE constraints and FOREIGN KEYS.
 
 Huh?  It allows you to postpone the check until commit.  That's far from
 not enforcing it.

"When a commit statement is executed, all enforced constraints are
effectively checked and, if any enforced constraint is not satisfied,
then an exception condition is raised and the SQL-transaction is
terminated by an implicit rollback statement."

This clearly implies that un-enforced constraints are not checked at
commit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] create tablespace fails silently, or succeeds improperly

2010-12-12 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Mon, 2010-10-18 at 15:50 -0400, Tom Lane wrote:
 Basically, I'm thinking that given CREATE TABLESPACE LOCATION '/foo/bar'
 the creation and properties of /foo/bar/PG_9.0_201004261 ought to be
 handled *exactly* the way that the -D target directory of initdb is.
 We have more than ten years experience behind the assertion that we're
 dealing with that case in a good way.  We should transfer that
 behavior over to tablespace directories rather than inventing
 something that works a shade differently.

 I'm still struggling with the above argument.  In one case you are
 applying a behavior to the argument given to initdb, in the other case
 you are applying the behavior to a subdirectory of the argument given to
 CREATE TABLESPACE.  I'm not saying the solution is necessarily wrong,
 but it doesn't seem that this will make things easier or more
 consistent.

Well, it is only an argument by analogy, but the proposal does fix the
IMO-clear misbehavior complained of way back at the start of this
thread.

 An idle thought: How about creating a version-subdirectory also in the
 PGDATA path.  The point about mountpoint annoyance applies here just as
 well.  And it could also make the directory juggling during in-place
 upgrade more normalized and robust.

I can't get excited about it.  That would break every existing tool that
looks into PGDATA, for a fairly marginal simplification during version
upgrades.  To give just one example of the pain we'd be letting
ourselves in for, pg_ctl would now become extremely version-specific.
You couldn't even get away with using the wrong copy of pg_ctl during a
reinstall after a catversion bump during development.

regards, tom lane



Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 11:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 This is about like arguing that pg_dump and pg_upgrade should still work
 after you've done delete from pg_proc;.  Superusers are assumed to
 know what they're doing and not break fundamental operations.

No, it isn't like that at all.  You've made that argument in the past,
and it carries no water with me at all.  There's no help for the fact
that direct modification of the system catalog contents can
fundamentally break things, but DDL commands should not.  I'm willing
to reserve judgment on whether ALTER DATABASE .. SET ROLE should be
disallowed, or whether it should be made to not break things, but
blaming the DBA for shooting himself with the loaded foot-gun we
thoughtfully provided is unreasonable.

And in fact it strikes me that we might not have much choice about how
to fix this.  I think we are not going to retroactively change the
behavior of ALTER DATABASE .. SET ROLE in a released version, but yet
we do, I think, want to make pg_upgrade work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
 Huh?  It allows you to postpone the check until commit.  That's far from
 not enforcing it.

 This clearly implies that un-enforced constraints are not checked at
 commit.

[ shrug... ]  I can't argue with you about what may or may not be in an
unpublished draft of an unratified version of the standard, since I
don't have a copy.  But allow me to harbor doubts that they really
intend to allow someone to force a constraint to be considered valid
without any verification.  This proposal strikes me as something mysql
would do, not the standards committee.  (In particular, can a constraint
go from not-enforced to enforced state without getting checked at that
time?)

Even if you're reading the draft correctly, and the wording makes it
into a released standard, the implementation you propose would break our
code.  The incremental FK checks are designed on the assumption that the
constraint condition held before; they aren't likely to behave very
sanely if the data is bad.  I'd want to see a whole lot more analysis of
the resulting behavior before even considering an idea like this.

regards, tom lane



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
 Huh?  It allows you to postpone the check until commit.  That's far from
 not enforcing it.

 This clearly implies that un-enforced constraints are not checked at
 commit.

 [ shrug... ]  I can't argue with you about what may or may not be in an
 unpublished draft of an unratified version of the standard, since I
 don't have a copy.  But allow me to harbor doubts that they really
 intend to allow someone to force a constraint to be considered valid
 without any verification.  This proposal strikes me as something mysql
 would do, not the standards committee.  (In particular, can a constraint
 go from not-enforced to enforced state without getting checked at that
 time?)

 Even if you're reading the draft correctly, and the wording makes it
 into a released standard, the implementation you propose would break our
 code.  The incremental FK checks are designed on the assumption that the
 constraint condition held before; they aren't likely to behave very
 sanely if the data is bad.  I'd want to see a whole lot more analysis of
 the resulting behavior before even considering an idea like this.

Wow, you've managed to bash Simon, MySQL, and the SQL standards
committee all in one email.

I'm not going to argue that careful analysis isn't needed before doing
something like this - and, in particular, if we ever get inner-join
removal, which I'm still hoping to do at some point, a foreign key
that isn't actually guaranteed to be valid might result in queries
returning different answers depending on whether or not a join is
removed.  I guess we'd have to define that as the user's problem for
alleging a foreign-key relationship that doesn't truly exist.  On the
other hand, there's clearly also a use case for this behavior.  If a
bulk load of prevalidated data forces an expensive revalidation of
constraints that are already known to hold, there's a real chance the
DBA will be backed into a corner where he simply has no choice but to
not use foreign keys, even though he might really want to validate the
foreign-key relationships on a going-forward basis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 9:43 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 The way I think of that problem is that once you know the postcode, knowing
 the city name doesn't add any information. The postcode implies the city
 name. So the selectivity for postcode = ? AND city = ? should be the
 selectivity of postcode = ? alone. The measurement we need is
 implicativeness: How strongly does column A imply a certain value for
 column B. Perhaps that could be measured by counting the number of distinct
 values of column B for each value of column A, or something like that. I
 don't know what the statisticians call that property, or if there's some
 existing theory on how to measure that from a sample.

This is a good idea, but I guess the question is what you do next.  If
you know that the applicability is 100%, you can disregard the
restriction clause on the implied column.  And if it has no
implicatory power, then you just do what we do now.  But what if it
has some intermediate degree of implicability?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ...  On the
 other hand, there's clearly also a use case for this behavior.  If a
 bulk load of prevalidated data forces an expensive revalidation of
 constraints that are already known to hold, there's a real chance the
 DBA will be backed into a corner where he simply has no choice but to
 not use foreign keys, even though he might really want to validate the
 foreign-key relationships on a going-forward basis.

There may well be a case to be made for doing this on grounds of
practical usefulness.  I'm just voicing extreme skepticism that it can
be supported by reference to the standard.

Personally I'd prefer to see us look into whether we couldn't arrange
for low-impact establishment of a verified FK relationship, analogous to
CREATE INDEX CONCURRENTLY.  We don't let people just arbitrarily claim
that a uniqueness condition exists, and ISTM that if we can handle that
case we probably ought to be able to handle FK checking similarly.

regards, tom lane



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 7:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 ...  On the
 other hand, there's clearly also a use case for this behavior.  If a
 bulk load of prevalidated data forces an expensive revalidation of
 constraints that are already known to hold, there's a real chance the
 DBA will be backed into a corner where he simply has no choice but to
 not use foreign keys, even though he might really want to validate the
 foreign-key relationships on a going-forward basis.

 There may well be a case to be made for doing this on grounds of
 practical usefulness.  I'm just voicing extreme skepticism that it can
 be supported by reference to the standard.

Dunno, I haven't read it either.  But it does seem like the natural
interpretation of NOT ENFORCED.

 Personally I'd prefer to see us look into whether we couldn't arrange
 for low-impact establishment of a verified FK relationship, analogous to
 CREATE INDEX CONCURRENTLY.  We don't let people just arbitrarily claim
 that a uniqueness condition exists, and ISTM that if we can handle that
 case we probably ought to be able to handle FK checking similarly.

That'd be useful, too, but I don't think it would remove the use case
for skipping the check altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Problem with pg_upgrade (8.4 - 9.0) due to ALTER DATABASE SET ROLE

2010-12-12 Thread Florian Pflug
On Dec13, 2010, at 00:16 , Robert Haas wrote:
 And in fact it strikes me that we might not have much choice about how
 to fix this.  I think we are not going to retroactively change the
 behavior of ALTER DATABASE .. SET ROLE in a released version, but yet
 we do, I think, want to make pg_upgrade work.


A simple fix is to teach pg_upgrade to issue RESET SESSION AUTHORIZATION
immediately after connecting to a database. I don't see any downside of this
currently - it seems that the only case where this wouldn't be a NO-OP is if
someone set ROLE to something else either per-database, per-user, or both.
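
For illustration (with hypothetical database and role names), the setting
that breaks things is

  ALTER DATABASE some_db SET ROLE some_role;

and the proposed fix is for pg_upgrade to issue, right after connecting,

  RESET SESSION AUTHORIZATION;

which restores the originally authenticated identity.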

Actually, I'd like to provide an option for pg_dump and pg_restore to do that 
too (not by default, though). If people think this is a good idea, I could come 
up with a patch.

best regards,
Florian Pflug




Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Rob Wultsch
On Sun, Dec 12, 2010 at 4:49 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Dec 12, 2010 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-12-12 at 17:57 -0500, Tom Lane wrote:
 Huh?  It allows you to postpone the check until commit.  That's far from
 not enforcing it.

 This clearly implies that un-enforced constraints are not checked at
 commit.

 [ shrug... ]  I can't argue with you about what may or may not be in an
 unpublished draft of an unratified version of the standard, since I
 don't have a copy.  But allow me to harbor doubts that they really
 intend to allow someone to force a constraint to be considered valid
 without any verification.  This proposal strikes me as something mysql
 would do, not the standards committee.  (In particular, can a constraint
 go from not-enforced to enforced state without getting checked at that
 time?)

 Even if you're reading the draft correctly, and the wording makes it
 into a released standard, the implementation you propose would break our
 code.  The incremental FK checks are designed on the assumption that the
 constraint condition held before; they aren't likely to behave very
 sanely if the data is bad.  I'd want to see a whole lot more analysis of
 the resulting behavior before even considering an idea like this.

 Wow, you've managed to bash Simon, MySQL, and the SQL standards
 committee all in one email.

 I'm not going to argue that careful analysis isn't needed before doing
 something like this - and, in particular, if we ever get inner-join
 removal, which I'm still hoping to do at some point, a foreign key
 that isn't actually guaranteed to be valid might result in queries
 returning different answers depending on whether or not a join is
 removed.  I guess we'd have to define that as the user's problem for
 alleging a foreign-key relationship that doesn't truly exist.  On the
 other hand, there's clearly also a use case for this behavior.  If a
 bulk load of prevalidated data forces an expensive revalidation of
 constraints that are already known to hold, there's a real chance the
 DBA will be backed into a corner where he simply has no choice but to
 not use foreign keys, even though he might really want to validate the
 foreign-key relationships on a going-forward basis.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

MySQL does in fact have this feature and it is used by mysqldump. This
feature is very useful.

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
On 13.12.2010 01:05, Robert Haas wrote:
 This is a good idea, but I guess the question is what you do next.  If
 you know that the applicability is 100%, you can disregard the
 restriction clause on the implied column.  And if it has no
 implicatory power, then you just do what we do now.  But what if it
 has some intermediate degree of implicability?

Well, I think you've missed the e-mail from Florian Pflug - he actually
pointed out that the 'implicativeness' Heikki mentioned is called
conditional probability. And conditional probability can be used to
express the AND probability we are looking for (selectiveness).

For two columns, this is actually pretty straightforward - as Florian
wrote, the equation is

   P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

where P(B) may be estimated from the current histogram, and P(A|B) may
be estimated from the contingency (see the previous mails). And P(A and
B) is actually the value we're looking for.

Anyway there really is no intermediate degree of applicability, it just
gives you the right estimate.

And AFAIR this is easily extensible to more than two columns, as

  P(A and B and C) = P(A and (B and C)) = P(A|(B and C)) * P(B and C)

so it's basically a recursion.
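
To make that concrete with the postcode/city example from earlier in the
thread (the values are illustrative): the postcode implies the city, so
P(city|postcode) = 1, and

  P(postcode = '12345' and city = 'Springfield')
    = P(city = 'Springfield' | postcode = '12345') * P(postcode = '12345')
    = 1 * P(postcode = '12345')

i.e. the combined selectivity collapses to that of the postcode clause
alone, which is exactly the behaviour Heikki asked for.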

Well, I hope my statements are really correct - it's been a few years
since I gained my degree in statistics ;-)

regards
Tomas



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 8:46 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 13.12.2010 01:05, Robert Haas wrote:
 This is a good idea, but I guess the question is what you do next.  If
 you know that the applicability is 100%, you can disregard the
 restriction clause on the implied column.  And if it has no
 implicatory power, then you just do what we do now.  But what if it
 has some intermediate degree of implicability?

 Well, I think you've missed the e-mail from Florian Pflug - he actually
 pointed out that the 'implicativeness' Heikki mentioned is called
 conditional probability. And conditional probability can be used to
 express the AND probability we are looking for (selectiveness).

  For two columns, this is actually pretty straightforward - as Florian
 wrote, the equation is

   P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

Well, the question is what data you are actually storing.  It's
appealing to store a measure of the extent to which a constraint on
column X constrains column Y, because you'd only need to store
O(ncolumns^2) values, which would be reasonably compact and would
potentially handle the zip code problem - a classic hard case rather
neatly.  But that wouldn't be sufficient to use the above equation,
because there A and B need to be things like column X has value x,
and it's not going to be practical to store a complete set of MCVs for
column X for each possible value that could appear in column Y.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
 P(A|B) = P(A and B) / P(B).

Well, until this point we've discussed failure cases involving 'AND'
conditions. What about 'OR' conditions? I think the current optimizer
computes the selectivity as 's1+s2 - s1*s2' (at least that's what I
found in backend/optimizer/path/clausesel.c:630).

Sometimes that may return nearly 2x the actual selectivity, but in
general it's a reasonable estimate. Are there any severe failure cases
that produce much worse estimates?
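
One worked failure case (illustrative numbers): if A and B are perfectly
correlated - every row satisfying A also satisfies B - and s1 = s2 = 0.1,
the true selectivity of (A OR B) is 0.1, but the formula gives

  s1 + s2 - s1*s2 = 0.1 + 0.1 - 0.01 = 0.19

i.e. the nearly-2x overestimate mentioned above.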

regards
Tomas



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Tomas Vondra
On 13.12.2010 03:00, Robert Haas wrote:
 Well, the question is what data you are actually storing.  It's
 appealing to store a measure of the extent to which a constraint on
 column X constrains column Y, because you'd only need to store
 O(ncolumns^2) values, which would be reasonably compact and would
 potentially handle the zip code problem - a classic hard case rather
 neatly.  But that wouldn't be sufficient to use the above equation,
 because there A and B need to be things like column X has value x,
 and it's not going to be practical to store a complete set of MCVs for
 column X for each possible value that could appear in column Y.

O(ncolumns^2) values? You mean collecting such stats for each possible
pair of columns? Well, I meant something different.

The proposed solution is based on contingency tables, built for selected
groups of columns (not for each possible group). And the contingency
table gives you the ability to estimate the probabilities needed to
compute the selectivity. Or am I missing something?

regards
Tomas



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Andrew Dunstan



On 12/12/2010 08:27 PM, Rob Wultsch wrote:


MySQL does in fact have this feature and it is used by mysqldump. This
feature is very useful.



The trouble is that FK's have more than one use. In particular, they 
have a documentary use that's used by tools that analyze databases, as 
well as by tools like htsql. They also have a role as an enforced 
constraint.


In fact it's possible now to disable FK enforcement, by disabling the 
triggers. It's definitely a footgun though. Just the other day I was 
asked how data violating the constraint could have got into the table, 
and caused some surprise by demonstrating how easy this was to produce.
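
A minimal demonstration of that footgun, with hypothetical tables (note
that disabling internally generated triggers requires superuser):

  ALTER TABLE orders DISABLE TRIGGER ALL;  -- includes the FK's internal triggers
  INSERT INTO orders (customer_id) VALUES (9999);  -- no such customer; accepted anyway
  ALTER TABLE orders ENABLE TRIGGER ALL;   -- does not re-check rows already present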


So what would actually be an advance in my view would be a mechanism 
that allowed explicit disabling of a constraint but ensured that it was 
not violated when re-enabling it.


cheers

andrew



Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Peter Geoghegan
I wouldn't like to comment on whether or not Simon has correctly
interpreted the words of the SQL standards committee, because
standards committees sometimes word things in an intentionally
ambiguous way to placate different interests, and because it seems
fairly inconsequential in this case. IMHO this is a useful feature
that should be pursued.

There is another precedent that no one mentioned - DB2. From their docs:

"You can add a foreign key with the NOT ENFORCED option to create an
informational referential constraint. This action does not leave the
table space in CHECK-pending status, and you do not need to execute
CHECK DATA."

I understand that DB2's informational referential constraints won't
ever be enforced (they just show intent, which is useful to their
planner), so this isn't really the same thing. However, DB2 apparently
doesn't initially enforce referential integrity when an FK is created
on a table with existing data, without any special syntax on the
CREATE:

"DB2 does not validate the data when you add the foreign key. Instead,
if the table is populated ... the table space that contains the table
is placed in CHECK-pending status, just as if it had been loaded with
ENFORCE NO. In this case, you need to execute the CHECK DATA utility
to clear the CHECK-pending status."

If I am not mistaken, this is almost exactly the behaviour described
by Simon, because referential integrity is, presumably, enforced after
the FK is created, but before the CHECK DATA utility is optionally run
to ensure that we actually have referential integrity at a later time.
I believe that Simon's proposal is essentially sound. I don't know why
CHECK DATA operates at the tablespace granularity rather than the FK
granularity - IANADB2U.

If we followed this behaviour, we wouldn't let people just
arbitrarily claim that a referential condition exists - rather, we'd
let them assert that it /ought/ to exist, and that it will be
maintained going forward, and give them the option of verifying that
assertion at a later time, after which it actually exists.
Unfortunately, this refinement of Simon's proposal would probably
entail adding an additional column to pg_constraint.

-- 
Regards,
Peter Geoghegan



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 13.12.2010 03:00, Robert Haas wrote:
 Well, the question is what data you are actually storing.  It's
 appealing to store a measure of the extent to which a constraint on
 column X constrains column Y, because you'd only need to store
 O(ncolumns^2) values, which would be reasonably compact and would
 potentially handle the zip code problem - a classic hard case rather
 neatly.  But that wouldn't be sufficient to use the above equation,
 because there A and B need to be things like column X has value x,
 and it's not going to be practical to store a complete set of MCVs for
 column X for each possible value that could appear in column Y.

 O(ncolumns^2) values? You mean collecting such stats for each possible
 pair of columns? Well, I meant something different.

 The proposed solution is based on contingency tables, built for selected
 groups of columns (not for each possible group). And the contingency
 table gives you the ability to estimate the probabilities needed to
 compute the selectivity. Or am I missing something?

Well, I'm not real familiar with contingency tables, but it seems like
you could end up needing to store a huge amount of data to get any
benefit out of it, in some cases.  For example, in the United States,
there are over 40,000 postal codes, and some even larger number of
city names, and doesn't the number of entries go as O(m*n)?  Now maybe
this is useful enough anyway that we should Just Do It, but it'd be a
lot cooler if we could find a way to give the planner a meaningful
clue out of some more compact representation.
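
To put rough numbers on that (purely illustrative): with m = 4*10^4 postal
codes and n = 10^5 city names, a full contingency table for that single
pair of columns has up to

  m * n = 4 * 10^9

cells, which is clearly far beyond anything we could collect or store.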

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-12-12 Thread Rob Wultsch
On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2010/12/8 Kineticode Billing da...@kineticode.com:
 On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:

 Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

 EVANESCENT.

 UNSAFE ?

<troll>
MyISAM
</troll>



-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] unlogged tables

2010-12-12 Thread Robert Haas
On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch wult...@gmail.com wrote:
 On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 2010/12/8 Kineticode Billing da...@kineticode.com:
 On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:

 Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

 EVANESCENT.

 UNSAFE ?

 <troll>
 MyISAM
 </troll>

Heh.  But that would be corrupt-on-crash, not truncate-on-crash, no?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq changes for synchronous replication

2010-12-12 Thread Fujii Masao
On Sat, Dec 11, 2010 at 11:37 PM, Robert Haas robertmh...@gmail.com wrote:
 Committed with just a few changes to the documentation.

Thanks a lot!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] BufFreelistLock

2010-12-12 Thread Jim Nasby
On Dec 10, 2010, at 10:49 AM, Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Jeff Janes's message of Fri Dec 10 12:24:34 -0300 2010:
 As far as I can tell, bgwriter never adds things to the freelist.
 That is only done at start up, and when a relation or a database is
 dropped.  The clock sweep does the vast majority of the work.
 
 AFAIU bgwriter runs the clock sweep most of the time (BgBufferSync).
 
 I think bgwriter just tries to write out dirty buffers so they'll be
 clean when the clock sweep reaches them.  It doesn't try to move them to
 the freelist.

Yeah, it calls SyncOneBuffer which does nothing for the clock sweep.

 There might be some advantage in having it move buffers
 to a freelist that's just protected by a simple spinlock (or at least,
 a lock different from the one that protects the clock sweep).  The
 idea would be that most of the time, backends just need to lock the
 freelist for long enough to take a buffer off it, and don't run clock
 sweep at all.

Yeah, the clock sweep code is very intensive compared to pulling a buffer from 
the freelist, yet AFAICT nothing will run the clock sweep except backends. 
Unless I'm missing something, the free list is practically useless because 
buffers are only put there by InvalidateBuffer, which is only called by 
DropRelFileNodeBuffers and DropDatabaseBuffers. So we make backends queue up 
behind the freelist lock with very little odds of getting a buffer, then we 
make them queue up for the clock sweep lock and make them actually run the 
clock sweep.

BTW, when we moved from 96G to 192G servers I tried increasing shared buffers 
from 8G to 28G and performance went down enough to be noticeable (we don't have 
any good benchmarks, so I can't really quantify the degradation). Going back to
8G brought performance back up, so it seems like it was the change in shared 
buffers that caused the issue (the larger servers also have 24 cores vs 16). My 
immediate thought was that we needed more lock partitions, but I haven't had 
the chance to see if that helps. ISTM the issue could just as well be due to 
clock sweep suddenly taking over 3x longer than before.

We're working on getting a performance test environment setup, so hopefully in 
a month or two we'd be able to actually run some testing on this.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Kevin Grittner
Peter Geoghegan  wrote:
 
 If we followed this behaviour, we wouldn't let people just
 arbitrarily claim that a referential condition exists - rather,
 we'd let them assert that it /ought/ to exist, and that it will be
 maintained going forward, and give them the option of verifying
 that assertion at a later time, after which it actually exists.
 
What you outline would be quite valuable to our shop.  Under the
law, the custodians of the data are the elected clerks of circuit
court, and under state law and rules of the state supreme court we
can't clean up even the most glaring apparent data problems
without the OK of the elected official or his or her designee.  We
have a very complex schema (although no more complex than necessary
to model the reality of the data) with hundreds of foreign key
relationships.
 
For various reasons (conversions from old systems, etc.), these
relationships don't hold on all tables in all county databases.  It
would be desirable to have foreign key definitions define the
intended relationships anyway, and very useful for them to prevent
further data degradation.  For those situations where we get a
business analyst to work with clerk of court staff to clean up
orphaned rows, it would be very slick to be able to run some
statement (like CHECK DATA) to see if the cleanup is complete and
successful and to flag that the constraint is now enforced.
 
So +1 on what Peter outlined as current DB2 features in this regard.
 
-Kevin






Re: [HACKERS] PS display and standby query conflict

2010-12-12 Thread Fujii Masao
On Sat, Dec 11, 2010 at 11:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 This problem happens because ResolveRecoveryConflictWithVirtualXIDs
 resets the PS display for each read-only transaction that recovery
 waits for. Why do we need to reset that each time even though
 the conflict has not been resolved yet? The attached patch
 suppresses such a needless reset. Comments?

 The reset occurs at most each 500ms, so not much problem there.

 But if it annoys you, it seems OK to change it. Don't see a reason to 
 backpatch though?

I think it's worth backpatching, to prevent users who carefully observe
the occurrence of query conflicts while testing 9.0 from getting
confused.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread Nathan Boley
 The proposed solution is based on contingency tables, built for selected
 groups of columns (not for each possible group). And the contingency
 table gives you the ability to estimate the probabilities needed to
 compute the selectivity. Or am I missing something?

 Well, I'm not real familiar with contingency tables, but it seems like
 you could end up needing to store a huge amount of data to get any
 benefit out of it, in some cases.  For example, in the United States,
 there are over 40,000 postal codes, and some even larger number of
 city names, and doesn't the number of entries go as O(m*n)?

Not to mention that the model parameters will be impossible to estimate well.

( I've only scanned the thread, so sorry if I'm repeating something
that's already been said )

My intuition is that storing the covariance structure will be
unworkable both technically and statistically for all but the simplest
of cases. That being said, I don't think the problem is a lack of ways
to parameterize the covariance structure ( there are several papers on
multi-dim histogram estimators, at least one of which talks about
databases explicitly, not to mention approaches like CART[1] ), but a
complete lack of infrastructure to do anything with the estimates. So
keep working on it ;-) ( but try to make the framework general enough
to allow better estimators ).

I wonder if a good first step might be to focus on the AND and OR
operators, since they seem like the most common cases and union and
intersection commute ( although it's important to remember that the
estimate variances do NOT ). That is, what about estimating the
selectivity of the condition

WHERE X=A and Y=B

by

f(A,B) = x_1*(selectivity(X = A) + selectivity( Y = B )) +
x_2*selectivity(X = A)*selectivity( Y = B ) + x_3

where x_{1,2,3} are parameters to be estimated from the data.
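
For instance (just to show the shape of the model): choosing x_1 = x_3 = 0
and x_2 = 1 gives

  f(A,B) = selectivity(X = A) * selectivity(Y = B)

which is exactly the independence assumption the planner makes today, so
the fitted parameters effectively measure how far the data deviates from
independence.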

Another quick note: I think that storing the full contingency table is
wasteful since the marginals are already stored in the single column
statistics. Look at copulas [2] ( FWIW I think that Josh Tolley was
looking at this a couple years back ).

Best,
Nathan

[1] http://en.wikipedia.org/wiki/Classification_and_regression_tree
[2] http://en.wikipedia.org/wiki/Copula_%28statistics%29



Re: [HACKERS] Instrument checkpoint sync calls

2010-12-12 Thread Robert Haas
On Sun, Dec 5, 2010 at 4:23 PM, Greg Smith g...@2ndquadrant.com wrote:
 Jeff Janes wrote:

 I've attached a tiny patch to apply over yours, to deal with this and
 with the case where no files are synced.


 Thanks for that.  That obvious error eluded me because in most of the patch
 update testing I was doing (on ext3), the longest sync was always about the
 same length as the total sync time.

 Attached patch (in correct diff form this time!) collects up all changes.
  That includes elimination of a potential race condition if someone changes
 log_checkpoints while a long sync phase is executing.  I don't know whether
 that can happen, and it obviously won't give accurate stats going back to
 the beginning of the checkpoint in that case, but it  tries to defend aginst
 producing complete garbage if that value changes out from under it.

 This is the first version of this patch I feel fairly good about; no open
 concerns left on my side.  Jeff, since you're now the de-facto credited
 reviewer of this one by virtue of suggesting bug fixes, could you take this
 update out for a spin too?

I took a look at this and it looks generally good, but I'm wondering
why md.c is converting the results from an exact value to a floating
point, only to have xlog.c turn around and convert back to an integer.
 I think it could just return milliseconds directly, or if you're
worried about a checkpoint that takes more than 24 days to complete,
seconds and microseconds.  Or am I missing something?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] rest of works for security providers in v9.1

2010-12-12 Thread KaiGai Kohei
I'd like to hear opinions on what facilities should be developed
in the current v9.1 development cycle.

We have integrated some of the facilities to support a starter
version of SE-PostgreSQL. It allows hooking controls on DML
permission checks and assigning security labels to clients and
database objects either by hand or automatically.

On the other hand, hooks on DDL permission checks are still
future work. I believe object_access_hook can be applied to various
kinds of DDL permission checks, but we cannot put all of these hooks
in place at once, because of the patch scale.

So, I plan to integrate only the following four facilities in
the last commit-fest of v9.1, although they still do not cover
all DDL accesses.


* Expand object_access_hook to deliver arguments

Some DDL hooks will need to deliver several arguments
in addition to the OID of the object being modified - for example,
a flag showing whether a deletion is cascaded or not.
So, the prototype of object_access_hook needs to be revised.

My idea is to add two arguments: an integer variable for the number
of arguments and an array variable for the additional information.
Macros will then wrap up invocation of this hook to keep
the code simple.


* Permission checks on object-prep-creation

The previous discussion did not settle whether two hooks are
needed, or one.
I think the idea of dividing creation hooks into two phases by
role will eventually reduce the burden of code management.

Now we have OAT_POST_CREATE hooks basically just after registration
of dependencies. That is a simple enough basis, and quite a natural
place to assign new security labels.
However, several cases would have to be exceptions to that basis if
we tried to check permissions in the post-creation hooks as well,
in addition to doing default labeling.

For example, heap_create_with_catalog() is called from five
places, but only two need permission checks: DefineRelation()
and OpenIntoRel().
In a few cases it is not obvious whether we need permission checks
for the invocation, such as the code path from make_new_heap().
It defines a new pg_class entry, but an external module cannot
determine from the catalog whether it is invoked on a code
path that needs permission checks, or not.

So, I want OAT_CREATE hooks placed just after the existing permission
checks, for the purpose of access control, not default labeling.


* Permission checks on object-deletion

The existing code puts permission checks for object deletion in
command handlers like RemoveRelations(), which then invoke
functions in dependency.c to drop the specified and dependent
objects (if necessary).
I think it is straightforward to put object-deletion hooks
next to the existing permission checks, but the cascaded objects
to be removed cannot be checked there.
So, it seems to me findDependentObjects() should be exposed to
external modules to report which objects will be dropped.
Unlike the old SE-PostgreSQL implementation, I don't consider it
a good idea to put a hook within dependency.c, because we would need
to tell dependency.c whether a deletion is by hand or something
internal (such as cleanup of temporary objects).


* Permission checks on misc easy implementables

Apart from development priority, it seems to me that we can
hook controls into the following commands quite easily. The idea
is to put in hooks that we can implement with little impact on the
existing code.
 - GRANT/REVOKE
 - COMMENT ON
 - SECURITY LABEL

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] SQL/MED - core functionality

2010-12-12 Thread Shigeru HANADA
On Sun, 12 Dec 2010 23:47:53 +0200
Peter Eisentraut pete...@gmx.net wrote:
 On Wed, 2010-12-01 at 12:30 +0900, Hitoshi Harada wrote:
  From a user's view, this is a very long way to go for the simplest foreign
  table. I know it is based on the standard, but I really want a
  shortcut. Especially, I don't understand why CREATE USER MAPPING FOR
  current_user SERVER server is needed for the default use case. If you
  forget CREATE USER MAPPING and do CREATE FOREIGN TABLE, it raises an
  error. User mapping is useful if the local user and remote user need to
  be mapped, but I imagine in most cases they are the same.
  In addition, postgresql_fdw can tell the remote user from the conninfo
  string.
 
 I reviewed the standard about this, and a lot of things are
 implementation-defined.  I think user mappings could be made optional.

Simple FDWs such as File FDW might not have a concept of a user on the
remote side.  In such cases, it would be enough to control access
privileges per local user with GRANT/REVOKE SELECT statements.

  This is another topic, but it would be useful if CREATE FOREIGN TABLE
  could omit column definitions, since the FDW usually knows what should be
  in the definitions. I sometimes mistyped the column names
  between remote and local, which resulted in failures at execution.
 
 Also, according to the standard, the column list in CREATE FOREIGN TABLE
 is optional (if you can get it in some automatic way, of course).

To allow omitting column definitions for that purpose, a way to create
zero-column tables would have to be provided.  New syntax which allows
FDWs to determine the column definitions would be necessary.

ex)
-- Create foo from the remote table foo on the server bar
CREATE FOREIGN TABLE foo SERVER bar;
-- Create zero-column table foo
CREATE FOREIGN TABLE foo () SERVER bar;

To support this feature, another hook function needs to be added to the
FDW API.  ISTM that this feature should be considered together with the
IMPORT SCHEMA statement.

Regards,
--
Shigeru Hanada





[HACKERS] pg_is_in_recovery=1

2010-12-12 Thread aaliya zarrin
Hi All,

When does pg_is_in_recovery change to zero (a status change)?

At the time when recovery stops?
If a switchover has to be done, then after receiving the signal and telling
postgres to run the startup process (got_SIGHUP = true),
do we have to stop replication or wait for recovery to stop?
Can I change pg_is_in_recovery to 0?
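
For reference, the status can be checked with:

  SELECT pg_is_in_recovery();  -- true while in recovery, false afterwards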

-- 
Thanks & Regards,

Aaliya Zarrin
(+91)-9160665888


Re: [HACKERS] [COMMITTERS] pgsql: Use symbolic names not octal constants for file permission flags

2010-12-12 Thread Glen Knowles
On Sun, Dec 12, 2010 at 8:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:

  What are the values of _S_IREAD and _S_IWRITE, anyway?  I'm still
  wondering how come the previous coding with hardwired constants
  behaved correctly.

 Still curious about this.

 FWIW, _S_IREAD and _S_IWRITE are defined by Visual Studio C++ 2008 in
sys/stat.h as 0x0100 and 0x0080 respectively.

Glen


Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Rob Wultsch
On Sun, Dec 12, 2010 at 7:24 PM, Andrew Dunstan and...@dunslane.net wrote:
 In fact it's possible now to disable FK enforcement, by disabling the
 triggers. It's definitely a footgun though. Just the other day I was asked
 how data violating the constraint could have got into the table, and caused
 some surprise by demonstrating how easy this was to produce.

Ugh. I have read the entire pg manual and I did not recall that
footgun.  At least in MySQL disabling FKs is explicit. There is
something to be said for being able to tell the database: "Hey, hold
my beer and watch this; it might be stupid, but it is what we are going
to do." The database telling its user that is a much larger issue
(and yes, MySQL is generally worse). The user at least gets to talk to
the db through SQL; the database only really gets to talk to the user
through errors and the manual.

The fact that fk checks are implemented by the trigger system somehow
seems surprising.

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] unlogged tables

2010-12-12 Thread Rob Wultsch
On Sun, Dec 12, 2010 at 7:33 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Dec 12, 2010 at 9:31 PM, Rob Wultsch wult...@gmail.com wrote:
 On Fri, Dec 10, 2010 at 5:34 PM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 2010/12/8 Kineticode Billing da...@kineticode.com:
 On Dec 8, 2010, at 10:37 AM, Chris Browne wrote:

 Other possibilities include TRANSIENT, EPHEMERAL, TRANSIENT, TENUOUS.

 EVANESCENT.

 UNSAFE ?

 <troll>
 MyISAM
 </troll>

 Heh.  But that would be corrupt-on-crash, not truncate-on-crash, no?

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

<troll>
Yep. Truncate-on-shutdown MySQL options are the MEMORY and PBXT (using
the memory resident option).
</troll>

I like TRANSIENT but wonder if MEMORY might be more easily understood by users.
-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] SQL/MED - file_fdw

2010-12-12 Thread Itagaki Takahiro
On Sat, Dec 11, 2010 at 05:30, Andrew Dunstan and...@dunslane.net wrote:
 On 12/04/2010 11:11 PM, Itagaki Takahiro wrote:
 One exports the copy functions from the core, and another
 implements file_fdw using the infrastructure.

 Who is actually going to do this split?

I'm working on it :-)  I extracted those functions from copy.c:

- CopyState BeginCopyFrom(Relation rel, const char *filename,
  List *attnamelist, List *options);
- void EndCopyFrom(CopyState cstate);
- bool NextCopyFrom(CopyState cstate,
Datum *values, bool *nulls, Oid *oid);

There was a Reset() in file_fdw, but it is not contained in the
patch. It will be added again if required, but I wonder whether we might
need not only reset but also mark/restore of a position in the file.

-- 
Itagaki Takahiro


copy_export-20101213.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Fujii Masao
Hi,

pg_archivecleanup removes unnecessary WAL files from the archive, but not
from the pg_xlog directory. So, after failover, those WAL files might still
exist in pg_xlog and be archived again later. Re-archiving of unnecessary
WAL files seems odd to me. To avoid this problem, how about changing
pg_archivecleanup so that it also removes WAL files in pg_xlog, or creates
a .done file in archive_status when removing them from the archive?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Fujii Masao
On Mon, Dec 13, 2010 at 3:44 PM, Fujii Masao masao.fu...@gmail.com wrote:
 pg_archivecleanup removes unnecessary WAL files from the archive, but not
 from the pg_xlog directory. So, after failover, those WAL files might still
 exist in pg_xlog and be archived again later. Re-archiving of unnecessary
 WAL files seems odd to me. To avoid this problem, how about changing
 pg_archivecleanup so that it also removes WAL files in pg_xlog, or creates
 a .done file in archive_status when removing them from the archive?

Well, we can avoid this problem by specifying the pg_xlog directory instead
of the archive in recovery_end_command:

recovery_end_command = 'pg_archivecleanup pg_xlog %r'

Though this feels like a workaround rather than a proper fix...

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: auxiliary functions for record type

2010-12-12 Thread Pavel Stehule
2010/12/12 Florian Pflug f...@phlo.org:
 On Dec12, 2010, at 00:19 , Pavel Stehule wrote:
 I prefer a table based
 solution, because I don't need a one unnest, but other preferences
 are valid too.
 That's fine with me.

 I dissatisfied with your design of explicit target type
 via unused value.  I think, so we are not a infrastructure for it now
 - from my view is better to use a common type, that is text now. It's
 nothing new - plpgsql use it too.
 Sorry, I can't follow you here. Where does plpgsql use text as common type?

 I see one well design of explicit target type based on polymorphic
 types that respect a PostgreSQL fmgr practice:

 We have to allow a polymorphic functions without polymorphic
 parameters. These functions shoud be designed to return value in
 unknown type format when this function has not outer information.
 I don't think unknown is the right type for that. As far as I known, 
 unknown is still a textual type, used to have some type to assign to string 
 literals during parsing when no better type can be inferred.

 This information can be passed in function context. When function
 context isn't null, then function has to read target type and should
 to return value in target type. Who can fill a function context? It is
 task for executor. And when CAST contains just function call, then we
 can recheck, if function is polymorphic, and if it is, then we can set
 function context to target type, and then we don't need to call a
 conversion function, because polymorphic function must returns data in
 correct format.
 The main difficulty is that currently types are assigned in a bottom-up 
 fashion as far as I know. To make functions with a polymorphic return value, 
 but without polymorphic arguments work, you need to assign the return type in 
 a top-down fashion (It depends on where to value *goes*, not where it *comes 
 from*). That seems like a rather huge change and has the potential to 
 complicate quite a few other parts, most notably function lookup/resolution.

 Plus, the general case where type information must bubble up more than one 
 level seems pretty much intractable, as it'd require a full-blown type 
 inference algorithm like ML or Haskell. Not a place where we want to go, I 
 believe.

 The restricted case, on the other hand, brings very little benefit compared 
 to the dummy-parameter approach. Yeah, polymorphic function()::type may 
 look a bit cleaner than polymorphic function(NULL::type), but thats about 
 is. It's only assignments in pl/pgsql which really benefit, since you'd be 
 able to leave out the type completely, writing simply v_value := 
 polymorphic_function(). Does that really warrant the effort that'd be 
 involved?

There is a second possibility - though hardly a simpler one. We could use
a specialised statement with its own parser/executor node. Then the
implementation would be really simple.

syntax:

EXTRACT_VALUE(expr1 FROM expr2 AS typename) ... RETURNS typename

expr1 ... result must be convertible to text (the field name)
expr2 ... result must be a composite type

disadvantage - EXTRACT_VALUE must become a reserved keyword
advantage - simple implementation, available in all environments, readable

var := EXTRACT_VALUE('f1' FROM myrec AS int);

note: the name of this statement isn't important now; it could be
EXTRACT_FIELD, ...
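
For comparison, a rough sketch of what EXTRACT_VALUE could desugar to
today, going through text as the common type (the function name and
implementation are illustrative only):

    CREATE FUNCTION extract_value_text(rec anyelement, fieldname text)
    RETURNS text AS $$
    DECLARE
        result text;
    BEGIN
        -- build "SELECT ($1).fieldname::text" dynamically
        EXECUTE 'SELECT ($1).' || quote_ident(fieldname) || '::text'
            INTO result USING rec;
        RETURN result;
    END;
    $$ LANGUAGE plpgsql;

    -- var := EXTRACT_VALUE('f1' FROM myrec AS int) would then be roughly
    -- var := extract_value_text(myrec, 'f1')::int;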

comments, ideas?

Regards

Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Heikki Linnakangas

On 13.12.2010 08:44, Fujii Masao wrote:

pg_archivecleanup removes unnecessary WAL files from the archive, but not
from the pg_xlog directory. So, after failover, those WAL files might still
exist in pg_xlog and be archived again later.


A file that has already been archived successfully should not be 
archived again. The server keeps track of which files it has already 
archived with the .ready/.done files.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-12 Thread tv
 On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 13.12.2010 03:00, Robert Haas wrote:
 Well, the question is what data you are actually storing.  It's
 appealing to store a measure of the extent to which a constraint on
 column X constrains column Y, because you'd only need to store
 O(ncolumns^2) values, which would be reasonably compact and would
 potentially handle the zip code problem - a classic hard case rather
 neatly.  But that wouldn't be sufficient to use the above equation,
 because there A and B need to be things like "column X has value x",
 and it's not going to be practical to store a complete set of MCVs for
 column X for each possible value that could appear in column Y.

 O(ncolumns^2) values? You mean collecting such stats for each possible
 pair of columns? Well, I meant something different.

 The proposed solution is based on contingency tables, built for selected
 groups of columns (not for each possible group). And the contingency
 table gives you the ability to estimate the probabilities needed to
 compute the selectivity. Or am I missing something?

 Well, I'm not real familiar with contingency tables, but it seems like
 you could end up needing to store a huge amount of data to get any
 benefit out of it, in some cases.  For example, in the United States,
 there are over 40,000 postal codes, and some even larger number of
 city names, and doesn't the number of entries go as O(m*n)?  Now maybe
 this is useful enough anyway that we should Just Do It, but it'd be a
 lot cooler if we could find a way to give the planner a meaningful
 clue out of some more compact representation.

Yes, storing a complete contingency table is not feasible in such cases.
My original proposal actually did not address this particular issue
(cities and ZIP codes), as it was based on simplified contingency tables
(with bins corresponding to the current histograms, not to individual
values), so the number of values to store would grow much more slowly.

On the other hand, this generalization really makes it unusable in some
cases, and the issue we're discussing here (cities and ZIP codes) is one
of them. I think in such cases we could build a contingency table for the
MCVs and then use it to estimate the conditional probabilities we need,
but I expect it to be very tricky.
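
For illustration, the kind of estimate a (city, zip) contingency table
supports, sketched in plain SQL over a hypothetical addresses table:

    -- contingency table: one row per observed (city, zip) combination
    SELECT city, zip, count(*) AS cnt
    FROM addresses
    GROUP BY city, zip;

    -- estimated P(city = 'Springfield' | zip = '12345')
    SELECT p
    FROM (SELECT city, zip,
                 count(*)::float8
                     / sum(count(*)) OVER (PARTITION BY zip) AS p
          FROM addresses
          GROUP BY city, zip) t
    WHERE city = 'Springfield' AND zip = '12345';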

Thanks for the comments.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... ADD FOREIGN KEY ... NOT ENFORCED

2010-12-12 Thread Simon Riggs
On Sun, 2010-12-12 at 19:07 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  ...  On the
  other hand, there's clearly also a use case for this behavior.  If a
  bulk load of prevalidated data forces an expensive revalidation of
  constraints that are already known to hold, there's a real chance the
  DBA will be backed into a corner where he simply has no choice but to
  not use foreign keys, even though he might really want to validate the
  foreign-key relationships on a going-forward basis.
 
 There may well be a case to be made for doing this on grounds of
 practical usefulness.  I'm just voicing extreme skepticism that it can
 be supported by reference to the standard.
 
 Personally I'd prefer to see us look into whether we couldn't arrange
 for low-impact establishment of a verified FK relationship, analogous to
 CREATE INDEX CONCURRENTLY.  We don't let people just arbitrarily claim
 that a uniqueness condition exists, and ISTM that if we can handle that
 case we probably ought to be able to handle FK checking similarly.

I think we should do *both* things. Sometimes you already know the check
will pass, sometimes you don't. In particular, when reloading data from
another source, you may know the checks already passed there; enforcing
re-checking in that case reduces data availability.
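
To make the two workflows concrete (hypothetical syntax: NOT ENFORCED is
taken from the subject line, and VALIDATE is only a sketch of what a
CREATE INDEX CONCURRENTLY-style approach might look like):

    -- trusted, prevalidated load: declare the FK without re-checking
    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id)
        NOT ENFORCED;

    -- low-impact verified establishment: check existing rows later,
    -- without taking a long exclusive lock
    ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;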

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] hstores in pl/python

2010-12-12 Thread Jan Urbański
It would be cool to be able to transparently use hstores as Python
dictionaries and vice versa. It would be easy enough with hstore as a
core type, but with hstore as an addon it's not that easy.
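
For illustration, what the transparent mapping would look like from a
PL/Python function's point of view (hypothetical: today the value arrives
as text, not as a dict):

    CREATE FUNCTION get_attr(h hstore, k text) RETURNS text AS $$
        # under the proposed mapping, h would already be a Python dict
        return h.get(k)
    $$ LANGUAGE plpythonu;

    SELECT get_attr('a=>1, b=>2'::hstore, 'b');  -- would return '2'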

There was talk about including hstore in core; is there still a chance of
that happening in 9.1? I'd like to include hstore<->dict handling, but
with hstore out of core the only half-sane way I see is:
 * hack PL/Python's makefile to add -Icontrib/hstore (yuck!)
 * create an extension module for Python that knows how to handle
hstores that would live next to plpython.so
 * install it in $libdir on make install
 * when PL/Python receives or is asked to create an hstore, load the
extension module and use it to parse the value (ugly, probably slow)
 * the module would also have to make sure hstore.so is loaded in the
database, which in itself is not pretty, as it would refer to
hstore_in/out symbols

I wrote a module that can be used with current PL/Python to simplify
hstore handling (https://github.com/wulczer/pyhstore), but it suffers
from most of the aforementioned problems, and on top of that you get
hstore->text->dict instead of just hstore->dict, which sucks.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_archivecleanup should remove WAL files also in pg_xlog?

2010-12-12 Thread Fujii Masao
On Mon, Dec 13, 2010 at 4:28 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 13.12.2010 08:44, Fujii Masao wrote:

 pg_archivecleanup removes unnecessary WAL files from the archive, but not
 from the pg_xlog directory. So, after failover, those WAL files might still
 exist in pg_xlog and be archived again later.

 A file that has already been archived successfully should not be archived
 again. The server keeps track of which files it has already archived with
 the .ready/.done files.

This seems to require:

* the archiver to save the last-archived WAL file name in shared memory
* walsender to send it to walreceiver
* walreceiver to create a .done file when the WAL file arrives
* bgwriter not to remove WAL files which don't have a .done file on the
  standby

Right? One good side effect of this is that we could prevent WAL files from
being removed from the standby before the master has archived them.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers