Re: [HACKERS] writable CTEs
On 29/12/10 03:35, Peter Eisentraut wrote:
> On tis, 2010-12-28 at 00:19 +, Peter Geoghegan wrote:
>> It's worth noting that officially (i.e. in the docs), we don't even call CTEs CTEs at any point. We call them WITH queries. I think that that's a mistake because we call them CTEs everywhere else.
>
> I think "WITH query" or "WITH clause" is more understandable than CTE, which to me is a term that has no relationship with anything else.

Peter's comment certainly resonates with me. When I first heard about this CTE business I had to go to the web to discover that they were components of the WITH clause - which I was familiar with from my DB2 days...

regards

Mark
Re: [HACKERS] pg_primary_conninfo
On Tue, Dec 28, 2010 at 17:43, Guillaume Lelarge guilla...@lelarge.info wrote:
> Le 28/12/2010 17:36, Tom Lane wrote:
>> Guillaume Lelarge guilla...@lelarge.info writes:
>>> Le 28/12/2010 16:34, Tom Lane wrote:
>>>> 1. It'll have to be restricted to superusers, therefore ordinary users on the slave can't actually make use of it.
>>> pgAdmin's users usually connect as superusers. It would be a function for DBAs, of course. I don't see why normal users would be interested in it, really.
>>>> 2. It's not what you want, since you don't want to connect as the replication user. Therefore, you'd have to start by parsing out the parts you do need. Expecting every client to include conninfo parsing logic doesn't seem cool to me.
>> I can see the point of, say, a primary_host_address() function returning inet, which would be way better on both those dimensions than the current proposal. But I'm not sure what else would be needed.
> Yeah, it would be better that way. I'm actually interested in Magnus's patch because, during the 9.0 development phase, I had in mind to parse the primary_conninfo till I found I could not get this value with SHOW or current_setting(). But, actually, what I really need is host and port. This way, I could connect to the master node with the same user and password that was used on the slave node.

I agree it might well be more useful to have it split up for us. We'd need the host name (though it would have to be text and not inet, since we'd need the unix socket path for a local connection) and port. And username. But certainly not password, and probably none of the other parameters.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_primary_conninfo
On Tue, Dec 28, 2010 at 18:12, Robert Haas robertmh...@gmail.com wrote:
> On Dec 28, 2010, at 10:34 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I'm still wondering what's the actual use-case for exposing this inside SQL. Those with a legitimate need-to-know can look at the slave server's config files, no?
> SQL access is frequently more convenient, though.

Yes. Reading it in the files does not scale with $LOTS of servers, be they slaves or masters or both. You can't assume that people have direct filesystem access to the server (or at least its data directory) - particularly when the organisation is large enough that you have different teams running the db's and the OS's, not to mention when you have some on-call group who verifies the things in the middle of the night... Unless you mean reading them with pg_read_file() and then parsing it manually, but that just requires everybody to re-invent the wheel we already have in the parser.

> Although maybe now that we've made recovery.conf use the GUC lexer we oughta continue in that vein and expose those parameters as PGC_INTERNAL GUCs rather than inventing a new function for it...

That's definitely another option that I wouldn't object to if people prefer that way.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pg_primary_conninfo
On 29.12.2010 10:36, Magnus Hagander wrote:
> On Tue, Dec 28, 2010 at 18:12, Robert Haas robertmh...@gmail.com wrote:
>> Although maybe now that we've made recovery.conf use the GUC lexer we oughta continue in that vein and expose those parameters as PGC_INTERNAL GUCs rather than inventing a new function for it...
> That's definitely another option that I wouldn't object to if people prefer that way.

I recall from previous discussions that we have a consensus that we should unite recovery.conf and postgresql.conf, so that they're all GUCs and you can put all the settings in postgresql.conf. Let's do that. http://archives.postgresql.org/pgsql-hackers/2010-10/msg00033.php

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] and it's not a bunny rabbit, either
On 29.12.2010 06:54, Robert Haas wrote:
> With the patch:
>
> rhaas=# cluster v;
> ERROR: views do not support CLUSTER

"do not support" sounds like a missing feature, rather than a nonsensical command. How about something like "CLUSTER cannot be used on views"?

The patch changes a bunch of heap_openrv() calls to relation_openrv(). Perhaps it would be better to make the error message something like "\"%s\" is not a table", and keep the callers unchanged. It's not particularly useful to repeat the command in the error message; the user should know what command he issued. Even if it's buried deep in a PL/pgSQL function or something, it should be clear from the context lines.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] writable CTEs
On 2010-12-29 09:16, Mark Kirkwood wrote:
> On 29/12/10 03:35, Peter Eisentraut wrote:
>> On tis, 2010-12-28 at 00:19 +, Peter Geoghegan wrote:
>>> It's worth noting that officially (i.e. in the docs), we don't even call CTEs CTEs at any point. We call them WITH queries. I think that that's a mistake because we call them CTEs everywhere else.
>> I think "WITH query" or "WITH clause" is more understandable than CTE, which to me is a term that has no relationship with anything else.
> Peter's comment certainly resonates with me. When I first heard about this CTE business I had to go to the web to discover that they were components of the WITH clause - which I was familiar with from my DB2 days...

For me it was the converse... I first heard of Common Table Expressions from SQL Server users, at roughly the same time that CTEs were introduced in 8.4. When I decided to use them, it took me a while to figure out that the docs referred to them as WITH queries. ISTM we're already past the choice to have a single name. IMHO it would be best if the documentation had a reference / index part in which both WITH queries and Common Table Expressions (CTE) are listed. Also, the terms CTE and CTEScan appear in EXPLAIN output; it would be nice to have a meaningful hit when looking for the term on the documentation page, instead of 'Your search for *cte* returned no hits.'

regards, Yeb Havinga
Re: [HACKERS] Streaming replication as a separate permissions
On Tue, Dec 28, 2010 at 13:05, Magnus Hagander mag...@hagander.net wrote:
> On Mon, Dec 27, 2010 at 22:53, Magnus Hagander mag...@hagander.net wrote:
>> On Mon, Dec 27, 2010 at 22:42, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Magnus Hagander mag...@hagander.net writes:
>>>> Updated patch, still pending docs, but otherwise updated: allow start/stop backup, make sure only superuser can turn on/off the flag, include in system views, show properly in psql.
>>> I'd suggest avoiding creating the static cache variable AuthenticatedUserIsReplicationRole. This can't possibly be sufficiently interesting from a performance point of view to justify the risks associated with stale cache values. Just look up the pg_authid syscache entry when needed, ie, treat it more like rolcreaterole than rolsuper.
>> Sure, I actually had it that way first. But doing it this way was less code. But I realize I should've revisited that decision when I made the change to pg_start_backup and pg_stop_backup - before that the checks would only happen during a very short window of time at the start of the connection, but now it can happen later..
>>> BTW, you forgot pg_dumpall support.
>> Gah. I knew that, but somehow dropped it from my TODO. Thanks for the reminder!
> Ok, here's an updated patch that does both these and includes documentation and regression test changes. With that, I think we're good to go.

I've applied this version (with some minor typo-fixes).

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] pg_streamrecv for 9.1?
Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so.

It could/should then also become the default tool for doing base-backup-over-libpq, assuming me or Heikki (or somebody else) finishes off the patch for that before 9.1. We need a tool of some kind for that if we add the functionality, after all...

What do people think - is there interest in that, or is it better off being an outside tool?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] and it's not a bunny rabbit, either
On Wed, Dec 29, 2010 at 4:09 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> On 29.12.2010 06:54, Robert Haas wrote:
>> With the patch:
>>
>> rhaas=# cluster v;
>> ERROR: views do not support CLUSTER
> "do not support" sounds like a missing feature, rather than a nonsensical command. How about something like "CLUSTER cannot be used on views"

I'm fine with flipping the ordering around. I think I like it marginally better this way, but you and Tom both seem to prefer the opposite ordering, ergo so be it (barring a sudden influx of contrary votes).

> The patch changes a bunch of heap_openrv() calls to relation_openrv(). Perhaps it would be better to make the error message something like "\"%s\" is not a table", and keep the callers unchanged. It's not particularly useful to repeat the command in the error message; the user should know what command he issued. Even if it's buried deep in a PL/pgSQL function or something, it should be clear from the context lines.

Did you read the whole thread?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_streamrecv for 9.1?
On Wed, Dec 29, 2010 at 5:47 AM, Magnus Hagander mag...@hagander.net wrote:
> Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so. It could/should then also become the default tool for doing base-backup-over-libpq, assuming me or Heikki (or somebody else) finishes off the patch for that before 9.1. We need a tool for that of some kind if we add the functionality, after all... What do people think - is there interest in that, or is it better off being an outside tool?

+1 for including it. If it's reasonably mature, +1 for bin rather than contrib.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] writable CTEs
On Wed, Dec 29, 2010 at 4:19 AM, Yeb Havinga yebhavi...@gmail.com wrote:
> Also, the terms CTE and CTEScan appear in EXPLAIN output; it would be nice to have a meaningful hit when looking for the term on the documentation page, instead of 'Your search for cte returned no hits.'

This is an excellent point.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Fwd: new patch of MERGE (merge_204) a question about duplicated ctid
Erik Rijkers wrote:
> I get some whitespace-warnings, followed by error:
>
> $ git apply /home/rijkers/download/pgpatches/0091/merge/20101206/merge_204_2010DEC06.patch
> /home/rijkers/download/pgpatches/0091/merge/20101206/merge_204_2010DEC06.patch:481: trailing whitespace.
> /home/rijkers/download/pgpatches/0091/merge/20101206/merge_204_2010DEC06.patch:482: trailing whitespace. if (IsA(plan, ModifyTable)
> /home/rijkers/download/pgpatches/0091/merge/20101206/merge_204_2010DEC06.patch:550: trailing whitespace. /*print the action qual*/
> /home/rijkers/download/pgpatches/0091/merge/20101206/merge_204_2010DEC06.patch:556: trailing whitespace. (act_plan->operation == CMD_INSERT ||
> /home/rijkers/download/pgpatches/0091/merge/20101206/merge_204_2010DEC06.patch:560: trailing whitespace.
> error: patch failed: src/backend/optimizer/plan/planner.c:739
> error: src/backend/optimizer/plan/planner.c: patch does not apply

Maybe I'm doing something wrong, but I've never had good luck with git apply. I took this patch and applied it to the 12/15 copy of HEAD I had checked out (trying to minimize drift in there since the patch was created) using:

patch -p1 < merge_204_2010DEC06.patch

There was one trivial conflict, for which it produced src/backend/optimizer/plan/planner.c.rej, and that fix was straightforward to apply by hand. The result is now sitting as the merge204 branch in my github repo: https://github.com/greg2ndQuadrant/postgres/tree/merge204 if you did want to try this out.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
Re: [HACKERS] pg_streamrecv for 9.1?
On 29-12-2010 07:47, Magnus Hagander wrote:
> Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so.

+1 but...

> It could/should then also become the default tool for doing base-backup-over-libpq, assuming me or Heikki (or somebody else) finishes off the patch for that before 9.1.

I think that the base backup feature is more important than simply streaming chunks of the WAL (SR already does this). Talking about the base backup over libpq, it is something we should implement to satisfy the people who ask for an easy replication setup. IIRC, Dimitri already coded a base backup over libpq tool [1], but it is written in Python.

[1] https://github.com/dimitri/pg_basebackup/

-- Euler Taveira de Oliveira http://www.timbira.com/
[HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
I did some basic testing of the latest update here, but quickly hit a problem that wasn't in the previous version. Attached is the standalone test script that used to work, but now fails like this:

psql:simple.sql:12: ERROR: the vars in merge action tlist of qual should only belongs to the source table or target table

This test case is intended to implement the common UPSERT situation that is one of the main requests that MERGE is intended to satisfy, using this syntax:

MERGE INTO Stock t
 USING (VALUES(10,100)) AS s(item_id,balance)
 ON s.item_id=t.item_id
 WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
 WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);

If you can suggest an alternate way to express this that works with the new patch, I might switch to that and retry. I was never 100% sure this was the right way to write this, and I don't have another database with MERGE support here to try against. (Aside: if someone else does, I'd be really curious to see if the attached test case works or not on another database system. I think we need to include compatibility testing with other MERGE implementations into the test mix here soon.)

Regardless, this failure suggests that you need to add this sort of test to the regression test set. We need to have an example of an UPSERT using constant data in there to make sure this continues to work in the future. This is a good week for me in terms of having time for PostgreSQL hacking, so if you can suggest something here or update the patch I'll try it soon afterwards.
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us

DROP TABLE Stock;
CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;

MERGE INTO Stock t
 USING (VALUES(10,100)) AS s(item_id,balance)
 ON s.item_id=t.item_id
 WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
 WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);

SELECT * FROM Stock ORDER BY item_id;

MERGE INTO Stock t
 USING (VALUES(30,2000)) AS s(item_id,balance)
 ON s.item_id=t.item_id
 WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
 WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);

SELECT * FROM Stock ORDER BY item_id;
Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?
On Mon, Dec 27, 2010 at 3:58 AM, Simon Riggs si...@2ndquadrant.com wrote:
> I think the answer is that only the first column comparison is optimised. Second and subsequent comparisons are not optimised.

What sort of optimization are you referring to here?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] small table left outer join big table
On Tue, Dec 28, 2010 at 5:13 AM, Jie Li jay23j...@gmail.com wrote:
> Hi, Please see the following plan:
>
> postgres=# explain select * from small_table left outer join big_table using (id);
>                               QUERY PLAN
> Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
>   Hash Cond: (small_table.id = big_table.id)
>   ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
>   ->  Hash  (cost=59142.00..59142.00 rows=410 width=8)
>         ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=410 width=8)
> (5 rows)
>
> Here I have a puzzle, why not choose the small table to build the hash table? It can avoid multiple batches thus save significant I/O cost, isn't it?

Yeah, you'd think. Can you post a full reproducible test case?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] small table left outer join big table
On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>> Here I have a puzzle, why not choose the small table to build the hash table? It can avoid multiple batches thus save significant I/O cost, isn't it?
> Yeah, you'd think. Can you post a full reproducible test case?

It's not a bug, that's the way it currently works. We don't need a test case for that.

I agree that the optimisation would be a useful one. It allows you to ask the query "Show me sales for each of my stores" efficiently, rather than being forced to request the inner join query "Show me the sales for each of my stores for which there have been sales", which is a much less useful query.

-- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
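[Editor's note: for readers who want to reproduce the plan shape discussed in this thread, a minimal sketch along these lines should do it; the table names come from the thread, but the column definitions and row counts are invented for illustration.]

```sql
-- Hypothetical reproduction of the scenario above: a tiny outer table
-- left-joined to a large inner table. Because it is a LEFT OUTER JOIN,
-- the planner hashes the nullable (big) side to preserve outer-join
-- semantics, so the small table cannot simply be chosen as the hash side.
CREATE TABLE small_table (id int, val int);
CREATE TABLE big_table (id int, val int);

INSERT INTO small_table SELECT g, g FROM generate_series(1, 9) g;
INSERT INTO big_table SELECT g % 1000, g FROM generate_series(1, 1000000) g;

ANALYZE small_table;
ANALYZE big_table;

EXPLAIN SELECT * FROM small_table LEFT OUTER JOIN big_table USING (id);
```

With statistics like these, the plan should show a Hash Left Join with big_table under the Hash node, matching the plan posted at the start of the thread.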
Re: [HACKERS] pg_streamrecv for 9.1?
On Wed, Dec 29, 2010 at 13:03, Euler Taveira de Oliveira eu...@timbira.com wrote:
> On 29-12-2010 07:47, Magnus Hagander wrote:
>> Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so.
> +1 but...
>> It could/should then also become the default tool for doing base-backup-over-libpq, assuming me or Heikki (or somebody else) finishes off the patch for that before 9.1.
> I think that the base backup feature is more important than simply streaming chunks of the WAL (SR already does this). Talking about the base backup over libpq, it is something we should implement to satisfy the people who ask for an easy replication setup.

Yes, definitely. But that also needs server-side support.

> IIRC, Dimitri already coded a base backup over libpq tool [1] but it is written in Python.

Yeah, the WIP patch Heikki posted is similar, except it uses tar format and is implemented natively in the backend with no need for pl/pythonu to be installed.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] small table left outer join big table
Excerpts from Robert Haas's message of Wed Dec 29 09:17:17 -0300 2010:
> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li jay23j...@gmail.com wrote:
>> Hi, Please see the following plan:
>>
>> postgres=# explain select * from small_table left outer join big_table using (id);
>>                               QUERY PLAN
>> Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
>>   Hash Cond: (small_table.id = big_table.id)
>>   ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
>>   ->  Hash  (cost=59142.00..59142.00 rows=410 width=8)
>>         ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=410 width=8)
>> (5 rows)
>>
>> Here I have a puzzle, why not choose the small table to build the hash table? It can avoid multiple batches thus save significant I/O cost, isn't it?
> Yeah, you'd think. Can you post a full reproducible test case?

Also, what version is this?

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On 2010-12-29 2:14 PM, Greg Smith wrote:
> MERGE INTO Stock t
>  USING (VALUES(10,100)) AS s(item_id,balance)
>  ON s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
>  WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);
>
> If you can suggest an alternate way to express this that works with the new patch, I might switch to that and retry. I was never 100% sure this was the right way to write this, and I don't have another database with MERGE support here to try against.

As far as I can tell, this should work. I played around with the patch and the problem seems to be the VALUES:

MERGE INTO Stock t
 USING (SELECT 30, 2000) AS s(item_id,balance)
 ON s.item_id=t.item_id
 WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
 WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);
MERGE 1

Regards, Marko Tiikkaja
[HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes. In some cases, we can determine that doing so is unhelpful, and that the conversion shall always succeed:

CREATE DOMAIN loosedom AS text;
CREATE TABLE t (c varchar(2));
ALTER TABLE t ALTER c TYPE varchar(4);
ALTER TABLE t ALTER c TYPE text;
ALTER TABLE t ALTER c TYPE loosedom;

In other cases, we can determine that the rewrite is unhelpful, but a cast could still throw an error:

CREATE DOMAIN tightdom AS text CHECK (value LIKE '%/');
CREATE TABLE t (c text);
ALTER TABLE t ALTER c TYPE xml USING c::xml;
ALTER TABLE t ALTER c TYPE varchar(64);
ALTER TABLE t ALTER c TYPE tightdom;

I wish to replace table rewrites with table verification scans where possible, then skip those verification scans where possible. I've noted three subproblems that require distinct treatment. In the following discussion, "tuple" is the tuple on disk, and "tuple'" is the tuple we would form during a rewriting ALTER TABLE ALTER TYPE. Likewise "col" and "col'" for individual columns.

1. The table heap rewrite itself

To legitimately leave the table heap intact, we must confirm datumIsEqual(col, col') for every tuple and column. Without any specific configuration to suggest when this might be true, we can always determine its truth through a _verification scan_ of the table. If the datumIsEqual test fails for any tuple during the verification scan, terminate the scan and proceed with a table rewrite.

When should we attempt the verification scan in the first place, as opposed to proceeding directly to a rewrite? For some time, I thought it best to automatically attempt it when the type change degenerates to a single binary coercion. Cases like text->xml and timestamp->timestamptz [timezone=UTC] are more difficult to identify mechanically, so I supposed to let the user identify them via a new keyword to ALTER TABLE ALTER TYPE. Having thought on it more, though, it actually seems best to attempt the verification scan *every* time.
In most ineligible conversions, an inequality will appear very early, so the scan is effectively O(1) in the negative case. A notable exception is something like char(6)->varchar(6) in a table with a billion tuples having length(col) = 6 and one with length(col) = 5. The verification scan might read most of the table before finding the one tuple that forces a rewrite. That isn't a particularly regular scenario in my experience, so the "just do the right thing" aspect of preceding every potential rewrite with a verification scan seems to win out.

Certain very popular type changes (see introduction) can be _exempt_ from the verification scan: we can determine that they will always succeed. To capture that, I propose extending CREATE CAST with the notion of an exemptor function:

CREATE CAST (source_type AS target_type)
    { WITH FUNCTION function_name (argument_type [, ...]) [ WITH EXEMPTOR function_name ] |
      WITHOUT FUNCTION |
      WITH INOUT }
    [ AS ASSIGNMENT | AS IMPLICIT ]

The exemptor shall have this signature:

exemptor_func(
    integer, -- source_typmod
    integer  -- dest_typmod
) RETURNS boolean

The exemptor shall return true iff datumIsEqual(x, x::target_type(dest_typmod)) for every x in source_type most recently coerced to source_type(source_typmod). When the cast is WITHOUT FUNCTION (a binary coercion) and target_type lacks a length coercion cast, the cast has an implicit exemption, and an exemptor is superfluous: code can assume an exemptor that always returns true. Use of WITH EXEMPTOR mainly makes sense alongside multi-arg WITH FUNCTION (should probably raise an error upon other uses).

When ATPrepAlterColumnType determines that a transformation expression is merely a series of casts (or perhaps some stronger condition), and every such cast has an exemptor or implicit exemption, it will call those exemptors with the old and new typmods.
If the exemptors all return true and this continues to hold for all other ALTER TYPE subcommands of this ALTER TABLE, we will skip the verification scan and table rewrite. I will add exemptors to length coercion casts of types other than character and bit, which are exempt only in the no-change case. The candidates that are not pure length coercions, int8->bit and int4->bit, would not benefit.

2. CHECK constraints, index predicates, and expression indexes

CHECK constraints, index predicates, and indexed expressions can change behavior without limitation when the types of columns they reference change. An exempt typmod change alone will not affect them. While there are some other safe cases (CHECK (length(col) <= 2) over a varchar->text conversion), I don't have a general framework in mind for identifying them. For now, an exempt change that includes a type change, such as varchar(2)->text, will still require a verification scan whenever a CHECK constraint, index predicate, or indexed expression references the column. The verification scan will confirm
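[Editor's note: to make the exemptor idea concrete, here is a hypothetical sketch of what an exemptor for varchar's length coercion might look like. The WITH EXEMPTOR syntax and the function below come from the proposal above and do not exist in released PostgreSQL; the function name and its expression as a SQL-level function are purely illustrative - a real implementation would be a C-level callback.]

```sql
-- Hypothetical exemptor for varchar length coercion, per the proposal:
-- return true when every value satisfying the old typmod necessarily
-- satisfies the new one, i.e. the new limit is unconstrained (-1) or at
-- least as large as the old one, so the on-disk datum cannot change.
CREATE FUNCTION varchar_exemptor(source_typmod integer,
                                 dest_typmod integer)
RETURNS boolean AS $$
SELECT dest_typmod = -1
       OR (source_typmod <> -1 AND dest_typmod >= source_typmod);
$$ LANGUAGE sql IMMUTABLE;

-- Under the proposed grammar it would be attached roughly like:
-- CREATE CAST (varchar AS varchar)
--     WITH FUNCTION varchar(varchar, integer, boolean)
--     WITH EXEMPTOR varchar_exemptor;
```

Under this sketch, varchar(2)->varchar(4) and varchar(2)->varchar (unconstrained) would skip both the rewrite and the verification scan, while varchar(4)->varchar(2) would not be exempt and would fall back to the verification scan.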
Re: [HACKERS] small table left outer join big table
On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>>> Here I have a puzzle, why not choose the small table to build the hash table? It can avoid multiple batches thus save significant I/O cost, isn't it?
>> Yeah, you'd think. Can you post a full reproducible test case?
> It's not a bug, that's the way it currently works. We don't need a test case for that. I agree that the optimisation would be a useful one. It allows you to ask the query "Show me sales for each of my stores" efficiently, rather than being forced to request the inner join query "Show me the sales for each of my stores for which there have been sales", which is a much less useful query.

Oh, you're right. I missed the fact that it's a left join.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump --split patch
On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson j...@gluefinance.com wrote:
> [description of split stuff]

So, how different (or not) is this to the directory format that was coming out of the desire of a parallel pg_dump?

a.

-- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: [HACKERS] pg_dump --split patch
2010/12/29 Aidan Van Dyk ai...@highrise.ca:
> On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson j...@gluefinance.com wrote:
>> [description of split stuff]
> So, how different (or not) is this to the directory format that was coming out of the desire of a parallel pg_dump?

Not sure what format you are referring to? Custom, tar or plain text? I noticed there are two undocumented formats as well, append and file. I tried both of these undocumented formats, but neither produced any directory structure of the dumped objects. Could you please explain how to use the directory format, if such a format already exists? I can't find it in the documentation or the source code of HEAD.

-- Best regards, Joel Jacobson Glue Finance E: j...@gluefinance.com T: +46 70 360 38 01 Postal address: Glue Finance AB Box 549 114 11 Stockholm Sweden Visiting address: Glue Finance AB Birger Jarlsgatan 14 114 34 Stockholm Sweden
Re: [HACKERS] Streaming replication as a separate permissions
On Wed, Dec 29, 2010 at 5:09 AM, Magnus Hagander mag...@hagander.net wrote:
> Ok, here's an updated patch that does both these and includes documentation and regression test changes. With that, I think we're good to go. I've applied this version (with some minor typo-fixes).

Do you think we could have worded this a bit better:

<entry>Prepare for performing on-line backup (restricted to superusers or replication roles)</entry>

to say 'restricted to superusers _and_ replication roles'? Saying 'restricted to superusers _or_ replication roles' may mean that at any time we allow one or the other, but not both (the reader might assume that the decision is based on some other GUC). Using 'and' would mean that we allow it for both of those roles.

Any specific reason NOREPLICATION_P and REPLICATION_P use the _P suffix? AIUI, that suffix is used in gram.y to tag a token to mean it belongs to the Parser, and to avoid conflict with the same token elsewhere; NULL_P is a good example.

In pg_authid.h, 8 spaces are used between 'bool' and 'rolreplication'; instead, tabs should have been used as in the surrounding code.

Regards, -- gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.EnterpriseDB.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device
Re: [HACKERS] pg_dump --split patch
On Wed, Dec 29, 2010 at 8:31 AM, Joel Jacobson j...@gluefinance.com wrote: 2010/12/29 Aidan Van Dyk ai...@highrise.ca On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson j...@gluefinance.com wrote: description of split stuff So, how different (or not) is this to the directory format that was coming out of the desire of a parallel pg_dump? Not sure what format you are referring to? Custom, tar or plain text? I noticed there are two undocumented formats as well, append and file. I tried both of these undocumented formats, but it did not procude any directory structure of the dumped objects. Could you please explain how to use the directory format is such a format already exists? I can't find it in the documentation nor the source code of HEAD. It is still being discussed as a patch to pg_dump. Google for directory archive format for pg_dump, specifically in archives.postgresql.org. AFAIK, that applies to parallel dumps of data (may help in --schema-only dumps too), and what you are trying is for schema. Regards -- gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.EnterpriseDB.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device
Re: [HACKERS] pg_dump --split patch
On Wed, Dec 29, 2010 at 9:11 AM, Gurjeet Singh singh.gurj...@gmail.com wrote: On Wed, Dec 29, 2010 at 8:31 AM, Joel Jacobson j...@gluefinance.com wrote: 2010/12/29 Aidan Van Dyk ai...@highrise.ca On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson j...@gluefinance.com wrote: description of split stuff So, how different (or not) is this to the directory format that was coming out of the desire of a parallel pg_dump? Not sure what format you are referring to? Custom, tar or plain text? I noticed there are two undocumented formats as well, append and file. I tried both of these undocumented formats, but it did not procude any directory structure of the dumped objects. Could you please explain how to use the directory format is such a format already exists? I can't find it in the documentation nor the source code of HEAD. It is still being discussed as a patch to pg_dump. Google for directory archive format for pg_dump, specifically in archives.postgresql.org. Specifically: Message-ID: aanlktimueltxwrsqdqnwxik_k1y3ych1u-9nghzqp...@mail.gmail.com AFAIK, that applies to parallel dumps of data (may help in --schema-only dumps too), and what you are trying is for schema. Right, but one of the things it does is break the dump into parts, and put them in a directory/file organization. Both are doing it for different reasons, but doing pretty much the same thing. But can the layout/organization of Joachim's patch be made human-friendly in the vein of Joel's vision? a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming replication as a separate permissions
On Wed, Dec 29, 2010 at 15:05, Gurjeet Singh singh.gurj...@gmail.com wrote: On Wed, Dec 29, 2010 at 5:09 AM, Magnus Hagander mag...@hagander.net wrote: Ok, here's an updated patch that does both these and includes documentation and regression test changes. With that, I think we're good to go. I've applied this version (with some minor typo-fixes). Do you think we could have worded these a bit better entryPrepare for performing on-line backup (restricted to superusers or replication roles)/entry to say 'restricted to superusers _and_ replication roles'. Saying 'restricted to superusers _or_ replication roles' may mean that at any time we allow one or the other, but not both (reader might assume that decision is based on some other GUC). Uh, not sure, actually. I would read the and as meaning you needed *both*, which isn't true. We do allow, at any time, one or the other - *or* both. Any specific reason NOREPLICATION_P and REPLICATION_P use the _P suffix? Um, I just copied it off a similar entry elsewhere. I saw no comment about what _P actually means, and I can't say I know. I know very little about the bison files :-) AIUI, that suffix is used in gram.y to tag a token to mean it belongs to Parser, and to avoid conflict with the same token elsewhere; NULL_P is a good example. In pg_authid.h, 8 spaces used between 'bool' and 'rolreplication', instead tabs should have been used as the surrounding code. Bleh. Well, pgindent will fix that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
On Tue, Dec 28, 2010 at 16:15, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Tue, Dec 28, 2010 at 13:18, Robert Haas robertmh...@gmail.com wrote: Adding a #define to our headers that you can test for seems like the way to go. That's kind of what I was going for ;) I don't see the point. You're going to need a *run time* test on PQserverVersion to figure out what the server will return, no? I need *both*. Also, if you really do need to figure out which PG headers you're compiling against, looking at catversion.h is the accepted way to do it. There's no need for yet another symbol. This file is, AFAIK, not included with client installs? It's definitely not present in the libpq-dev package on debian. It's a backend development file, no? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small table left outer join big table
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote: It's not a bug, that's the way it currently works. We don't need a test case for that. Oh, you're right. I missed the fact that it's a left join. The only thing that struck me as curious about it was that the OP didn't get a nestloop-with-inner-indexscan plan. That would be explainable if there was no index on the large table's id column ... but columns named like that usually have indexes. I can't get all *that* excited about complicating hash joins as proposed. The query is still fundamentally going to be slow because you won't get out of having to seqscan the large table. The only way to make it really fast is to not read all of the large table, and nestloop-with-inner-indexscan is the only plan type with a hope of doing that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump --split patch
Aidan Van Dyk ai...@highrise.ca writes: On Wed, Dec 29, 2010 at 9:11 AM, Gurjeet Singh singh.gurj...@gmail.com wrote: AFAIK, that applies to parallel dumps of data (may help in --schema-only dumps too), and what you are trying is for schema. Right, but one of the things it does is break the dump in to parts, and put them in a directory/file organization. Both are doing it for different reasons, but doing pretty much the same thing. But can the layout/organization of Joachim's patch can be made human friendly in the vein of Joel's vision? I think they're fundamentally different things, because the previously proposed patch is an extension of the machine-readable archive format, and has to remain so because of the expectation that people will want to use parallel restore with it. Joel is arguing for a split-up of the text dump format. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
Magnus Hagander mag...@hagander.net writes: On Tue, Dec 28, 2010 at 16:15, Tom Lane t...@sss.pgh.pa.us wrote: Also, if you really do need to figure out which PG headers you're compiling against, looking at catversion.h is the accepted way to do it. There's no need for yet another symbol. This file is, AFAIK, not included with client installs? It's definitely not present in the libpq-dev package on debian. It's a backend development file, no? [ shrug... ] We can't be held responsible for stupid packaging decisions by distros. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing pg_upgrade's check of available binaries
Tom Lane wrote: I've been fooling around with creating upgrade-in-place support for the Fedora/RHEL RPMs. What I want to have is a separate postgresql-upgrade RPM containing just the minimum possible set of previous-release files, together with pg_upgrade itself. Experimenting with this convinced me that pg_upgrade is a few bricks shy of a load in its tests for whether the old and new clusters have the right binaries available: * it insists on pg_dumpall and psql being present in the old cluster, though they are not in fact called * it fails to check for pg_resetxlog, even though it needs it in both old and new clusters * it fails to check for pg_config, which it does need in the new cluster. It does not however really need it in the old cluster, because it has no use for the old cluster's --pkglibdir path. I propose the attached patch to clean these things up. Any objections? Looks good to me. I was not super-strict about checking binaries because I assumed a full install on both clusters, but it doesn't hurt to make that more specific. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
On Wed, Dec 29, 2010 at 16:12, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Tue, Dec 28, 2010 at 16:15, Tom Lane t...@sss.pgh.pa.us wrote: Also, if you really do need to figure out which PG headers you're compiling against, looking at catversion.h is the accepted way to do it. There's no need for yet another symbol. This file is, AFAIK, not included with client installs? It's definitely not present in the libpq-dev package on debian. It's a backend development file, no? [ shrug... ] We can't be held responsible for stupid packaging decisions by distros. Running make install in src/interfaces/libpq does not install catversion.h. If it's required to know which version of the libpq headers are in use, it should be, shouldn't it? We can be held responsible for the packaging decisions if they use *our* make install commands, imho. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI SLRU strategy choices
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I'm not sure how you arrived at that number, though. http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/include/access/slru.h;h=710cca70acd67e03e5f3a255b048a719ae4c4709 The way I read this, each segment is (BLCKSZ * SLRU_PAGES_PER_SEGMENT) long, which is (8kB * 32), or 256kB. The number of files is limited to 64k because of the segment file naming. So total space is limited to 16GB. When an SLRU is used to store xids for random access, that's 4 bytes per entry, so 2^32 entries are possible, but SLRU code considers it a problem for the space to become more than half full. With the eight-byte entries I need, there are 2^31 slots for entries, with the ability to use 2^30 before it becomes half full and SLRU complains. Does that look right to you, or have I misunderstood something? The only issue I can see with that is that you allocate those 8 bytes for every xid, even if it's a non-serializable transaction or a subtransaction. But the overhead is probably not significant in practice. Right. And it avoids having to sequentially search for the desired xid. A sequential search seems to me like it would get into O(N^2) performance under extreme load, whereas this approach has a couple performance plateaus at O(1) which will be, I think, the normal case, and only goes to O(N) performance under extreme load. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
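Kevin's capacity arithmetic can be double-checked with a few lines (constants as quoted from slru.h: an 8kB BLCKSZ, 32 pages per segment, and 64k possible segment file names):

```python
# Sanity-check the SLRU capacity arithmetic from the discussion above.
BLCKSZ = 8 * 1024              # bytes per page
SLRU_PAGES_PER_SEGMENT = 32    # pages per segment file
MAX_SEGMENTS = 64 * 1024       # limit imposed by segment file naming

segment_bytes = BLCKSZ * SLRU_PAGES_PER_SEGMENT
total_bytes = segment_bytes * MAX_SEGMENTS

print(segment_bytes // 1024)        # 256 (kB per segment)
print(total_bytes // 2**30)         # 16 (GB of total SLRU space)
print(total_bytes // 4 == 2**32)    # True: 4-byte entries -> 2^32 slots
print(total_bytes // 8 == 2**31)    # True: 8-byte entries -> 2^31 slots
print((total_bytes // 8) // 2 == 2**30)  # True: usable before "half full"
```

This reproduces the figures in the message: 256kB segments, a 16GB ceiling, and 2^31 slots (2^30 usable) for eight-byte entries.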
Re: [HACKERS] pg_dump --split patch
2010/12/29 Tom Lane t...@sss.pgh.pa.us I think they're fundamentally different things, because the previously proposed patch is an extension of the machine-readable archive format, and has to remain so because of the expectation that people will want to use parallel restore with it. Joel is arguing for a split-up of the text dump format. Yes, exactly. My patch is of course also a lot smaller :-)

pg_dump-directory.diff.: 112 853 bytes
pg-dump-split-plain-text-files-9.1devel.patch..: 5 579 bytes

I just tried the pg_dump-directory.diff patch. The only thing it has in common with my patch is that it writes data to different files, and only the data is split into different files; the schema appears to go into the single TOC file.

Example, pg_dump-directory.diff:

$ ./pg_dump -f /crypt/dirpatch -F d -s glue
$ ls -la /crypt/dirpatch/
TOC (1 file)
$ rm -rf /crypt/dirpatch
$ ./pg_dump -f /crypt/dirpatch -F d glue
$ ls /crypt/dirpatch/
6503.dat 6504.dat ...lots of files... 6871.dat 6872.dat 6873.dat 6874.dat TOC

Example, pg_dump --split patch:

$ pg_dump -f /crypt/splitpatch -F p --split -s glue
$ ls /crypt/splitpatch*
/crypt/splitpatch (file)
/crypt/splitpatch-split: (directory)
myschema1 myschema2 public
$ ls /crypt/splitpatch-split/public/
AGGREGATE CONSTRAINT FK_CONSTRAINT FUNCTION INDEX SEQUENCE TABLE TRIGGER TYPE VIEW
$ ls /crypt/splitpatch-split/public/FUNCTION/
myfunc.sql otherfunc.sql
$ cat /crypt/splitpatch
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
...etc...
\i /crypt/splitpatch-split/public/FUNCTION/myfunc.sql
\i /crypt/splitpatch-split/public/FUNCTION/otherfunc.sql

-- Best regards, Joel Jacobson Glue Finance
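The on-disk layout Joel's transcript shows can be summarized as a simple path mapping, schema/OBJECT_TYPE/name.sql under a -split directory. A hypothetical sketch of that mapping (illustration only, not code from the patch):

```python
# Hypothetical helper mirroring the layout of the --split patch, as
# seen in the example above: <dump>-split/<schema>/<OBJECT_TYPE>/<name>.sql
def split_path(dump_file: str, schema: str, objtype: str, name: str) -> str:
    return f"{dump_file}-split/{schema}/{objtype}/{name}.sql"

print(split_path("/crypt/splitpatch", "public", "FUNCTION", "myfunc"))
# /crypt/splitpatch-split/public/FUNCTION/myfunc.sql
```

One file per object under a per-schema, per-object-type directory is what makes the split dump friendly to version control and human diffing, which is the point of the proposal.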
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Dec 29, 2010, at 7:56 AM, Noah Misch n...@leadboat.com wrote: ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes. In some cases, we can determine that doing so is unhelpful, and that the conversion shall always succeed: CREATE DOMAIN loosedom AS text; CREATE TABLE t (c varchar(2)); ALTER TABLE t ALTER c TYPE varchar(4); ALTER TABLE t ALTER c TYPE text; ALTER TABLE t ALTER c TYPE loosedom; In other cases, we can determine that the rewrite is unhelpful, but a cast could still throw an error: CREATE DOMAIN tightdom AS text CHECK (value LIKE '%/'); CREATE TABLE t (c text); ALTER TABLE t ALTER c TYPE xml USING c::xml; ALTER TABLE t ALTER c TYPE varchar(64); ALTER TABLE t ALTER c TYPE tightdom; I wish to replace table rewrites with table verification scans where possible, then skip those verification scans where possible. Seems like a good idea. Having thought on it more, though, it actually seems best to attempt the verification scan *every* time. In most ineligible conversions, an inequality will appear very early, so the scan is effectively O(1) in the negative case. A notable exception is something like char(6)-varchar(6) in a table with a billion tuples having length(col) = 6 and one with length(col) = 5. The verification scan might read most of the table before finding the one tuple that forces a rewrite. That isn't a particularly regular scenario in my experience, so the just do the right thing aspect of preceding every potential rewrite with a verification scan seems to win out. I think this scenario will be more common than you might think. Tables don't contain random data; they contain data that the DBA thinks is valid. The situation where the data is mostly as you expect but with a few kooky rows is, in my experience, extremely common. And it makes the worst case a LOT worse. 
I really doubt this is worth the complexity anyway - converting between two types that are mostly-but-not-quite binary compatible seems like an edge case in every sense. Certain very popular type changes (see introduction) can be _exempt_ from the verification scan: we can determine that they will always succeed. To capture that, I propose extending CREATE CAST with the notion of an exemptor function: CREATE CAST (source_type AS target_type) { WITH FUNCTION function_name (argument_type [, ...]) [ WITH EXEMPTOR function_name ] | WITHOUT FUNCTION | WITH INOUT } [ AS ASSIGNMENT | AS IMPLICIT ] The exemptor shall have this signature: exemptor_func( integer, -- source_typmod integer -- dest_typmod ) RETURNS boolean The exemptor shall return true iff datumIsEqual(x, x::target_type(dest_typmod)) for every x in source_type most recently coerced to source_type(source_typmod). When the cast is WITHOUT FUNCTION (a binary coercion) and target_type lacks a length coercion cast, the cast has an implicit exemption, and an exemptor is superfluous: code can assume an exemptor that always returns true. Use of WITH EXEMPTOR mainly makes sense alongside multi-arg WITH FUNCTION (should probably raise an error upon other uses). I am not sure whether it's worth trying to be general here. Maybe we should just hard-code the known cases involving core datatypes. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
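To make the proposed contract concrete, here is a sketch of what an exemptor for varchar length coercions might decide. This is an illustration only: the real exemptor would be a C function registered through CREATE CAST, and the VARHDRSZ-offset typmod encoding is assumed from PostgreSQL's varchar convention (typmod = declared length + 4, or -1 for unconstrained).

```python
# Illustrative sketch (not PostgreSQL's API) of an "exemptor" for a
# varchar(n) -> varchar(m) cast. Per the proposed contract, it returns
# True iff every value valid under the source typmod is unchanged by
# coercion to the destination typmod -- i.e. iff the limit never shrinks.
VARHDRSZ = 4

def varchar_exemptor(source_typmod: int, dest_typmod: int) -> bool:
    if dest_typmod == -1:        # destination is unconstrained
        return True
    if source_typmod == -1:      # source unconstrained, dest constrained:
        return False             # some value could exceed the new limit
    src_len = source_typmod - VARHDRSZ
    dst_len = dest_typmod - VARHDRSZ
    return dst_len >= src_len    # widening (or equal) needs no rewrite

def typmod(n: int) -> int:       # varchar(n) stores n + VARHDRSZ
    return n + VARHDRSZ

print(varchar_exemptor(typmod(2), typmod(4)))  # True: varchar(2) -> varchar(4)
print(varchar_exemptor(typmod(4), typmod(2)))  # False: values may be truncated
print(varchar_exemptor(typmod(4), -1))         # True: varchar(4) -> unconstrained
```

Under this contract the varchar(2) -> varchar(4) and varchar -> text examples from the introduction are exempt from both rewrite and verification scan, while a narrowing change still forces at least a check.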
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
Noah Misch n...@leadboat.com writes: ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes. In some cases, we can determine that doing so is unhelpful, and that the conversion shall always succeed: I wish to replace table rewrites with table verification scans where possible, then skip those verification scans where possible. This has been discussed before; have you read the previous threads? I really really dislike the notion of a verification scan: it's basically work that is going to be useless if it fails. I think your argument that it will usually fail quickly is quite unconvincing, and in any case the situations where it is useful at all are too thin on the ground to be worth the code space to implement it. It seems sufficient to me to skip the rewrite in cases of provable binary compatibility, with possibly an extra check for safe changes of typmod. With respect to the latter, I agree a type-specific function to compare the typmods would be the way to go, although exemptor seems a pretty badly chosen name for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small table left outer join big table
- Original Message - From: Alvaro Herrera alvhe...@commandprompt.com To: Robert Haas robertmh...@gmail.com Cc: Jie Li jay23j...@gmail.com; pgsql-hackers pgsql-hackers@postgresql.org Sent: Wednesday, December 29, 2010 8:39 PM Subject: Re: [HACKERS] small table left outer join big table Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010: On Tue, Dec 28, 2010 at 5:13 AM, Jie Li jay23j...@gmail.com wrote: Hi, Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using (id);
                                 QUERY PLAN
------------------------------------------------------------------------------
 Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
   Hash Cond: (small_table.id = big_table.id)
   ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
   ->  Hash  (cost=59142.00..59142.00 rows=410 width=8)
         ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=410 width=8)
(5 rows)

Here I have a puzzle, why not choose the small table to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it? Yeah, you'd think. Can you post a full reproducible test case? Also, what version is this? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support The version is 9.0.1. I believe the latest version works in the same way. Thanks, Li Jie -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small table left outer join big table
Thank you for all your comments. I think the condition for this optimization is that the small table fits into memory. If not, it doesn't work, since both tables still need to be written to disk. But if it does, we can avoid all I/O costs in the hash join process. Thanks, Li Jie - Original Message - From: Robert Haas robertmh...@gmail.com To: Simon Riggs si...@2ndquadrant.com Cc: Jie Li jay23j...@gmail.com; pgsql-hackers pgsql-hackers@postgresql.org Sent: Wednesday, December 29, 2010 8:59 PM Subject: Re: [HACKERS] small table left outer join big table On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote: Here I have a puzzle, why not choose the small table to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it? Yeah, you'd think. Can you post a full reproducible test case? It's not a bug, that's the way it currently works. We don't need a test case for that. I agree that the optimisation would be a useful one. It allows you to ask the query Show me sales for each of my stores efficiently, rather than being forced to request the inner join query Show me the sales for each of my stores for which there have been sales, which is a much less useful query. Oh, you're right. I missed the fact that it's a left join. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] small table left outer join big table
- Original Message - From: Tom Lane t...@sss.pgh.pa.us To: Robert Haas robertmh...@gmail.com Cc: Simon Riggs si...@2ndquadrant.com; Jie Li jay23j...@gmail.com; pgsql-hackers pgsql-hackers@postgresql.org Sent: Wednesday, December 29, 2010 10:59 PM Subject: Re: [HACKERS] small table left outer join big table Robert Haas robertmh...@gmail.com writes: On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote: It's not a bug, that's the way it currently works. We don't need a test case for that. Oh, you're right. I missed the fact that it's a left join. The only thing that struck me as curious about it was that the OP didn't get a nestloop-with-inner-indexscan plan. That would be explainable if there was no index on the large table's id column ... but columns named like that usually have indexes. I can't get all *that* excited about complicating hash joins as proposed. The query is still fundamentally going to be slow because you won't get out of having to seqscan the large table. The only way to make it really fast is to not read all of the large table, and nestloop-with-inner-indexscan is the only plan type with a hope of doing that. regards, tom lane Yes, there is no index on the joined column; otherwise, nestloop-with-inner-indexscan would be preferred. But why can't outer join be as clever as inner join? Anyway, if we unfortunately don't have an available index, we have no choice but to rely on hash join, right? Thanks, Li Jie -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
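The optimization Li Jie is asking about (build the hash table on the small outer relation, probe with the big one) needs extra per-row bookkeeping to preserve LEFT JOIN semantics; that bookkeeping is part of the complication being weighed in this thread. A toy sketch, not the executor's implementation:

```python
# Toy left outer hash join that builds on the SMALL (outer) side and
# probes with the big side. A matched flag per outer row preserves
# LEFT JOIN semantics: unmatched outer rows are emitted NULL-extended.
def hash_left_join(small, big):
    # Build phase: hash the small outer relation entirely in memory.
    table = {}
    for row in small:
        table.setdefault(row[0], []).append([row, False])
    out = []
    # Probe phase: a single pass over the big relation, no batching.
    for row in big:
        for entry in table.get(row[0], []):
            out.append(entry[0] + row[1:])
            entry[1] = True          # mark the outer row as matched
    # Emit outer rows that never matched, padded with NULLs (None).
    width = len(big[0]) - 1 if big else 0
    for bucket in table.values():
        for row, matched in bucket:
            if not matched:
                out.append(row + (None,) * width)
    return out

small = [(1, 'a'), (2, 'b'), (3, 'c')]
big = [(1, 'x'), (1, 'y'), (3, 'z'), (4, 'w')]
print(sorted(hash_left_join(small, big)))
# [(1, 'a', 'x'), (1, 'a', 'y'), (2, 'b', None), (3, 'c', 'z')]
```

The payoff is that only the small side has to fit in memory, so no batches are spilled; the cost is the matched flag and the extra pass at the end, which PostgreSQL's hash join (which hashes the inner side) does not need.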
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
Robert Haas robertmh...@gmail.com writes: On Dec 29, 2010, at 7:56 AM, Noah Misch n...@leadboat.com wrote: The exemptor shall have this signature: exemptor_func( integer, -- source_typmod integer -- dest_typmod ) RETURNS boolean The exemptor shall return true iff datumIsEqual(x, x::target_type(dest_typmod)) for every x in source_type most recently coerced to source_type(source_typmod). I am not sure whether it's worth trying to be general here. Maybe we should just hard-code the known cases involving core datatypes. I find the idea of hard coding to be pretty icky. However, the elephant in the room here is the possibility of typmod getting replaced by some other representation. It would make life simpler if we didn't invent this additional type-specific API until that dust has settled. So maybe the plan should be hard-coding in the short term and add an API later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] and it's not a bunny rabbit, either
On 29.12.2010 13:17, Robert Haas wrote: Did you read the whole thread? Ah, sorry: I've had to change some of the heap_open(rv) calls to relation_open(rv) to avoid having the former throw the wrong error message before the latter kicks in. I think there might be stylistic objections to that, but I'm not sure what else to propose. I'm actually pretty suspicious that many of the heap_open(rv) calls I *didn't* change are either already a little iffy or likely to become so once the SQL/MED stuff for foreign tables goes in. They make it easy to forget that we've got a whole pile of relkinds and you actually need to really think about which ones you can handle. Hmm, I believe the idea of heap_open is to check that the relation is backed by a heap that you can read with heap_beginscan+heap_next. At the moment that includes normal tables, sequences and toast tables. Foreign tables would not fall into that category. Yeah, you're right that most of the callers of heap_open actually want a tighter check than that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] and it's not a bunny rabbit, either
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Hmm, I believe the idea of heap_open is to check that the relation is backed by a heap that you can read with heap_beginscan+heap_next. At the moment that includes normal tables, sequences and toast tables. Foreign tables would not fall into that category. I don't believe that that definition is documented anyplace; if we decide that's what we want it to mean, some code comments would be in order. Yeah, you're right that most of the callers of heap_open actually want to a tighter check than that. I think probably most of the physical calls of heap_open are actually associated with system catalog accesses, and the fact that the code says heap_open not relation_open has got more to do with copy-and-paste than any real thought about what we're specifying. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_streamrecv for 9.1?
On Wed, Dec 29, 2010 at 11:47:53AM +0100, Magnus Hagander wrote: Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so. +1 for bin/ Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Wed, Dec 29, 2010 at 11:16:23AM -0500, Tom Lane wrote: Noah Misch n...@leadboat.com writes: ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes. In some cases, we can determine that doing so is unhelpful, and that the conversion shall always succeed: I wish to replace table rewrites with table verification scans where possible, then skip those verification scans where possible. This has been discussed before; have you read the previous threads? It would help a lot if, in future, you include references to the previous threads you have in mind rather than simply mention that they exist. Saying, in effect, 'search the archives with our not-super-great search technology using keywords you didn't think of', comes off as pretty dismissive if not downright hostile. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI memory mitigation false positive degradation
On 26.12.2010 21:40, Kevin Grittner wrote: To recap, I've had an open question on the Serializable Wiki page[1] since January about how we should handle long-running transactions. The algorithm published by Cahill et al requires keeping some transaction information in memory for all committed transactions which overlapped a still-running transaction. Since we need to keep this in shared memory, and the structures must have a finite allocation, there's an obvious looming limit, even if the allocation is relatively generous. Looking at the predicate lock splitting, it occurs to me that it's possible for a non-serializable transaction to be canceled if it needs to split a predicate lock held by a concurrent serializable transaction, and you run out of space in the shared memory predicate lock area. Any chance of upgrading the lock to a relation lock, or killing the serializable transaction instead? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_streamrecv for 9.1?
David Fetter da...@fetter.org writes: On Wed, Dec 29, 2010 at 11:47:53AM +0100, Magnus Hagander wrote: Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so. +1 for bin/ Is it really stable enough for bin/? My impression of the state of affairs is that there is nothing whatsoever about replication that is really stable yet. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI memory mitigation false positive degradation
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Looking at the predicate lock splitting, it occurs to me that it's possible for a non-serializable transaction to be canceled if it needs to split a predicate lock held by a concurrent serializable transaction, and you run out of space in the shared memory predicate lock area. Good point. We don't want that, for sure. Any chance of upgrading the lock to a relation lock, or killing the serializable transaction instead? Absolutely. Good suggestion. Thanks! -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] and it's not a bunny rabbit, either
On Dec 29, 2010, at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Hmm, I believe the idea of heap_open is to check that the relation is backed by a heap that you can read with heap_beginscan+heap_next. At the moment that includes normal tables, sequences and toast tables. Foreign tables would not fall into that category. I don't believe that that definition is documented anyplace; if we decide that's what we want it to mean, some code comments would be in order. The existing comments mention that callers must check that the return value is not a view, if they care. So if there is currently a single coherent definition for what heap_open is supposed to do, it's clearly NOT the one Heikki proposes. My guess is that reality is closer to your theory of what got cut-and-pasted. ...Robert
Re: [HACKERS] pg_streamrecv for 9.1?
On Dec 29, 2010, at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Is it really stable enough for bin/? My impression of the state of affairs is that there is nothing whatsoever about replication that is really stable yet. Well, that's not stopping us from shipping a core feature called replication. I'll defer to others on how mature pg_streamrecv is, but if it's no worse than replication in general I think putting it in bin/ is the right thing to do. ...Robert
Re: [HACKERS] TODO item for pg_ctl and server detection
Bruce Momjian wrote: Yes, that was my calculus too. I realized that we create session ids by merging the process id and backend start time, so I went ahead and added the postmaster start time epoch to the postmaster.pid file. While there is no way to pass back the postmaster start time from PQping, I added code to pg_ctl to make sure the time in the postmaster.pid file is not _before_ pg_ctl started running. We only check PQping() after we have started the postmaster ourselves, so it fits our needs. Tom suggested that there might be clock skew between pg_ctl and the postmaster, so I added a 2-second slop in checking the postmaster start time. Tom also wanted the connection information to be output all at once, but that causes a problem with detecting pre-9.1 servers so I avoided it. Updated patch attached. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index cda7f64..86bc5a6 100644 *** /tmp/pgdiff.14558/cXBdee_storage.sgml Wed Dec 29 13:42:38 2010 --- doc/src/sgml/storage.sgml Wed Dec 29 12:11:03 2010 *** last started with/entry *** 117,124 row entryfilenamepostmaster.pid//entry entryA lock file recording the current postmaster process id (PID), ! cluster data directory, port number, Unix domain socket directory, ! and shared memory segment ID/entry /row /tbody --- 117,125 row entryfilenamepostmaster.pid//entry entryA lock file recording the current postmaster process id (PID), ! postmaster start time, cluster data directory, port number, user-specified ! Unix domain socket directory, first valid listen_address host, and ! 
shared memory segment ID/entry /row /tbody diff --git a/src/backend/port/ipc_test.c b/src/backend/port/ipc_test.c index a003dc9..461a7a6 100644 *** /tmp/pgdiff.14558/o3NlPc_ipc_test.c Wed Dec 29 13:42:38 2010 --- src/backend/port/ipc_test.c Wed Dec 29 12:11:03 2010 *** on_exit_reset(void) *** 104,110 } void ! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2) { } --- 104,110 } void ! AddToLockFile(int target_line, const char *str) { } diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index d970eb2..ff77099 100644 *** /tmp/pgdiff.14558/mac69b_sysv_shmem.c Wed Dec 29 13:42:38 2010 --- src/backend/port/sysv_shmem.c Wed Dec 29 12:11:03 2010 *** InternalIpcMemoryCreate(IpcMemoryKey mem *** 198,206 /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* Record key and ID in lockfile for data directory. */ ! RecordSharedMemoryInLockFile((unsigned long) memKey, ! (unsigned long) shmid); return memAddress; } --- 198,214 /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* ! * Append record key and ID in lockfile for data directory. Format ! * to try to keep it the same length. ! */ ! { ! char line[32]; ! ! sprintf(line, %9lu %9lu\n, (unsigned long) memKey, ! (unsigned long) shmid); ! AddToLockFile(LOCK_FILE_LINES, line); ! } return memAddress; } diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a46a323..c1e553a 100644 *** /tmp/pgdiff.14558/GKd28a_postmaster.c Wed Dec 29 13:42:38 2010 --- src/backend/postmaster/postmaster.c Wed Dec 29 12:11:03 2010 *** PostmasterMain(int argc, char *argv[]) *** 483,489 int status; char *userDoption = NULL; int i; ! MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); --- 483,490 int status; char *userDoption = NULL; int i; ! bool connection_line_output = false; ! 
MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); *** PostmasterMain(int argc, char *argv[]) *** 860,869 --- 861,882 UnixSocketDir, ListenSocket, MAXLISTEN); else + { status = StreamServerPort(AF_UNSPEC, curhost, (unsigned short) PostPortNumber, UnixSocketDir, ListenSocket, MAXLISTEN); + /* must supply a valid listen_address for PQping() */ + if (!connection_line_output) + { + char line[MAXPGPATH + 2]; + + sprintf(line, %s\n, curhost); + AddToLockFile(LOCK_FILE_LINES - 1, line); + connection_line_output = true; + } + } + if (status == STATUS_OK) success++; else *** PostmasterMain(int argc, char *argv[]) *** 880,885 --- 893,902 pfree(rawstring); } + /* Supply an empty listen_address line for PQping() */ + if (!connection_line_output) +
Re: [HACKERS] SSI memory mitigation false positive degradation
Kevin Grittner kevin.gritt...@wicourts.gov wrote: Any chance of upgrading the lock to a relation lock, or killing the serializable transaction instead? Absolutely. Good suggestion. Thanks! I pushed a TODO SSI comment at the appropriate point with my ideas on how best to fix this. I want to stick with the SLRU changes for now, rather than risk flushing brain cache on the topic just now. If Dan (or anyone else, for that matter) wants to fix this, feel free; just post first, as will I if nobody beats me to it. There are actually two spots in PredicateLockPageSplit and one in PredicateLockPageCombine where this needs to be addressed. I can't think of any other functions where we're vulnerable to having an impact on non-serializable transactions. We sure want to plug those -- I see it as critical to acceptance that we can honor the promise of no impact on any transactions at REPEATABLE READ or less strict isolation levels. -Kevin
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Dec 29, 2010, at 11:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: I really really dislike the notion of a verification scan: it's basically work that is going to be useless if it fails. I think it has potential in cases like text to xml. In that case it'll either work or fail, with no possibility of requiring a do-over. Scanning the whole table is a whole lot cheaper than rewriting it. But I agree with your assessment of the optimistic verification scan case. ...Robert
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
On Dec 29, 2010, at 10:14 AM, Magnus Hagander mag...@hagander.net wrote: We can be held responsible for the packaging decisions if they use *our* make install commands, imho. Yep. ...Robert
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
Robert Haas robertmh...@gmail.com writes: On Dec 29, 2010, at 11:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: I really really dislike the notion of a verification scan: it's basically work that is going to be useless if it fails. I think it has potential in cases like text to xml. In that case it'll either work or fail, with no possibility of requiring a do-over. Scanning the whole table is a whole lot cheaper than rewriting it. I don't believe avoiding the write part (but not the read part, nor the XML syntax verification part) is a sufficiently compelling argument to justify having that code path. There are not enough distinct datatypes sharing binary representations to make this a worthwhile thing to worry over. Basically, I believe that the only use-case that will have more than epsilon number of users is "I want to make this varchar(5) into varchar(10), or possibly text". We can fix that case without adding a boatload more code that we'll have to maintain. I do have some interest in the idea of having a type-specific function that can recognize no-op typmod changes, but I would envision that as being an expression evaluation optimization: let the planner throw away the call to the length-checking function when it isn't going to do anything. It's not by any means only useful in ALTER COLUMN TYPE --- and in fact probably doesn't even need any bespoke code there, if we put it into expression_planner() instead. regards, tom lane
Re: [HACKERS] Streaming replication as a separate permissions
Excerpts from Magnus Hagander's message of mié dic 29 11:40:34 -0300 2010: On Wed, Dec 29, 2010 at 15:05, Gurjeet Singh singh.gurj...@gmail.com wrote: Any specific reason NOREPLICATION_P and REPLICATION_P use the _P suffix? Um, I just copied it off a similar entry elsewhere. I saw no comment about what _P actually means, and I can't say I know. I know very little about the bison files :-) Some lexer keywords have a _P suffix because otherwise they'd collide with some symbol in Windows header files or something like that. It's old stuff, but I think you, Magnus, were around at that time. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] pg_streamrecv for 9.1?
On Wed, Dec 29, 2010 at 1:42 PM, Robert Haas robertmh...@gmail.com wrote: On Dec 29, 2010, at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Is it really stable enough for bin/? My impression of the state of affairs is that there is nothing whatsoever about replication that is really stable yet. Well, that's not stopping us from shipping a core feature called replication. I'll defer to others on how mature pg_streamrecv is, but if it's no worse than replication in general I think putting it in bin/ is the right thing to do. As the README says, it is not self-contained (through no fault of its own), and one should typically set archive_command to guarantee zero WAL loss: "TODO: Document some ways of setting up an archive_command that works well together with pg_streamrecv." I think implementing just that TODO might make it a candidate. I have neither used it nor read the code, but if it works as advertised then it is definitely a +1 from me; no preference of bin/ or contrib/, since the community will have to maintain it anyway. Regards, -- gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.EnterpriseDB.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device
Re: [HACKERS] writable CTEs
On Tue, Dec 28, 2010 at 11:07:59PM +, Peter Geoghegan wrote: On 28 December 2010 20:07, Peter Eisentraut pete...@gmx.net wrote: The phrase common table expression does not appear anywhere in the SQL standard. The standard uses the grammar symbol with clause. I think we're losing sight of the issue a bit here. No one is proposing that we call WITH queries common table expressions. As I think we all agree, the term WITH query and common table expression are not synonymous. A WITH query is comprised of one or more common table expressions, plus a conventional SELECT query. As of 9.1, the thing appended to the CTE(s) can be a conventional DML query (SELECT, INSERT, UPDATE or DELETE). I'm hoping to expand this in future versions. :) All that I'm asking is that we /specify/ that the subqueries already mentioned in the docs are common table expressions. +1 Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] and it's not a bunny rabbit, either
Robert Haas robertmh...@gmail.com writes: The existing comments mention that callers must check that the return value is not a view, if they care. So if there is currently a single coherent definition for what heap_open is supposed to do, it's clearly NOT the one Heikki proposes. My guess is that reality is closer to your theory of what got cut-and-pasted. Well, reality is that in the beginning there was heap_open and index_open, and nothing else. And there weren't views, so basically those two functions covered all the interesting types of relations. We got to the current state of affairs by a series of whatever were the least invasive code changes at the time; nobody's ever taken a step back and tried to define what heap_open ought to allow from the standpoint of first principles. In practice I think it would make sense if heap_open accepts all relation types on which you can potentially do either a heapscan or indexscan (offhand those should be the same set of relkinds, I think; so this is the same in effect as Heikki's proposal, but phrased differently). So it would have to start rejecting views, and we'd need to go looking for the consequences of that. regards, tom lane
Re: [HACKERS] and it's not a bunny rabbit, either
Excerpts from Tom Lane's message of mié dic 29 16:29:45 -0300 2010: In practice I think it would make sense if heap_open accepts all relation types on which you can potentially do either a heapscan or indexscan (offhand those should be the same set of relkinds, I think; so this is the same in effect as Heikki's proposal, but phrased differently). So it would have to start rejecting views, and we'd need to go looking for the consequences of that. This seems a very good idea, but I think we shouldn't let it sink the current patch. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Extensions, patch v16
Oleg Bartunov wrote: Hi there, it's clear we need versions, probably, major.minor would be enough. The problem I see is how to keep .so in sync with .sql ? Should we store .sql in database ? Also, we need permissions for extension, since we have open/closed extensions. Don't people normally define the version number in the Makefile and pass the version string into the C code and perhaps a psql variable? --- Oleg On Sat, 11 Dec 2010, David E. Wheeler wrote: On Dec 11, 2010, at 1:09 PM, David Fetter wrote: Why is it in the makefile at all? If the makefile does need to know it, why don't we have it scrape the number out of the control file? Or even more to the point, since when do we need version numbers in extensions? We *absolutely* need version numbers in extensions. People will want to have a certain version, or a certain minimum version, etc., etc., etc., just as they do for any other software. Seriously, are you OK? One of the biggest mistakes in the creation of CPAN was allowing modules without versions. It makes figuring out what to upgrade extremely difficult. Learning from that, PGXN requires version numbers for all extensions. Best, David Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] SSI SLRU strategy choices
Excerpts from Kevin Grittner's message of mié dic 29 12:20:20 -0300 2010: http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/include/access/slru.h;h=710cca70acd67e03e5f3a255b048a719ae4c4709 The way I read this, each segment is (BLCKSZ * SLRU_PAGES_PER_SEGMENT) long, which is (8kB * 32), or 256kB. The number of files is limited to 64k because of the segment file naming scheme. So total space is limited to 16GB. When an SLRU is used to store xids for random access, that's 4 bytes per entry, so 2^32 entries are possible, but SLRU code considers it a problem for the space to become more than half full. With the eight byte entries I need, there are 2^31 slots for entries, with the ability to use 2^30 before it becomes half full and SLRU complains. If these limitations become a problem, you can always change them. A couple of zeroes at the start of the pg_clog filenames aren't going to bother anyone, I don't think. Not so sure about your new proposed design's space usage. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Extensions, patch v16
On Dec 29, 2010, at 12:00 PM, Bruce Momjian wrote: Don't people normally define the version number in the Makefile and pass the version string into the C code and perhaps a psql variable? There is no standard pattern AFAIK. A best practice would be welcome here. David
Re: [HACKERS] and it's not a bunny rabbit, either
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of mié dic 29 16:29:45 -0300 2010: In practice I think it would make sense if heap_open accepts all relation types on which you can potentially do either a heapscan or indexscan (offhand those should be the same set of relkinds, I think; so this is the same in effect as Heikki's proposal, but phrased differently). So it would have to start rejecting views, and we'd need to go looking for the consequences of that. This seems a very good idea, but I think we shouldn't let it sink the current patch. No, but possibly regularizing what heap_open is defined to do would make Robert's patch simpler. regards, tom lane
Re: [HACKERS] Anyone for SSDs?
Tom Lane wrote: Jeff Janes jeff.ja...@gmail.com writes: Of course if you do a full table scan because there are no better options, then it scans sequentially. But you have to scan the pages in *some* order, and it is hard to see how something other than sequential would be systematically better. In fact, if sequential *isn't* the best order for reading the whole file, the filesystem has lost its marbles completely; because that is the order in which most files are read, so files ought to be laid out on disk (or whatever storage device) to be read most quickly that way. Plus kernel read-ahead helps with sequential access too because the kernel can guess the next blocks to be requested --- hard to do that with random I/O. SSDs have fast access but still benefit from read-ahead. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Anyone for SSDs?
Vaibhav Kaushal wrote: On Fri, 2010-12-10 at 18:07 -0800, Josh Berkus wrote: On 12/10/10 5:06 PM, Daniel Loureiro wrote: An quicksort method in sequential disk its just awful to be thinking in a non SSD world, but its possible in an SSD. So, code it. Shouldn't be hard to write a demo comparison. I don't believe that SSDs make quicksort-on-disk feasible, but would be happy to be proven wrong. I too do not believe it in normal case. However, considering the 'types' of SSDs, it may be feasible! Asking for 'the next page and getting it' has a time delay in the process. While on a regular HDD with spindles, the question is where is that page located, with SSDs, the question disappears, because the access time is uniform in case of SSDs. Also, the access time is about 100 times faster, which would change quite a few things about the whole process. What _is_ interesting is that Postgres often has sequential and random/disk ways of doing things, and by reducing random_page_cost when using SSDs, you automatically use more random operations, so in a way the Postgres code was already prepared for SSD usage. Surprisingly, we had to change very little. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] SSI SLRU strategy choices
Alvaro Herrera alvhe...@commandprompt.com wrote: If these limitations become a problem, you can always change them. A couple of zeroes at the start of the pg_clog filenames aren't going to bother anyone, I don't think. Not so sure about your new proposed design's space usage. I guess that's a call the community can make now -- if a serializable transaction which is not flagged as read only remains open long enough for over a billion other transactions to commit, is it OK for the old transaction to be automatically canceled? Is it worth messing with the SLRU limits to double that? Beyond a certain point you have transaction ID wrap-around, so at that point this would be the least of your troubles -- canceling the old transaction might even be helpful. I thought that was at 2 billion, but Heikki was saying it's at 1 billion in an earlier post. -Kevin
Re: [HACKERS] Extensions, patch v16
Bruce Momjian br...@momjian.us writes: Oleg Bartunov wrote: it's clear we need versions, probably, major.minor would be enough. The problem I see is how to keep .so in sync with .sql ? Should we store .sql in database ? Don't people normally define the version number in the Makefile and pass the version string into the C code and perhaps a psql variable? We had a long discussion upthread of what version numbers to keep where. IMHO the Makefile is about the *least* useful place to put a version number; the more so if you want more than one. What we seem to need is a version number in the .sql file itself (so that we can tell whether we need to take action to update the extension's catalog entries). I'm not convinced yet whether there needs to be another version number embedded in the .so file --- it may well be that the PG major version number embedded with PG_MODULE_MAGIC is sufficient. Personally I'd forget the notion of major.minor numbers here; all that will accomplish is to complicate storage and comparison of the numbers. We just need a simple integer that gets bumped whenever the extension's SQL script changes. regards, tom lane
Re: [HACKERS] 9.1alpha3 release notes help
Josh Berkus j...@agliodbs.com writes: On 12/27/10 7:35 PM, Josh Berkus wrote: On 12/27/10 1:45 PM, Peter Eisentraut wrote: I'm unable to produce any really exciting release notes for alpha3. I have produced a draft here: http://wiki.postgresql.org/wiki/Alpha_release_notes_draft Please edit the bullet points if you have some idea. I'll see what I can do. Expanded with my edits on the wiki. Well, pg_execute_from_file is the name of a patch that went in, but after review, the function of the same name has been taken out. As a result we do not have pg_execute_from_file() nor pg_execute_sql_file() in alpha3 and we won't have it later with extensions. There's the configuration file parsing merge that we might want to tell users about. You no longer have to follow different quoting rules in the recovery.conf and postgresql.conf files. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] pg_dump --split patch
Joel Jacobson j...@gluefinance.com writes: Solution: I propose a new option to pg_dump, --split, which dumps each object to a separate file in a user friendly directory structure: Please have a look at getddl: https://github.com/dimitri/getddl Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Anyone for SSDs?
Bruce Momjian wrote: Vaibhav Kaushal wrote: On Fri, 2010-12-10 at 18:07 -0800, Josh Berkus wrote: On 12/10/10 5:06 PM, Daniel Loureiro wrote: An quicksort method in sequential disk its just awful to be thinking in a non SSD world, but its possible in an SSD. So, code it. Shouldn't be hard to write a demo comparison. I don't believe that SSDs make quicksort-on-disk feasible, but would be happy to be proven wrong. I too do not believe it in normal case. However, considering the 'types' of SSDs, it may be feasible! Asking for 'the next page and getting it' has a time delay in the process. While on a regular HDD with spindles, the question is where is that page located, with SSDs, the question disappears, because the access time is uniform in case of SSDs. Also, the access time is about 100 times faster, which would change quite a few things about the whole process. What _is_ interesting is that Postgres often has sequential and random/disk ways of doing things, and by reducing random_page_cost when using SSDs, you automatically use more random operations, so in a way the Postgres code was already prepared for SSD usage. Surprisingly, we had to change very little. To add to this very late reply, we basically had random methods to do things (in RAM), and sequential/random methods for disk. By changing random_page_cost, we favor doing random things on disk. The big question is whether there are random things we have never implemented on disk that now make sense --- off hand, I can't think of any. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Extensions, patch v16
On Dec 29, 2010, at 12:23 PM, Tom Lane wrote: We had a long discussion upthread of what version numbers to keep where. IMHO the Makefile is about the *least* useful place to put a version number; the more so if you want more than one. What we seem to need is a version number in the .sql file itself (so that we can tell whether we need to take action to update the extension's catalog entries). I'm not convinced yet whether there needs to be another version number embedded in the .so file --- it may well be that the PG major version number embedded with PG_MODULE_MAGIC is sufficient. For contrib maybe, but not 3rd-party extensions. Personally I'd forget the notion of major.minor numbers here; all that will accomplish is to complicate storage and comparison of the numbers. We just need a simple integer that gets bumped whenever the extension's SQL script changes. That won't be very flexible for third-party extensions. FWIW, for PGXN I mandated semantic version numbers (http://semver.org/), mainly because they're quite close to Pg core version numbers. I also created a basic data type for them: https://github.com/theory/pgxn-manager/blob/master/sql/02-types.sql#L70 Best, David
Re: [HACKERS] SSI SLRU strategy choices
Kevin Grittner kevin.gritt...@wicourts.gov wrote: if a serializable transaction which is not flagged as read only remains open long enough for over a billion other transactions to commit Maybe a clarification and example would be useful. We're talking about going through a billion transactions which were assigned a TransactionId, not all database transactions. An example of how you could hit that is with a sustained commit rate of 5000 transactions per second which are modifying data while a single read write transaction stays open for 2.3 days. -Kevin
Re: [HACKERS] small table left outer join big table
On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote: It's not a bug, that's the way it currently works. We don't need a test case for that. Oh, you're right. I missed the fact that it's a left join. The only thing that struck me as curious about it was that the OP didn't get a nestloop-with-inner-indexscan plan. That would be explainable if there was no index on the large table's id column ... but columns named like that usually have indexes. I can't get all *that* excited about complicating hash joins as proposed. The query is still fundamentally going to be slow because you won't get out of having to seqscan the large table. The only way to make it really fast is to not read all of the large table, and nestloop-with-inner-indexscan is the only plan type with a hope of doing that. Seq scanning the big table isn't bad... we've gone to a lot of trouble to make it easy to do this, especially with many users. Maintaining many large indexes is definitely bad, all that random I/O is going to suck badly. Seems like an interesting and relatively easy optimisation to me. Not sure if this is a request for a feature, or a proposal to write the optimisation. I hope it's the latter. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] and it's not a bunny rabbit, either
On Wed, Dec 29, 2010 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of mié dic 29 16:29:45 -0300 2010: In practice I think it would make sense if heap_open accepts all relation types on which you can potentially do either a heapscan or indexscan (offhand those should be the same set of relkinds, I think; so this is the same in effect as Heikki's proposal, but phrased differently). So it would have to start rejecting views, and we'd need to go looking for the consequences of that. This seems a very good idea, but I think we shouldn't let it sink the current patch. No, but possibly regularizing what heap_open is defined to do would make Robert's patch simpler. I think that any meaning we assign to heap_open is going to be 95% arbitrary and of little practical help. There are at least six different sets of object classes to which particular commands or ALTER TABLE subcommands can be legally applied: (1) tables only, (2) views only, (3) tables and views, (4) tables and indexes, (5) tables and composite types, (6) tables, views, and composite types. Adding foreign tables promises to add several more combinations immediately and likely more down the line; if we add materialized views, that'll add a bunch more. There's not really any single definition that's going to be a silver bullet. I think the best thing to do might be to get rid of heap_open(rv) and always use relation_openrv plus an appropriate relkind test. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_primary_conninfo
Magnus Hagander mag...@hagander.net writes: On Tue, Dec 28, 2010 at 18:12, Robert Haas robertmh...@gmail.com wrote: Although maybe now that we've made recovery.conf use the GUC lexer we oughta continue in that vein and expose those parameters as PGC_INTERNAL GUCs rather than inventing a new function for it... That's definitely another option that I wouldn't object to if people prefer that way. +1. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, patch v16
On Wed, Dec 29, 2010 at 3:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: Oleg Bartunov wrote: it's clear we need versions, probably, major.minor would be enough. The problem I see is how to keep .so in sync with .sql ? Should we store .sql in database ? Don't people normally define the version number in the Makefile and pass the version string into the C code and perhaps a psql variable? We had a long discussion upthread of what version numbers to keep where. IMHO the Makefile is about the *least* useful place to put a version number; the more so if you want more than one. What we seem to need is a version number in the .sql file itself (so that we can tell whether we need to take action to update the extension's catalog entries). I'm not convinced yet whether there needs to be another version number embedded in the .so file --- it may well be that the PG major version number embedded with PG_MODULE_MAGIC is sufficient. Personally I'd forget the notion of major.minor numbers here; all that will accomplish is to complicate storage and comparison of the numbers. We just need a simple integer that gets bumped whenever the extension's SQL script changes. I think there are really two tasks here: 1. Identify whether a newer set of SQL definitions than the one installed is available. If so, the extension is a candidate for an upgrade. 2. Identify whether the installed version of the SQL definitions is compatible with the installed shared object. If it's not, we'd like the shared library load (or at a minimum, any use of the shared library) to fail when attempted, rather than attempting to plunge blindly onward and then crashing. As to point #2, what an extension author would typically do (I hope) when publishing a new shared object is to make it backward-compatible with some number of previous versions of the SQL definitions, but not necessarily all the way to the beginning of time. 
So the typical upgrade sequence would be to install the new .so, and then upgrade the SQL definitions. You'd want an interlock, though, in case someone tried to use a set of SQL definitions that were either too new or too old for the corresponding shared library. The "too new" case could occur if someone installed a new version of the shared library, upgraded the SQL definitions, and then put the old shared library file back. The "too old" case could occur if the extension were upgraded through many releases in a single step, such that whatever backward-compatibility support existed in the shared library didn't reach back far enough to cater to the ancient SQL definitions. In either case, you want to throw an error when someone tries to use the module, rather than soldiering on blindly and crashing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] writable CTEs
On Tue, Dec 28, 2010 at 07:09:14AM -0500, Robert Haas wrote: On Tue, Dec 28, 2010 at 12:45 AM, David Fetter da...@fetter.org wrote: I don't see how people can be relying on links to 9.1-to-be's documentation. Well, it's always handy when the filenames are the same across versions. Ever looked at the 9.0 documentation for something and then modified the URL to see what it looked like in 8.1 or something? I have occasionally wondered if it would be possible to include in the SGML references to the same (or equivalent) page in previous versions, so that each page in the docs could reference the equivalent page in other versions. That would be extremely useful when trying to work out what works in your particular version. I've never done anything serious with SGML, but it'd be cool if it was possible. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle signature.asc Description: Digital signature
Re: [HACKERS] pg_streamrecv for 9.1?
Magnus Hagander mag...@hagander.net writes: Would people be interested in putting pg_streamrecv (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for 9.1? I think it would make sense to do so. +1 for having that in core, only available for the roles WITH REPLICATION I suppose? I think that the base backup feature is more important than simply streaming chunks of WAL (SR already does this). Talking about the base backup over libpq, it is something we should implement to satisfy the people who ask for an easy replication setup. Yes, definitely. But that also needs server-side support. Yeah, but it's already in core for 9.1, we have pg_read_binary_file() there. We could propose a contrib module for previous versions implementing the function in C, that should be pretty easy to code. The only reason I didn't do that for pg_basebackup is that I wanted a self-contained python script, so that offering a public git repo is all I needed as far as distributing the tool goes. Yeah, the WIP patch Heikki posted is similar, except it uses tar format and is implemented natively in the backend with no need for pl/pythonu to be installed. As of HEAD the dependency on pl/whatever is easily removed. The included C tool would need to have a parallel option from the get-go if at all possible, but if you look at the pg_basebackup prototype, it would be good to drop the wrong pg_xlog support in there and rely on a proper archiving setup on the master. Do you want to work on internal archive and restore commands over libpq in the same effort too? I think this tool should be either a one-time client or a daemon with support for:

- running a base backup when receiving a signal
- continuous WAL streaming from a master
- accepting standby connections and streaming to them
- one-time libpq streaming of a WAL file, either way

Maybe we don't need to daemonize the tool from the get-go, but if you're going parallel for the base-backup case you're almost there, aren't you? 
Also having internal commands for archive and restore commands that rely on this daemon running would be great too. I'd offer more help if it wasn't for finishing the extension patches, I'm currently working on 'alter extension ... upgrade', including how to upgrade from pre-9.1 extensions. But if that flies quicker than I want, count me in for more than only user specs :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, patch v16
On Dec 29, 2010, at 1:27 PM, Robert Haas wrote: I think there are really two tasks here: 1. Identify whether a newer set of SQL definitions than the one installed is available. If so, the extension is a candidate for an upgrade. 2. Identify whether the installed version of the SQL definitions is compatible with the installed shared object. If it's not, we'd like the shared library load (or at a minimum, any use of the shared library) to fail when attempted, rather than attempting to plunge blindly onward and then crashing. 3. Check dependencies for one extension on other extensions. As to point #2, what an extension author would typically do (I hope) when publishing a new shared object is to make it backward-compatible with some number of previous versions of the SQL definitions, but not necessarily all the way to the beginning of time. So the typical upgrade sequence would be to install the new .so, and then upgrade the SQL definitions. You'd want an interlock, though, in case someone tried to use a set of SQL definitions that were either too new or too old for the corresponding shared library. The "too new" case could occur if someone installed a new version of the shared library, upgraded the SQL definitions, and then put the old shared library file back. The "too old" case could occur if the extension were upgraded through many releases in a single step, such that whatever backward-compatibility support existed in the shared library didn't reach back far enough to cater to the ancient SQL definitions. In either case, you want to throw an error when someone tries to use the module, rather than soldiering on blindly and crashing. Yeah, makes sense. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] and it's not a bunny rabbit, either
On Wed, Dec 29, 2010 at 4:09 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 29.12.2010 06:54, Robert Haas wrote: With the patch:

rhaas=# cluster v;
ERROR: views do not support CLUSTER

"do not support" sounds like a missing feature, rather than a nonsensical command. How about something like "CLUSTER cannot be used on views"? In the latest version of this patch, I created four translatable strings per object type:

blats do not support %s (where %s is an SQL command)
blats do not support constraints
blats do not support rules
blats do not support triggers

It's reasonable enough to write "CLUSTER cannot be used on views", but "constraints cannot be used on views" seems more awkward to me. Or do we think that's OK? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Upgrading Extension, version numbers (was: [HACKERS] Extensions, patch v16)
Robert Haas robertmh...@gmail.com writes: On Wed, Dec 29, 2010 at 3:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: We had a long discussion upthread of what version numbers to keep where. IMHO the Makefile is about the *least* useful place to put a version number; the more so if you want more than one. What we seem to need is a version number in the .sql file itself (so that we can tell whether we need to take action to update the extension's catalog entries). I'm not convinced yet whether there needs to be another version number embedded in the .so file --- it may well be that the PG major version number embedded with PG_MODULE_MAGIC is sufficient. In the .sql file? You mean something like: ALTER EXTENSION ... SET VERSION '...'; It's currently managed in the .control file of the extension, which allows us to list available extensions and their version number without having to parse the .sql script from the C code... Personally I'd forget the notion of major.minor numbers here; all that will accomplish is to complicate storage and comparison of the numbers. We just need a simple integer that gets bumped whenever the extension's SQL script changes. For contrib, as you wish. Now for third-party extensions, I don't see us having any authority on what people will use internally in their companies, etc. 1. Identify whether a newer set of SQL definitions than the one installed is available. If so, the extension is a candidate for an upgrade. Well, it's currently (WIP in the upgrade branch of my repo) easier than that, really. You have the control file on the file system and you have the extension's entry in the catalogs. 
http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/upgrade

What upgrade means here is running a given SQL script, that you choose depending on the current and next version strings, following a scheme that has been extensively discussed in another thread, and is currently implemented like this:

# lo
comment = 'managing Large Objects'
version = '9.1devel'
relocatable = true
upgrade_from_null = 'null = lo.upgrade.sql'

Here, any property that begins with 'upgrade_from_' is considered as an upgrade setup and the part after the prefix is not considered. The value is meant to have two parts separated by '=': the first is either null or a regexp matched against the currently installed version number, the second part is the upgrade script name to use at ALTER EXTENSION ... UPGRADE. We support the 'null' version number to be able to upgrade from existing code which is not organized as an extension yet. The aim is to be able to:

CREATE EMPTY EXTENSION lo; -- version is null here
ALTER EXTENSION lo UPGRADE;

And run a script containing lines that will look like this:

alter domain @extsch...@.lo set extension lo;
alter function @extsch...@.lo_oid(lo) set extension lo;
alter function @extsch...@.lo_manage() set extension lo;

Note that we always need to support the placeholder here, because of course following dependencies at this point isn't possible. 2. Identify whether the installed version of the SQL definitions is compatible with the installed shared object. If it's not, we'd like the shared library load (or at a minimum, any use of the shared library) to fail when attempted, rather than attempting to plunge blindly onward and then crashing. Well, the way I see things, it's already too late and there's nothing we can easily do to prevent that. What I mean is that the user will typically upgrade the OS-level package first, then apply the upgrade on the database(s). 
$ apt-get install postgresql-9.1-prefix
$ psql -U postgres -c 'alter extension prefix upgrade' somedb

At the time you tell PostgreSQL about the new extension, the shared object file has been in place for some time already, and the upgrade SQL script has not been run yet. What I hope extension authors will do is document whether any upgrade requires a restart or will otherwise be responsible for instability in the server for backends started with the newer .so before the upgrade script has been run. So that users/DBAs will know whether the upgrade calls for a maintenance window. I could see us trying to shoehorn such information into the control file too, but would ERRORing out on LOAD be any better than taking the compatibility chance? Knowing that the compatibility in most cases depends a lot on the actual call paths? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] and it's not a bunny rabbit, either
On Wed, Dec 29, 2010 at 04:53:47PM -0500, Robert Haas wrote: On Wed, Dec 29, 2010 at 4:09 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 29.12.2010 06:54, Robert Haas wrote: With the patch:

rhaas=# cluster v;
ERROR: views do not support CLUSTER

"do not support" sounds like a missing feature, rather than a nonsensical command. How about something like "CLUSTER cannot be used on views"? In the latest version of this patch, I created four translatable strings per object type:

blats do not support %s (where %s is an SQL command)
blats do not support constraints
blats do not support rules
blats do not support triggers

It's reasonable enough to write "CLUSTER cannot be used on views", but "constraints cannot be used on views" seems more awkward to me. Or do we think that's OK? That particular one looks good insofar as it describes reality. With predicate locks, etc., it may become untrue later, though :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] understanding minimum recovery ending location
Howdy, I am hoping someone can help me better understand what the minimum recovery ending location of pg_controldata represents with regards to 9.0 hot standbys. When I look at any of our 8.4 (or lower) installs this number is almost always somewhere in the past of the xlog timeline (presuming the server has been up for any length of time), which makes sense because we've done enough replay that we're covered for consistent slave recovery purposes. However, on the hot standby machines the number seems to normally be at some point in the future. Below are the relevant bits of pg_controldata output from a hot standby I was looking at earlier today:

Database cluster state:               in archive recovery
pg_control last modified:             Wed 29 Dec 2010 04:54:34 PM GMT
Latest checkpoint location:           56/21020CA8
Prior checkpoint location:            56/1E36D780
Latest checkpoint's REDO location:    56/1F599068
Time of latest checkpoint:            Wed 29 Dec 2010 04:46:09 PM GMT
Minimum recovery ending location:     56/24B88500
Backup start location:                0/0
Current wal_level setting:            hot_standby

As you can see, the minimum recovery ending location is clearly ahead of the most recent checkpoint activity. Now, it's not always like this, but most of the time it is, and since the minimum recovery ending location seems to be regularly updating going forward, it tends to make me think that either I misunderstand what this means, or it means something different in this context. 
Partially because I can query on the hot standby machine already, so I know I have a recoverable slave, but even more so in the context of pg_controldata on the master:

Database cluster state:               in production
pg_control last modified:             Wed 29 Dec 2010 04:54:04 PM GMT
Latest checkpoint location:           56/234325B0
Prior checkpoint location:            56/21020CA8
Latest checkpoint's REDO location:    56/220558A8
Time of latest checkpoint:            Wed 29 Dec 2010 04:51:09 PM GMT
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            hot_standby

As you can see, the master's checkpoint information is actually ahead of the slave's (as expected), but even in this scenario, the slave is saying that the minimum recovery ending location is actually in the future compared to the latest checkpoint and redo locations of the master. This seems like a bug to me (how can it possibly be required that the minimum recovery ending location would be at a point in the xlog timeline that may never exist?), but I am guessing this field is actually reporting something different in this context, so I am hoping someone can help clarify it for me? Robert Treat http://www.xzilla.net
Re: [HACKERS] understanding minimum recovery ending location
On 30.12.2010 00:19, Robert Treat wrote: Howdy, I am hoping someone can help me better understand what the minimum recovery ending location of pg_controldata represents with regards to 9.0 hot standbys. When I look at any of our 8.4 (or lower) installs this number is almost always somewhere in the past of the xlog timeline (presuming the server has been up for any length of time), which makes sense because we've done enough replay that we're covered for consistent slave recovery purposes. However, on the hot standby machines the number seems to normally be at some point in the future. Below are the relevant bits of pg_controldata output from a hot standby I was looking at earlier today:

Database cluster state:               in archive recovery
pg_control last modified:             Wed 29 Dec 2010 04:54:34 PM GMT
Latest checkpoint location:           56/21020CA8
Prior checkpoint location:            56/1E36D780
Latest checkpoint's REDO location:    56/1F599068
Time of latest checkpoint:            Wed 29 Dec 2010 04:46:09 PM GMT
Minimum recovery ending location:     56/24B88500
Backup start location:                0/0
Current wal_level setting:            hot_standby

As you can see, the minimum recovery ending location is clearly ahead of the most recent checkpoint activity. Minimum recovery ending location means: how far does the (standby) server need to replay WAL before it's safe to start it up. It is continuously updated as the recovery progresses, as data pages are flushed to disk. The reason is that if you kill the server during recovery, you have to recover up to the same point again, or the database wouldn't be consistent. Specifically, the WAL records for any data page changes that were already flushed to disk from the buffer cache before killing recovery must be re-replayed. So in practice the minimum recovery ending location follows somewhat behind the last replayed WAL record. 
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Wed, Dec 29, 2010 at 10:56:39AM -0500, Robert Haas wrote: On Dec 29, 2010, at 7:56 AM, Noah Misch n...@leadboat.com wrote: Having thought on it more, though, it actually seems best to attempt the verification scan *every* time. In most ineligible conversions, an inequality will appear very early, so the scan is effectively O(1) in the negative case. A notable exception is something like char(6)->varchar(6) in a table with a billion tuples having length(col) = 6 and one with length(col) = 5. The verification scan might read most of the table before finding the one tuple that forces a rewrite. That isn't a particularly regular scenario in my experience, so the "just do the right thing" aspect of preceding every potential rewrite with a verification scan seems to win out. I think this scenario will be more common than you might think. Tables don't contain random data; they contain data that the DBA thinks is valid. The situation where the data is mostly as you expect but with a few kooky rows is, in my experience, extremely common. Perhaps. A few kooky rows is indeed common, but we're talking about a specific breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE transformation expression, and 0.1% have different bits. Is that common? In case it was not obvious, I'll note that any error thrown by a transformation expression during the verification scan still aborts the ALTER TABLE. A varchar(20)->varchar(10) that finds an 11-char string will fail permanently during the verification scan. Indeed, the primary value of the verification scan is to distinguish positive and error, not positive and negative. 
Expanding on my introduction, none of the following can yield a negative verification scan; the result is always positive or an error:

CREATE DOMAIN loosedom AS text;
CREATE DOMAIN tightdom AS text CHECK (value LIKE '%/');
CREATE TABLE t (c varchar(6));
INSERT INTO t VALUES ('abc/'),('de/');
ALTER TABLE t ALTER c TYPE varchar(8);
ALTER TABLE t ALTER c TYPE text;
ALTER TABLE t ALTER c TYPE loosedom;
ALTER TABLE t ALTER c TYPE xml USING c::xml;
ALTER TABLE t ALTER c TYPE varchar(64);
ALTER TABLE t ALTER c TYPE tightdom;

Adding a bpchar into the mix makes a negative verification scan possible, as does a USING clause having a truncating effect. Continuing the example, these can and would get a negative verification scan:

ALTER TABLE t ALTER c TYPE character(6);
ALTER TABLE t ALTER c TYPE varchar(5) USING c::varchar(5);

Plenty of academic USING clause examples exist:

ALTER TABLE t ALTER c TYPE varchar(8) USING CASE c WHEN 'de/' THEN 'foo' ELSE c END;

Verification scans for conversions between fundamentally different types will generally end in the negative at the first tuple. Consider {timestamp,bigint,real,numeric,varbit}->text, int->bigint, interval->reltime, etc. I can't think of a decent non-academic example where heterogeneous conversions like these will dupe the verification scan for even a handful of tuples. Granted, one can flip that around as an argument for declaring the conversions that have a chance. And it makes the worst case a LOT worse. I suppose the absolute worst case would involve an ALTER TABLE adding brutally expensive CHECK constraints, such that the cost of computing those constraints would dominate the cost of both the verification scan and the rewrite, yielding a 100% slower ALTER TABLE run. A more realistic bad case might be a table much larger than memory with no indexes, and the verification scan adds a full seq scan for nothing. A crude test here has rewriting such a table taking 7x as long as a seq scan on it. 
By the back of that envelope, we'll take about 15% more time. The right mix of expensive expressions will raise that percentage, and the presence of indexes will drop it. Remember though, we're still only talking about the relatively-rare cases that even can get a negative verification scan. I really doubt this is worth the complexity anyway - we'd have the verification scan regardless of how we choose when to use it, because how else would we implement no-rewrite varchar(8)->varchar(4) or text->xml? An unconditional verification scan is merely the most trivial algorithm for deciding when to employ it. Those conversions are semantically similar to adding CHECK constraints, and in that sense we already have an initial verification scan implementation: ATRewriteTable(..., InvalidOid, ...). Converting between two types that are mostly-but-not-quite binary compatible seems like an edge case in every sense. Yes. Indeed, that's the intuitive basis for my hypothesis that the verification scan will usually either succeed or fail early. I don't advocate this approach to pick up edge cases, but to pick up reasonable cases _without explicit annotations_ showing them to be achievable. Take the text->xml example, certainly of genuine value if not top-frequency. I see three ways to ensure we do a
[HACKERS] SLRU API tweak
Attached is a small patch to avoid putting an opaque structure into the slru.h file and using it in an external function call where external callers must always specify NULL.

-Kevin

*** a/src/backend/access/transam/clog.c
--- b/src/backend/access/transam/clog.c
***************
*** 445,451 ****
  BootStrapCLOG(void)
  	slotno = ZeroCLOGPage(0, false);
  	/* Make sure it's written out */
! 	SimpleLruWritePage(ClogCtl, slotno, NULL);
  	Assert(!ClogCtl->shared->page_dirty[slotno]);
  	LWLockRelease(CLogControlLock);
--- 445,451 ----
  	slotno = ZeroCLOGPage(0, false);
  	/* Make sure it's written out */
! 	SimpleLruWritePage(ClogCtl, slotno);
  	Assert(!ClogCtl->shared->page_dirty[slotno]);
  	LWLockRelease(CLogControlLock);
***************
*** 698,704 ****
  clog_redo(XLogRecPtr lsn, XLogRecord *record)
  	LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
  	slotno = ZeroCLOGPage(pageno, false);
! 	SimpleLruWritePage(ClogCtl, slotno, NULL);
  	Assert(!ClogCtl->shared->page_dirty[slotno]);
  	LWLockRelease(CLogControlLock);
--- 698,704 ----
  	LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
  	slotno = ZeroCLOGPage(pageno, false);
! 	SimpleLruWritePage(ClogCtl, slotno);
  	Assert(!ClogCtl->shared->page_dirty[slotno]);
  	LWLockRelease(CLogControlLock);
*** a/src/backend/access/transam/multixact.c
--- b/src/backend/access/transam/multixact.c
***************
*** 1454,1460 ****
  BootStrapMultiXact(void)
  	slotno = ZeroMultiXactOffsetPage(0, false);
  	/* Make sure it's written out */
! 	SimpleLruWritePage(MultiXactOffsetCtl, slotno, NULL);
  	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactOffsetControlLock);
--- 1454,1460 ----
  	slotno = ZeroMultiXactOffsetPage(0, false);
  	/* Make sure it's written out */
! 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
  	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactOffsetControlLock);
***************
*** 1465,1471 ****
  BootStrapMultiXact(void)
  	slotno = ZeroMultiXactMemberPage(0, false);
  	/* Make sure it's written out */
! 	SimpleLruWritePage(MultiXactMemberCtl, slotno, NULL);
  	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactMemberControlLock);
--- 1465,1471 ----
  	slotno = ZeroMultiXactMemberPage(0, false);
  	/* Make sure it's written out */
! 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
  	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactMemberControlLock);
***************
*** 1986,1992 ****
  multixact_redo(XLogRecPtr lsn, XLogRecord *record)
  	LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
  	slotno = ZeroMultiXactOffsetPage(pageno, false);
! 	SimpleLruWritePage(MultiXactOffsetCtl, slotno, NULL);
  	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactOffsetControlLock);
--- 1986,1992 ----
  	LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
  	slotno = ZeroMultiXactOffsetPage(pageno, false);
! 	SimpleLruWritePage(MultiXactOffsetCtl, slotno);
  	Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactOffsetControlLock);
***************
*** 2001,2007 ****
  multixact_redo(XLogRecPtr lsn, XLogRecord *record)
  	LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
  	slotno = ZeroMultiXactMemberPage(pageno, false);
! 	SimpleLruWritePage(MultiXactMemberCtl, slotno, NULL);
  	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactMemberControlLock);
--- 2001,2007 ----
  	LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
  	slotno = ZeroMultiXactMemberPage(pageno, false);
! 	SimpleLruWritePage(MultiXactMemberCtl, slotno);
  	Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]);
  	LWLockRelease(MultiXactMemberControlLock);
*** a/src/backend/access/transam/slru.c
--- b/src/backend/access/transam/slru.c
***************
*** 78,83 ****
--- 78,85 ----
  	int			segno[MAX_FLUSH_BUFFERS];	/* their log seg#s */
  } SlruFlushData;
+ 
+ typedef struct SlruFlushData *SlruFlush;
+ 
  /*
   * Macro to mark a buffer slot most recently used. Note multiple evaluation
   * of arguments!
***************
*** 123,128 ****
--- 125,131 ----
  static int	slru_errno;
  
  static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
  static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
+ static
Re: [HACKERS] estimating # of distinct values
Well, but that's not 7%, that's 7x! And the theorem says 'greater or equal' so this is actually the minimum - you can get a much bigger difference with lower probability. So you can easily get an estimate that is a few orders of magnitude off. FWIW, based on query performance, estimates which are up to 5X off are tolerable, and anything within 3X is considered accurate. Above 5X the probability of bad query plans becomes problematically high. Of course, if you're doing cross-column stats, the accuracy of each individual column becomes critical since estimation errors could be multiplicative in the worst case (i.e. if colA is 3X off and colB is 0.3X then colA-colB will be 9X off). Anyway, I look forward to your experiments with stream-based estimators. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
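Josh's compounding-error point can be sketched numerically. This is a toy model only (the helper names and numbers are mine, not anything from the planner): treat each column's estimate as a ratio of estimated to actual rows; in the worst case the magnitudes of the per-column errors multiply.

```python
def times_off(err):
    """How many times off an estimate is, regardless of direction
    (err = estimated / actual). 3.0 and 0.333... are both ~3X off."""
    return max(err, 1.0 / err)

def worst_case_combined(err_a, err_b):
    """Worst case when two independent per-column misestimates
    compound (e.g. a two-column predicate): magnitudes multiply."""
    return times_off(err_a) * times_off(err_b)

# colA estimated 3X too high, colB at 0.3X of actual (~3.3X too low):
# combined worst case is ~10X, in the ballpark of the 9X quoted above.
print(worst_case_combined(3.0, 0.3))
```

Under the "within 3X is accurate, above 5X is dangerous" rule of thumb, two individually tolerable 3X errors can already push a combined estimate well past the danger line.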
Re: [HACKERS] Anyone for SSDs?
On Wed, Dec 29, 2010 at 3:34 PM, Bruce Momjian br...@momjian.us wrote: Bruce Momjian wrote: Vaibhav Kaushal wrote: On Fri, 2010-12-10 at 18:07 -0800, Josh Berkus wrote: On 12/10/10 5:06 PM, Daniel Loureiro wrote: A quicksort method on sequential disk is just awful to contemplate in a non-SSD world, but it's possible on an SSD. So, code it. Shouldn't be hard to write a demo comparison. I don't believe that SSDs make quicksort-on-disk feasible, but would be happy to be proven wrong. I too do not believe it in the normal case. However, considering the 'types' of SSDs, it may be feasible! Asking for 'the next page' and getting it has a time delay in the process. While on a regular HDD with spindles the question is where that page is located, with SSDs the question disappears, because the access time is uniform. Also, the access time is about 100 times faster, which would change quite a few things about the whole process. What _is_ interesting is that Postgres often has sequential and random/disk ways of doing things, and by reducing random_page_cost when using SSDs, you automatically use more random operations, so in a way the Postgres code was already prepared for SSD usage. Surprisingly, we had to change very little. To add to this very late reply, we basically had random methods to do things (in RAM), and sequential/random methods for disk. By changing random_page_cost, we favor doing random things on disk. The big question is whether there are random things we have never implemented on disk that now make sense --- off hand, I can't think of any. The idea of avoiding quicksort when we know we need to spill to disk is the type of thing I wonder whether we should investigate, if you figure that spill to disk means SSDs so it's not so much of a performance hit. This reminds me of some performance testing we did maybe a year, year and a half ago, trying to see how best to get performance by adding some SSDs into one of our servers.
Basically speed increased as we changed things like so:
- put entire $pgdata on sata
- put entire $pgdata on ssd
- put xlogs on ssd, pgdata on sata
- put pgdata and xlogs on sata, put arc on ssd, crank up postgres's memory settings
(arc being zfs's adaptive replacement cache, so basically giving the server a second, very large level of memory to work with, and then configuring postgres to make use of it.) It wasn't terribly obvious to me why this ended up outperforming the initial idea of putting everything on ssd, but my impression was that the more you could force postgres into making decisions as if it was dealing with fast storage rather than slow storage, the better off you'd be (and that random_page_cost alone is not inclusive enough to do this for you). Robert Treat http://www.xzilla.net
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Wed, Dec 29, 2010 at 11:16:23AM -0500, Tom Lane wrote: Noah Misch n...@leadboat.com writes: ALTER TABLE ALTER TYPE always rewrites the table heap and its indexes. In some cases, we can determine that doing so is unhelpful, and that the conversion shall always succeed: I wish to replace table rewrites with table verification scans where possible, then skip those verification scans where possible. This has been discussed before; have you read the previous threads? I cited two threads I had read on the subject. Were there other important ones? I really really dislike the notion of a verification scan: it's basically work that is going to be useless if it fails. I think your argument that it will usually fail quickly is quite unconvincing, and in any case the situations where it is useful at all are too thin on the ground to be worth the code space to implement it. It seems sufficient to me to skip the rewrite in cases of provable binary compatibility, with possibly an extra check for safe changes of typmod. With respect to the latter, I agree a type-specific function to compare the typmods would be the way to go, although exemptor seems a pretty badly chosen name for it. I have attempted to expand on these problems in my reply to Robert.
Re: [HACKERS] sepgsql contrib module
(2010/12/27 17:53), Simon Riggs wrote: On Fri, 2010-12-24 at 11:53 +0900, KaiGai Kohei wrote: The attached patch is the modular version of SE-PostgreSQL. Looks interesting. Couple of thoughts... Docs don't mention row-level security. If we don't have it, I think we should say that clearly. Indeed, it is a good idea for the document to clearly mention which features are not implemented in this version - not only row-level security, but DDL permissions and so on. I'd like to revise it soon. I think we need a Guide to Security Labels section in the docs. Very soon, because it's hard to know what is being delivered and what is not. Should it describe what a security label is and what it is for? OK, I'd like to add this section here. Is the pg_seclabel table secure? Looks like the labels will be available to read. If we want to control visibility of each label, we need row-level granularity here. How do we tell if sepgsql is installed? Check for the existence of the sepgsql.* GUC variables. What happens if someone alters the configuration so that the sepgsql plugin is no longer installed? Does the hidden data become visible? Yes. If the sepgsql plugin is uninstalled, the hidden data becomes visible. But that is no great concern: since only a person who is allowed to edit postgresql.conf can uninstall it, it cannot be uninstalled at run time. (An exception is loading a malicious module, but we will be able to hook this operation in a future version.) Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Wed, Dec 29, 2010 at 02:01:28PM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Dec 29, 2010, at 11:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: I really really dislike the notion of a verification scan: it's basically work that is going to be useless if it fails. I think it has potential in cases like text to xml. In that case it'll either work or fail, with no possibility of requiring a do-over. Scanning the whole table is a whole lot cheaper than rewriting it. I don't believe avoiding the write part (but not the read part, nor the XML syntax verification part) is a sufficiently compelling argument to justify having that code path. There are not enough distinct datatypes sharing binary representations to make this a worthwhile thing to worry over. Basically, I believe that the only use-case that will have more than epsilon number of users is I want to make this varchar(5) into varchar(10), or possibly text. We can fix that case without adding a boatload more code that we'll have to maintain. Those are certainly the big ones, but I've also hit these in the field: SET timezone = 'UTC'; CREATE DOMAIN state AS text CHECK (VALUE ~ '[A-Z]{2}'); CREATE TABLE t (c0 text, c1 text, c2 timestamp); ALTER TABLE t ALTER c0 TYPE state, ALTER c1 TYPE varchar(6), ALTER c2 TYPE timestamptz; It felt normal, but I very well may have been in that epsilon unawares. I would be disappointed to end the project with no way to avoid rewrites for them. That being said, there are certainly more ways to achieve all of those than the one I have proposed. I do have some interest in the idea of having a type-specific function that can recognize no-op typmod changes, but I would envision that as being an expression evaluation optimization: let the planner throw away the call to the length-checking function when it isn't going to do anything. 
It's not by any means only useful in ALTER COLUMN TYPE --- and in fact probably doesn't even need any bespoke code there, if we put it into expression_planner() instead. That sounds significantly better. Ignorant question: how often will expression_planner have the old typmod available? Also, would this be worth attempting unconditionally, or might some callers use the expression too few times and suffer from the overhead of deciding whether to use it? Perhaps the order of patches I proposed was faulty and should have placed the most important use cases first, like this:

1. Modify ATPrepAlterColumnType to detect a bare RelabelType transformation expression and conditionally skip/downgrade ATRewriteTable. This would cover varchar -> text and similar domain changes, but a full reindex remains.
2. In the no-rewrite case, modify the vicinity of finish_heap_swap to narrow us from reindexing the entire table to merely reindexing those indexes touched by the operator class changes.
3. Further skip reindex operations when the operator class has changed but the operator family has not.
4. Add pg_cast.castexemptor and modify CREATE CAST to populate it. Define an exemptor (or however we name it) for the varchar length coercion cast. Modify expression_planner to use it to strip out superfluous length coercion casts. This would cover varchar(10) -> varchar(20) and similar domain changes.
5. Define exemptors for time, timetz, timestamp, timestamptz, and interval.
6. Define exemptor for bit varying.
7. Define exemptor for numeric.
8. (subject to further discussion) Add the verification scan to the vicinity of ATRewriteTable and skip the table rewrite when it finds no changes. This covers varchar(20) -> varchar(10) and similar domain changes, timestamp -> timestamptz, and text -> xml.

Thanks, nm
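The three-way outcome Noah is steering towards - skip the scan, verify only, or fully rewrite - can be modelled in miniature. The sketch below is purely illustrative Python: the type rules and the binary-compatibility table are assumptions for the example, not the proposed catalog machinery.

```python
def alter_column_strategy(old_type, new_type):
    """Pick the cheapest safe strategy for ALTER COLUMN TYPE.

    Types are (name, typmod) pairs; typmod is the max length, or
    None for unlimited (as with text). Returns 'none' (skip the
    scan), 'verify' (scan may error but never rewrites), or
    'rewrite' (full table rewrite).
    """
    name_old, mod_old = old_type
    name_new, mod_new = new_type
    # Assumed binary-compatible pairs for this toy model only.
    binary_compatible = {('varchar', 'text')}

    if name_old == name_new:
        if mod_new is None or (mod_old is not None and mod_new >= mod_old):
            return 'none'      # widening, e.g. varchar(5) -> varchar(10)
        return 'verify'        # narrowing, e.g. varchar(20) -> varchar(10)
    if (name_old, name_new) in binary_compatible:
        return 'none'          # varchar -> text: same bits, no check needed
    return 'rewrite'           # anything else: transform and rewrite

print(alter_column_strategy(('varchar', 5), ('varchar', 10)))   # none
print(alter_column_strategy(('varchar', 20), ('varchar', 10)))  # verify
print(alter_column_strategy(('varchar', 5), ('text', None)))    # none
print(alter_column_strategy(('text', None), ('xml', None)))     # rewrite
```

In the real system the typmod comparison would live in a per-type exemptor function and the compatibility knowledge in the casts catalog; this only shows the shape of the decision.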
[HACKERS] Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
We have been seeing these warnings recently whenever a standby is brought up (typically to check it is ok). Sometimes they are coupled with corrupted indexes which require a REINDEX to fix. Initially I thought these uninitialized pages were due to primary crashes or hardware issues, however I've now managed to come up with a recipe to generate them on demand on my workstation. Pitrtools appears to be an essential part of the recipe - at this stage I'm not sure if it is actually doing something directly to cause this or merely tickling some Postgres recovery bug. The essential triggering element seems to be performing a base backup while the system is busy. Here's the description:

1/ Patch 8.3's pgbench using the attached diff, and initialize a scale 100 dataset
2/ Get the Pitrtools primary and standby configs set up (examples attached)
3/ Start pgbench with at least 4 clients and 20 transactions
4/ After history has approx 1 rows initiate backup from the standby
5/ After history has approx 14 rows bring up the standby and perform a VACUUM

Typically I'm seeing a large number of consecutive uninitialized pages in the accounts table. What is also very interesting is that if I set up the standby in a more bare-bones manner (i.e. manually running pg_start_backup and rsync + pg_standby) then I can *never* elicit any uninitialized pages. I'm frankly puzzled about what Pitrtools is doing that is different - I only noticed it using rsync compression (-z) and doing rsync backups via pulling from the standby rather than pushing from the primary (I'm in the process of trying these variations out in the bare-bones case). Just as I'm writing this I see Pitrtools rsync's pg_xlog - I wonder if there are timing issues which mean that recovery might use some (corrupted) logs from there before the (clean) archived ones arrive (will check).
Some more detail about the system: Postgres 8.3.12 on Ubuntu Lucid x86_64 and Debian Lenny (lxc guests), rsync 3, Pitrtools 1.2-1

Postgres config changes:
autovacuum = off # prevent any relation truncation
max_fsm_pages = 2 # encourage new page creation

Pitrtools Steps:

primary:
$ grep archive_command postgresql.conf
archive_command = 'cmd_archiver -C /etc/pitrtools/cmd_archiver.ini -F %p'

standby:
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -B
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -Astop_basebackup
$ cp /etc/postgresql/8.3/main/pg_hba.conf \
  /var/lib/postgresql/8.3/main/pg_hba.conf
$ cp /etc/postgresql/8.3/main/postgresql.conf \
  /var/lib/postgresql/8.3/main/postgresql.conf
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -S
$ cmd_standby -C /etc/pitrtools/cmd_standby.ini -F999

Bare Bones Steps:

primary:
$ grep archive_command postgresql.conf
archive_command = 'rsync %p standby:/var/lib/postgresql/archive'
$ psql -c "SELECT pg_start_backup('backup');"
$ rsync --exclude pg_xlog/\* --exclude postmaster.pid -a * \
  standby:/var/lib/postgresql/8.3/main
$ psql -c "SELECT pg_stop_backup();"

standby:
$ grep restore_command recovery.conf
restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -t /tmp/trigger.5432 /var/lib/postgresql/archive %f %p %r'
$ /etc/init.d/postgresql-8.3 start
$ touch /tmp/trigger.5432

regards Mark

P.S.: cc'ing Pg Hackers as a variation of this topic has come up there several times.

pgbench.diff.gz Description: GNU Zip compressed data
cmd_archiver.ini.gz Description: GNU Zip compressed data
cmd_standby.ini.gz Description: GNU Zip compressed data
Re: [HACKERS] sepgsql contrib module
On Thu, 2010-12-30 at 09:26 +0900, KaiGai Kohei wrote: What happens if someone alters the configuration so that the sepgsql plugin is no longer installed. Does the hidden data become visible? Yes. If sepgsql plugin is uninstalled, the hidden data become visible. But no matter. Since only a person who is allowed to edit postgresql.conf can uninstall it, we cannot uninstall it in run-time. (An exception is loading a malicious module, but we will be able to hook this operation in the future version.) IMHO all security labels should be invisible if the provider is not installed correctly. That at least prevents us from accidentally de-installing a module and having top secret data be widely available. If you have multiple providers configured, you need to be careful not to allow a provider that incorrectly implements the plugin API, so that prior plugins are no longer effective. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] sepgsql contrib module
(2010/12/30 9:34), Simon Riggs wrote: On Thu, 2010-12-30 at 09:26 +0900, KaiGai Kohei wrote: What happens if someone alters the configuration so that the sepgsql plugin is no longer installed. Does the hidden data become visible? Yes. If sepgsql plugin is uninstalled, the hidden data become visible. But no matter. Since only a person who is allowed to edit postgresql.conf can uninstall it, we cannot uninstall it in run-time. (An exception is loading a malicious module, but we will be able to hook this operation in the future version.) IMHO all security labels should be invisible if the provider is not installed correctly. Probably, that needs row-level granularity to control the visibility of individual pg_seclabel entries, because all the providers share the same system catalog. So, I don't think this mechanism is feasible right now. That at least prevents us from accidentally de-installing a module and having top secret data be widely available. If you have multiple providers configured, you need to be careful not to allow a provider that incorrectly implements the plugin API, so that prior plugins are no longer effective. Yep. It is the responsibility of the DBA who sets up security providers. The DBA has to install only trusted, well-debugged modules (not limited to security providers) to avoid trouble. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Marko Tiikkaja wrote: As far as I can tell, this should work. I played around with the patch and the problem seems to be the VALUES:

MERGE INTO Stock t USING (SELECT 30, 2000) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;
MERGE 1

Good catch...while I think the VALUES syntax should work, that is a useful workaround so I could keep testing. I rewrote like this (original syntax commented out):

MERGE INTO Stock t
-- USING (VALUES(10,100)) AS s(item_id,balance)
USING (SELECT 10,100) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

And that got me back again to concurrent testing. Moving on to the next two problems...the basic MERGE feature seems to have stepped backwards a bit too. I'm now seeing these quite often:

ERROR: duplicate key value violates unique constraint pgbench_accounts_pkey
DETAIL: Key (aid)=(176641) already exists.
STATEMENT: MERGE INTO pgbench_accounts t USING (SELECT 176641,1+(176641 / 100)::integer,168,'') AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler);

On my concurrent pgbench test, which had been working before.
Possibly causing that, the following assertion is tripping:

TRAP: FailedAssertion(!(epqstate->origslot != ((void *) 0)), File: execMain.c, Line: 1762)

That's coming from the following code:

void
EvalPlanQualFetchRowMarks(EPQState *epqstate)
{
	ListCell   *l;

	Assert(epqstate->origslot != NULL);

	foreach(l, epqstate->rowMarks)

Stepping back to summarize...here's a list of issues I know about with the current v204 code:

1) VALUES syntax doesn't work anymore
2) Assertion failure in EvalPlanQualFetchRowMarks
3) Duplicate key bug (possibly a direct result of #2)
4) Attempts to use MERGE in a function spit back ERROR: table is not a known function
5) The ctid junk attr handling needs to be reviewed more carefully, based on author request.

I've attached the current revisions of all my testing code in hopes that Boxuan might try and replicate these (this makes it simple to replicate #1 through #3), and therefore confirm whether changes made do better.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

DROP TABLE Stock;
CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;

MERGE INTO Stock t
-- USING (VALUES(10,100)) AS s(item_id,balance)
USING (SELECT 10,100) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;
SELECT * FROM Stock ORDER BY item_id;

MERGE INTO Stock t
-- USING (VALUES(30,2000)) AS s(item_id,balance)
USING (SELECT 30,2000) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;
SELECT * FROM Stock ORDER BY item_id;

\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 10 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta -5000 5000
MERGE INTO pgbench_accounts t USING (SELECT :aid,1+(:aid / 100)::integer,:delta,'') AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler);
-- This syntax worked with MERGE v203 patch, but isn't compatible with v204
--MERGE INTO pgbench_accounts t USING (VALUES (:aid,1+(:aid / 100)::integer,:delta,'')) AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler);

test-merge.sh Description: Bourne shell script
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
On 2010-12-30 4:39 AM +0200, Greg Smith wrote: And that got me back again to concurrent testing. Moving onto next two problems...the basic MERGE feature seems to have stepped backwards a bit too. I'm now seeing these quite often: ERROR: duplicate key value violates unique constraint pgbench_accounts_pkey DETAIL: Key (aid)=(176641) already exists. STATEMENT: MERGE INTO pgbench_accounts t USING (SELECT 176641,1+(176641 / 100)::integer,168,'') AS s(aid,bid,balance,filler) ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT VALUES(s.aid,s.bid,s.balance,s.filler); On my concurrent pgbench test, which had been working before. I have no idea why it worked in the past, but the patch was never designed to work for UPSERT. This has been discussed in the past and some people thought that that's not a huge deal. Regards, Marko Tiikkaja
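For readers following along, the duplicate-key failure Greg hit is the classic MERGE-versus-upsert race: two sessions each evaluate NOT MATCHED against their own snapshot before either has inserted. A minimal single-process sketch of the interleaving (Python; the "table" is just a dict standing in for pgbench_accounts, and the helper names are mine, not PostgreSQL internals):

```python
class DuplicateKeyError(Exception):
    """Stands in for the unique-index violation raised by the server."""

def merge_decide(table, key):
    """Phase 1 of MERGE: the snapshot says MATCHED or NOT MATCHED."""
    return 'MATCHED' if key in table else 'NOT MATCHED'

def merge_apply(table, key, delta, decision):
    """Phase 2: act on the (possibly stale) phase-1 decision."""
    if decision == 'MATCHED':
        table[key] += delta
    else:
        if key in table:  # the unique index catches the stale decision
            raise DuplicateKeyError(f'Key (aid)=({key}) already exists.')
        table[key] = delta

accounts = {}
# Two concurrent sessions both decide before either applies:
d1 = merge_decide(accounts, 176641)      # NOT MATCHED
d2 = merge_decide(accounts, 176641)      # NOT MATCHED (stale by apply time)
merge_apply(accounts, 176641, 168, d1)   # session 1 inserts
try:
    merge_apply(accounts, 176641, 25, d2)  # session 2: duplicate key
except DuplicateKeyError as e:
    print('ERROR:', e)
```

This is why "MERGE works" and "MERGE is a safe concurrent UPSERT" are different claims: the latter needs the decide and apply phases to be atomic with respect to other sessions, which the v204 patch does not attempt.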
Re: [HACKERS] small table left outer join big table
On Wed, Dec 29, 2010 at 3:58 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs si...@2ndquadrant.com wrote: It's not a bug, that's the way it currently works. We don't need a test case for that. Oh, you're right. I missed the fact that it's a left join. The only thing that struck me as curious about it was that the OP didn't get a nestloop-with-inner-indexscan plan. That would be explainable if there was no index on the large table's id column ... but columns named like that usually have indexes. I can't get all *that* excited about complicating hash joins as proposed. The query is still fundamentally going to be slow because you won't get out of having to seqscan the large table. The only way to make it really fast is to not read all of the large table, and nestloop-with-inner-indexscan is the only plan type with a hope of doing that. Seq scanning the big table isn't bad... we've gone to a lot of trouble to make it easy to do this, especially with many users. Maintaining many large indexes is definitely bad, all that random I/O is going to suck badly. Seems like an interesting and relatively simple optimisation to me. Not sure if this is a request for a feature, or a proposal to write the optimisation. I hope it's the latter. Thanks for your comments. Yeah, I'm excited to write code for PostgreSQL, but I'm new here and not yet familiar with the code or the patch submission routine. I will try to learn in the near future. So for the moment, it is a request for a feature, and I'm looking forward to any pgsql-hackers working on this. Thanks, Li Jie
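To illustrate Tom's point about the one plan type that avoids reading the whole big table: a nestloop with an inner index scan probes only the matching big-table rows for each small-table row. A toy sketch of that plan shape (Python; the index is a plain dict, None stands for the NULL-extended side of the left join, and all names here are mine, not executor code):

```python
def nestloop_left_join(small_rows, big_index):
    """Nestloop with inner index scan, outer side = the small table.

    For each outer row we probe an index on the big table instead of
    seqscanning it; big-table rows with no matching outer row are
    never read at all. big_index maps join key -> list of big rows.
    """
    for outer in small_rows:
        # LEFT JOIN semantics: no index hit -> one NULL-extended row.
        matches = big_index.get(outer['id'], [None])
        for inner in matches:
            yield (outer, inner)

small = [{'id': 1}, {'id': 3}]
big_index = {1: [{'id': 1, 'v': 'a'}], 2: [{'id': 2, 'v': 'b'}]}
for row in nestloop_left_join(small, big_index):
    print(row)
```

Note the row for id=2 in the big table is never touched; a hash join driven by a seqscan of the big table would have read it regardless, which is exactly the cost the thread is debating.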
Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE
On Wed, Dec 29, 2010 at 6:46 PM, Noah Misch n...@leadboat.com wrote: I think this scenario will be more common than you might think. Tables don't contain random data; they contain data that the DBA thinks is valid. The situation where the data is mostly as you expect but with a few kooky rows is, in my experience, extremely common. Perhaps. A few kooky rows is indeed common, but we're talking about a specific breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE transformation expression, and 0.1% have different bits. Is that common? I think it's common enough to be worth worrying about. Expanding on my introduction, none of the following can yield a negative verification scan; the result is always positive or an error:

CREATE DOMAIN loosedom AS text;
CREATE DOMAIN tightdom AS text CHECK (value LIKE '%/');
CREATE TABLE t (c varchar(6));
INSERT INTO t VALUES ('abc/'),('de/');
ALTER TABLE t ALTER c TYPE varchar(8);
ALTER TABLE t ALTER c TYPE text;
ALTER TABLE t ALTER c TYPE loosedom;
ALTER TABLE t ALTER c TYPE xml USING c::xml;
ALTER TABLE t ALTER c TYPE varchar(64);
ALTER TABLE t ALTER c TYPE tightdom;

+1 for trying to optimize these cases (but maybe after we optimize the varchar -> text and varchar(less) -> varchar(more) cases to skip the scan altogether). Adding a bpchar into the mix makes a negative verification scan possible, as does a USING clause having a truncating effect. Continuing the example, these can and would get a negative verification scan (in case it was not obvious, I'll note that any error thrown by a transformation expression still aborts the whole operation):

ALTER TABLE t ALTER c TYPE character(6);
ALTER TABLE t ALTER c TYPE varchar(5) USING c::varchar(5);

Plenty of academic USING clause examples exist:

ALTER TABLE t ALTER c TYPE varchar(8) USING CASE c WHEN 'de/' THEN 'foo' ELSE c END;

I am not really convinced that there's much value in optimizing these cases. They're not likely to arise very often in practice, and DBAs like predictability.
There's tangible value in being able to say this is going to scan your table at most once - it might rewrite it, or it might just verify what's there, or it might decide no scan is necessary, but the absolute worst case is one scan with rewrite. That's simple to understand and simple to document and probably simpler to code too, and I think it covers very nearly all of the cases people are likely to care about in practice. Yes. Indeed, that's the intuitive basis for my hypothesis that the verification scan will usually either fail early or succeed outright. I don't advocate this approach to pick up edge cases, but to pick up reasonable cases _without explicit annotations_ showing them to be achievable. Take the text -> xml example, certainly of genuine value if not top-frequency. I see three ways to ensure we do a verification scan for it:

1. Have the user tell us: ALTER TABLE t ALTER c TYPE xml USING c::xml IMPLICIT
2. Mark the text -> xml cast as possibly no-rewrite and look for that
3. Do a verification scan every time

I think for any pair of types (T1, T2) we should first determine whether we can skip the scan altogether. If yes, we're done. If no, then we should have a way of determining whether a verify-only scan is guaranteed to be sufficient (in your terminology, the verification scan is guaranteed to return either positive or error, not negative). If yes, then we do a verification scan. If no, we do a rewrite. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
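The verify-only scan being debated can be sketched in a few lines. This is an illustrative model only (my own helper, not ATRewriteTable): apply the transformation expression to each stored value, report a negative result on the first value whose bits would change, and let the transform itself raise for invalid input, as text -> xml would.

```python
def verification_scan(rows, transform):
    """Verify-only scan: check that the transformation is a bitwise
    no-op for every stored value. Returns True (the rewrite can be
    skipped) or False (a rewrite is required after all); the
    transform may also raise, which aborts the whole operation.
    """
    for value in rows:
        if transform(value) != value:
            return False   # negative result: fall back to a rewrite
    return True            # positive result: skip the rewrite

# varchar(6) -> varchar(8): widening never changes the stored bits.
print(verification_scan(['abc/', 'de/'], lambda v: v[:8]))   # True
# varchar(6) -> varchar(3): truncation changes 'abc/' -> negative.
print(verification_scan(['abc/', 'de/'], lambda v: v[:3]))   # False
```

The cost model in the thread falls out directly: a positive scan saves the whole rewrite, a negative scan wastes one read pass before the rewrite, and an error aborts either way - which is why Noah argues the scan usually either fails early or pays for itself.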