Re: [HACKERS] Schema version management

2012-07-14 Thread Joel Jacobson
On Fri, Jul 13, 2012 at 9:41 PM, Peter Eisentraut pete...@gmx.net wrote:

 Personally, I hate this proposed nested directory structure.  I would
 like to have all objects in one directory.

 But there is a lot of personally in this thread, of course.


Why do you hate it?

It's a bit like saying,
 - I hate database normalization, better to keep all rows in one single
table.
or even,
 - I hate directories.

I have thousands of objects, it would be a total mess to keep them all in a
single directory.

Using a normalized directory structure makes sense for the SCM use-case,
I haven't seen any projects where all the files are kept in one directory.


Re: [HACKERS] Schema version management

2012-07-14 Thread Peter Eisentraut
On lör, 2012-07-14 at 10:41 +0200, Joel Jacobson wrote:
 On Fri, Jul 13, 2012 at 9:41 PM, Peter Eisentraut pete...@gmx.net wrote:
 
  Personally, I hate this proposed nested directory structure.  I would
  like to have all objects in one directory.
 
  But there is a lot of personally in this thread, of course.
 
 
 Why do you hate it?
 
 It's a bit like saying,
  - I hate database normalization, better to keep all rows in one single
 table.
 or even,
  - I hate directories.

To a certain extent, yes, I hate (excessive use of) directories.

 I have thousands of objects, it would be a total mess to keep them all in a
 single directory.

Thousands of objects could be a problem, in terms of how the typical
file system tools scale.  But hundreds or a few thousand not
necessarily.  It's easy to browse, filter, and sort using common tools,
for example.

 Using a normalized directory structure makes sense for the SCM use-case,

If there is a theory of normalization for hierarchical databases, I
don't know it but would like to learn about it.

 I haven't seen any projects where all the files are kept in one directory.

Well, of course everyone uses directories in moderation.  But you might
want to take a look at the gcc source code.  You'll love it. ;-)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-14 Thread Joel Jacobson
On Sat, Jul 14, 2012 at 11:25 AM, Peter Eisentraut pete...@gmx.net wrote:
 Well, of course everyone uses directories in moderation.  But you might
 want to take a look at the gcc source code.  You'll love it. ;-)

Yes, but GCC was also created by someone who picks stuff from his bare
foot and eats it. ;-)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-14 Thread Joel Jacobson
On Sat, Jul 14, 2012 at 12:34 PM, Joel Jacobson j...@trustly.com wrote:
 On Sat, Jul 14, 2012 at 11:25 AM, Peter Eisentraut pete...@gmx.net wrote:
 Well, of course everyone uses directories in moderation.  But you might
 want to take a look at the gcc source code.  You'll love it. ;-)

[505][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$ find . -type d | wc -l
   41895
[506][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$ find . -type f | wc -l
  167183
[507][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$

Not that bad actually, only 4 files per directory on average.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-13 Thread Peter Eisentraut
On ons, 2012-07-11 at 17:20 -0400, Alvaro Herrera wrote:
 operator_!___numeric.sql (postfix, name does not need escape)
 operator_%7C%2F_integer__.sql (prefix)
 operator_%3C_bit_varying__bit_varying.sql (type name with spaces,
 changed to _)

I'm not sure if it makes things better to escape some operator names and
some not.  It could easily become confusing.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-13 Thread Peter Eisentraut
On tor, 2012-07-12 at 16:14 +0200, Joel Jacobson wrote:
 Could work. But I think it's more relevant and useful to keep all objects
 in a schema in its own directory.

Personally, I hate this proposed nested directory structure.  I would
like to have all objects in one directory.

But there is a lot of personally in this thread, of course.

 I think its more common you want to show all objects within schema X
 than show all schemas of type X.

Or maybe it isn't ...



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Wed, Jul 11, 2012 at 8:01 PM, Merlin Moncure mmonc...@gmail.com wrote:

 After extensive off-list discussion with Joel it became clear that
 per-object dumping ability really belongs in pg_restore.


The only benefit I could see in putting it in pg_restore is you would then
be able to do a --split on already existing historical dumps.

On the other hand, it would require you to use both pg_dump and pg_restore,
instead of only pg_dump, which makes it a bit less user-friendly.

I haven't looked at how it could be implemented in pg_restore, if its even
just
a little more complex, it's probably better to let pg_dump handle the task.


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Wed, Jul 11, 2012 at 11:20 PM, Alvaro Herrera alvhe...@commandprompt.com
 wrote:

 operator_!___numeric.sql (postfix, name does not need escape)
 operator_%7C%2F_integer__.sql (prefix)
 operator_%3C_bit_varying__bit_varying.sql (type name with spaces,
 changed to _)


I think the directory structure [schema]/[type]/[name] should be the same
for all object types. I don't like operator being part of the filename,
it should be the directory name.


Re: [HACKERS] Schema version management

2012-07-12 Thread Tom Lane
Joel Jacobson j...@trustly.com writes:
 I think the directory structure [schema]/[type]/[name] should be the same
 for all object types. I don't like operator being part of the filename,
 it should be the directory name.

What are you going to do with objects that don't have schemas?
(Including, but not restricted to, the schemas themselves.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Thursday, July 12, 2012, Tom Lane wrote:

 What are you going to do with objects that don't have schemas?
 (Including, but not restricted to, the schemas themselves.)


Good question. Maybe something like this?

For objects without schema:
/global/[type]/[name].sql

For objects with schema:
/schema/[schema]/[type]/[name].sql


Re: [HACKERS] Schema version management

2012-07-12 Thread Tom Lane
Joel Jacobson j...@trustly.com writes:
 On Thursday, July 12, 2012, Tom Lane wrote:
 What are you going to do with objects that don't have schemas?
 (Including, but not restricted to, the schemas themselves.)

 Good question. Maybe something like this?

 For objects without schema:
 /global/[type]/[name].sql

 For objects with schema:
 /schema/[schema]/[type]/[name].sql

FWIW, I think you could save a level of naming if you were willing to
put the type first, since the type would imply whether the object
lives in a schema or not:

[type]/[name].sql
[type]/[schema]/[name].sql

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Thu, Jul 12, 2012 at 4:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 FWIW, I think you could save a level of naming if you were willing to
 put the type first, since the type would imply whether the object
 lives in a schema or not:

 [type]/[name].sql
 [type]/[schema]/[name].sql


Could work. But I think it's more relevant and useful to keep all objects
in a schema in its own directory.

That way it's easier to get an overview of what's in a schema,
simply by looking at the file structure of the schema directory.

I think its more common you want to show all objects within schema X
than show all schemas of type X.

PS.

I was thinking -- the guys back in the 70s must have spent a lot of time
thinking about the UNIX directory structure -- before they decided upon it.

I did some googling and found found this explanation which was quite
amusing to say the least :-)

http://lists.busybox.net/pipermail/busybox/2010-December/074114.html


Re: [HACKERS] Schema version management

2012-07-12 Thread Andrew Dunstan


On 07/12/2012 10:01 AM, Tom Lane wrote:


FWIW, I think you could save a level of naming if you were willing to
put the type first, since the type would imply whether the object
lives in a schema or not:

[type]/[name].sql
[type]/[schema]/[name].sql






That will destroy the property of having everything for a given schema 
collected together.


Arguably we should make a special case for the create statement of a 
schema, but I'm not even sure about that.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-11 Thread Joel Jacobson
On Wed, Jul 11, 2012 at 12:24 AM, Merlin Moncure mmonc...@gmail.com wrote:

 What does your patch do that you can't already do with pg_restore?


Please read through the entire thread, think question has already been
answered.


Re: [HACKERS] Schema version management

2012-07-11 Thread Merlin Moncure
On Tue, Jul 10, 2012 at 5:24 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Sun, May 20, 2012 at 2:41 PM, Joel Jacobson j...@trustly.com wrote:
 Hi,

 I just read a very interesting post about schema version management.

 Quote: You could set it up so that every developer gets their own
 test database, sets up the schema there, takes a dump, and checks that
 in. There are going to be problems with that, including that dumps
 produced by pg_dump are ugly and optimized for restoring, not for
 developing with, and they don't have a deterministic output order. (
 http://petereisentraut.blogspot.com/2012/05/my-anti-take-on-database-schema-version.html
 )

 Back in December 2010, I suggested a new option to pg_dump, --split,
 which would write the schema definition of each object in separate
 files:

 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02285.php

 Instead of a huge plain text schema file, impossible to version
 control, all tables/sequences/views/functions are written to separate
 files, allowing the use of a version control software system, such as
 git, to do proper version controlling.

 The deterministic output order problem mentioned in the post above,
 is not a problem if each object (table/sequence/view/function/etc) is
 written to the same filename everytime.
 No matter the order, the tree of files and their content will be
 identical, no matter the order in which they are dumped.

 I remember a lot of hackers were very positive about this option, but
 we somehow failed to agree on the naming of files in the tree
 structure. I'm sure we can work that out though.

 I use this feature in production, I have a cronjob which does a dump
 of the schema every hour, committing any eventual changes to a
 separate git branch for each database installation, such as
 production, development and test.
 If no changes to the schema have been made, nothing will be committed
 to git since none of the files have changed.

 It is then drop-dead simple to diff two different branches of the
 database schema, such as development or production, or diffing
 different revisions allowing point-in-time comparison of the schema.

 This is an example of the otuput of a git log --summary for one of the
 automatic commits to our production database's git-repo:

 --
 commit 18c31f8162d851b0dac3bad7e80529ef2ed18be3
 Author: Production Database production.datab...@trustly.com
 Date:   Fri May 4 15:00:04 2012 +0200

 Update of database schema Linux DB0 2.6.26-2-amd64 #1 SMP Wed Aug
 19 22:33:18 UTC 2009 x86_64 GNU/Linux Fri, 04 May 2012 15:00:04 +0200

  create mode 100644
 gluepay-split/public/CONSTRAINT/openingclosingbalances_pkey.sql
  create mode 100644
 gluepay-split/public/CONSTRAINT/openingclosingbalances_source_key.sql
  create mode 100644 
 gluepay-split/public/SEQUENCE/seqopeningclosingbalance.sql
  create mode 100644 gluepay-split/public/TABLE/openingclosingbalances.sql
 --

 Here we can see we apparently deployed a new table,
 openingclosingbalances around Fri May 4 15:00:04.

 Without any manual work, I'm able to follow all changes actually
 _deployed_ in each database.

 At my company, a highly database-centric stored-procedure intensive
 business dealing with mission-critical monetary transactions, we've
 been using this technique to successfully do schema version management
 without any hassle for the last two years.

 Hopefully this can add to the list of various possible _useful_ schema
 version management methods.

 What does your patch do that you can't already do with pg_restore?

 create function foo(a int, b int, c text) returns int as $$ select 0;
 $$ language sql;
 CREATE FUNCTION

 pg_dump -Fc postgres -s  postgres.dump
 pg_restore -l postgres.dump  | grep FUNCTION
 196; 1255 32939 FUNCTION public foo(integer, integer, text) merlin

 pg_restore -P foo(integer, integer, text) postgres.dump
 function body follows

 it's fairly easy to wrap pg_restore with a smalls script that extracts
 function bodies and writes them out to file names.  this is a great
 and underused feature, so I'd argue that if you wanted to formalize
 per object file extraction you should be looking at expanding
 pg_restore, not pg_dump.

After extensive off-list discussion with Joel it became clear that
per-object dumping ability really belongs in pg_restore.  It already
has some machinery for that, and has the nice property that you can
pull objects out of dumps long after the fact, not just when the dump
happens.  It then remains to be worked out of pg_restore should be
given the ability to write directly to files as Joel was gunning for
or simply extended to improve upon the current TOC based facilities,
or both.

As noted, choosing a reversible unambiguous filename based on the
database primitive is nastiness of the first order.  For example,
besides the mentioned issues, some filesystems (windows) use case
insensitive entries.  What do you do about that?  Given that all the
reasonable escape characters are 

Re: [HACKERS] Schema version management

2012-07-11 Thread Peter Eisentraut
On tis, 2012-07-10 at 17:54 -0400, Andrew Dunstan wrote:
 In general, NTFS forbids the use of these printable ASCII chars in 
 filenames (see 
 http://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations:
 
  * :   ? \ / |

 Many of these could be used in operators.

Yeah, that's a bummer.  Then I guess some escape mechanism would be OK.
I could imagine an operator  on a custom data type being dumped into a
file named operator_%3C.sql.  Still better than putting them all in one
file.

Of course, argument types need to be dealt with as well, just like with
functions (plus prefix/postfix).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-11 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of mié jul 11 17:03:03 -0400 2012:
 
 On tis, 2012-07-10 at 17:54 -0400, Andrew Dunstan wrote:
  In general, NTFS forbids the use of these printable ASCII chars in 
  filenames (see 
  http://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations:
  
   * :   ? \ / |
 
  Many of these could be used in operators.
 
 Yeah, that's a bummer.  Then I guess some escape mechanism would be OK.
 I could imagine an operator  on a custom data type being dumped into a
 file named operator_%3C.sql.  Still better than putting them all in one
 file.
 
 Of course, argument types need to be dealt with as well, just like with
 functions (plus prefix/postfix).

operator_!___numeric.sql (postfix, name does not need escape)
operator_%7C%2F_integer__.sql (prefix)
operator_%3C_bit_varying__bit_varying.sql (type name with spaces,
changed to _)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-10 Thread Peter Eisentraut
On sön, 2012-07-08 at 18:52 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On lör, 2012-07-07 at 17:18 -0400, Tom Lane wrote:
  Sure.  You need not look further than / to find an operator name
 that
  absolutely *will* cause trouble if it's dumped into a filename
  literally.
 
  But that problem applies to all object names.
 
 In principle, yes, but in practice it's far more likely that operators
 will have names requiring some sort of encoding than that objects with
 SQL-identifier names will.

I'm not sure.  The only character that's certainly an issue is /.  Are
there any others on file systems that we want to support?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-10 Thread Andrew Dunstan


On 07/10/2012 05:39 PM, Peter Eisentraut wrote:

On sön, 2012-07-08 at 18:52 -0400, Tom Lane wrote:

Peter Eisentraut pete...@gmx.net writes:

On lör, 2012-07-07 at 17:18 -0400, Tom Lane wrote:

Sure.  You need not look further than / to find an operator name

that

absolutely *will* cause trouble if it's dumped into a filename
literally.

But that problem applies to all object names.

In principle, yes, but in practice it's far more likely that operators
will have names requiring some sort of encoding than that objects with
SQL-identifier names will.

I'm not sure.  The only character that's certainly an issue is /.  Are
there any others on file systems that we want to support?




In general, NTFS forbids the use of these printable ASCII chars in 
filenames (see 
http://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations:



 * :   ? \ / |


Many of these could be used in operators.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-10 Thread Magnus Hagander
On Tue, Jul 10, 2012 at 11:39 PM, Peter Eisentraut pete...@gmx.net wrote:
 On sön, 2012-07-08 at 18:52 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On lör, 2012-07-07 at 17:18 -0400, Tom Lane wrote:
  Sure.  You need not look further than / to find an operator name
 that
  absolutely *will* cause trouble if it's dumped into a filename
  literally.

  But that problem applies to all object names.

 In principle, yes, but in practice it's far more likely that operators
 will have names requiring some sort of encoding than that objects with
 SQL-identifier names will.

 I'm not sure.  The only character that's certainly an issue is /.  Are
 there any others on file systems that we want to support?

\ and : if we care at all about windows

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-10 Thread Merlin Moncure
On Sun, May 20, 2012 at 2:41 PM, Joel Jacobson j...@trustly.com wrote:
 Hi,

 I just read a very interesting post about schema version management.

 Quote: You could set it up so that every developer gets their own
 test database, sets up the schema there, takes a dump, and checks that
 in. There are going to be problems with that, including that dumps
 produced by pg_dump are ugly and optimized for restoring, not for
 developing with, and they don't have a deterministic output order. (
 http://petereisentraut.blogspot.com/2012/05/my-anti-take-on-database-schema-version.html
 )

 Back in December 2010, I suggested a new option to pg_dump, --split,
 which would write the schema definition of each object in separate
 files:

 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02285.php

 Instead of a huge plain text schema file, impossible to version
 control, all tables/sequences/views/functions are written to separate
 files, allowing the use of a version control software system, such as
 git, to do proper version controlling.

 The deterministic output order problem mentioned in the post above,
 is not a problem if each object (table/sequence/view/function/etc) is
 written to the same filename everytime.
 No matter the order, the tree of files and their content will be
 identical, no matter the order in which they are dumped.

 I remember a lot of hackers were very positive about this option, but
 we somehow failed to agree on the naming of files in the tree
 structure. I'm sure we can work that out though.

 I use this feature in production, I have a cronjob which does a dump
 of the schema every hour, committing any eventual changes to a
 separate git branch for each database installation, such as
 production, development and test.
 If no changes to the schema have been made, nothing will be committed
 to git since none of the files have changed.

 It is then drop-dead simple to diff two different branches of the
 database schema, such as development or production, or diffing
 different revisions allowing point-in-time comparison of the schema.

 This is an example of the otuput of a git log --summary for one of the
 automatic commits to our production database's git-repo:

 --
 commit 18c31f8162d851b0dac3bad7e80529ef2ed18be3
 Author: Production Database production.datab...@trustly.com
 Date:   Fri May 4 15:00:04 2012 +0200

 Update of database schema Linux DB0 2.6.26-2-amd64 #1 SMP Wed Aug
 19 22:33:18 UTC 2009 x86_64 GNU/Linux Fri, 04 May 2012 15:00:04 +0200

  create mode 100644
 gluepay-split/public/CONSTRAINT/openingclosingbalances_pkey.sql
  create mode 100644
 gluepay-split/public/CONSTRAINT/openingclosingbalances_source_key.sql
  create mode 100644 gluepay-split/public/SEQUENCE/seqopeningclosingbalance.sql
  create mode 100644 gluepay-split/public/TABLE/openingclosingbalances.sql
 --

 Here we can see we apparently deployed a new table,
 openingclosingbalances around Fri May 4 15:00:04.

 Without any manual work, I'm able to follow all changes actually
 _deployed_ in each database.

 At my company, a highly database-centric stored-procedure intensive
 business dealing with mission-critical monetary transactions, we've
 been using this technique to successfully do schema version management
 without any hassle for the last two years.

 Hopefully this can add to the list of various possible _useful_ schema
 version management methods.

What does your patch do that you can't already do with pg_restore?

create function foo(a int, b int, c text) returns int as $$ select 0;
$$ language sql;
CREATE FUNCTION

pg_dump -Fc postgres -s  postgres.dump
pg_restore -l postgres.dump  | grep FUNCTION
196; 1255 32939 FUNCTION public foo(integer, integer, text) merlin

pg_restore -P foo(integer, integer, text) postgres.dump
function body follows

it's fairly easy to wrap pg_restore with a smalls script that extracts
function bodies and writes them out to file names.  this is a great
and underused feature, so I'd argue that if you wanted to formalize
per object file extraction you should be looking at expanding
pg_restore, not pg_dump.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-08 Thread Joel Jacobson
On Saturday, July 7, 2012, Tom Lane wrote:

 If we think that operators outside of extensions will be an infrequent
 special case, what about just dumping all of them into a single file
 named operators?  And similarly for casts?

 regards, tom lane



+1


Re: [HACKERS] Schema version management

2012-07-08 Thread Peter Eisentraut
On lör, 2012-07-07 at 11:32 -0400, Aidan Van Dyk wrote:
 But, since you're using operators, what would you think is an
 appropriate name for the file the operator is dumped into?

The name of the operator, just like for any other object.  (Assuming
we're using the name of a table for the file for the table etc.)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-08 Thread Peter Eisentraut
On lör, 2012-07-07 at 17:18 -0400, Tom Lane wrote:
 Sure.  You need not look further than / to find an operator name that
 absolutely *will* cause trouble if it's dumped into a filename
 literally.

But that problem applies to all object names.

 If we think that operators outside of extensions will be an infrequent
 special case, what about just dumping all of them into a single file
 named operators?  And similarly for casts?

If we think they are an infrequent case, why make a fuss about it?  Just
treat them like any other object.

In practical terms, I dislike the particular solution proposed here.
For one thing, it would undermine the original purpose of this whole
thread, namely insulating dump output files from ordering differences.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-08 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On lör, 2012-07-07 at 17:18 -0400, Tom Lane wrote:
 Sure.  You need not look further than / to find an operator name that
 absolutely *will* cause trouble if it's dumped into a filename
 literally.

 But that problem applies to all object names.

In principle, yes, but in practice it's far more likely that operators
will have names requiring some sort of encoding than that objects with
SQL-identifier names will.

 If we think that operators outside of extensions will be an infrequent
 special case, what about just dumping all of them into a single file
 named operators?  And similarly for casts?

 If we think they are an infrequent case, why make a fuss about it?  Just
 treat them like any other object.

 In practical terms, I dislike the particular solution proposed here.
 For one thing, it would undermine the original purpose of this whole
 thread, namely insulating dump output files from ordering differences.

That's a good point.  However, I think that there are no cases where
we'd have dependencies between operators (or between casts), so that
as long as the initial sort is well-defined for them, it shouldn't
really be an issue in practice.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-07 Thread Aidan Van Dyk
On Fri, Jul 6, 2012 at 4:50 PM, Peter Eisentraut pete...@gmx.net wrote:

 I have code in the wild that defines new operators and casts and has no
 C code and is not in an extension and has no business being in an
 extension.

Nobody is claiming that pgdump shouldn't dump it.

But, since you're using operators, what would you think is an
appropriate name for the file the operator is dumped into?

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-07 Thread Alvaro Herrera

Excerpts from Aidan Van Dyk's message of sáb jul 07 11:32:33 -0400 2012:
 
 On Fri, Jul 6, 2012 at 4:50 PM, Peter Eisentraut pete...@gmx.net wrote:
 
  I have code in the wild that defines new operators and casts and has no
  C code and is not in an extension and has no business being in an
  extension.
 
 Nobody is claiming that pgdump shouldn't dump it.
 
 But, since you're using operators, what would you think is an
 appropriate name for the file the operator is dumped into?

I was thinking that it might make sense to group operators according to
the type(s) they operate on, somehow.  Using funny chars for names is
guaranteed to cause problems somewhere.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-07 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Aidan Van Dyk's message of sáb jul 07 11:32:33 -0400 2012:
 But, since you're using operators, what would you think is an
 appropriate name for the file the operator is dumped into?

 I was thinking that it might make sense to group operators according to
 the type(s) they operate on, somehow.  Using funny chars for names is
 guaranteed to cause problems somewhere.

Sure.  You need not look further than / to find an operator name that
absolutely *will* cause trouble if it's dumped into a filename
literally.

I'm not especially thrilled by the idea of using url-encoding or
something like that for operator names, though.  Seems like it loses on
readability.

If we think that operators outside of extensions will be an infrequent
special case, what about just dumping all of them into a single file
named operators?  And similarly for casts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-06 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 This argument seems a bit irrelevant to me.  pg_dump doesn't get to pick
 and choose what will be in the database it's told to dump.  If we're

Sure.

 going to do something like what Joel wants, we have to have file naming
 conventions for operator and cast objects.  So we can't just leave them
 out of the conversation (or if we do, we shouldn't be surprised when the
 ensuing design sucks).

I guess what we're saying is that at this point we can pick non user
friendly naming rules, like pg_operator/oid.sql or something like
that, OID based. Impacted users might as well learn about extensions.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-06 Thread Robert Haas
On Fri, Jul 6, 2012 at 8:23 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 This argument seems a bit irrelevant to me.  pg_dump doesn't get to pick
 and choose what will be in the database it's told to dump.  If we're

 Sure.

 going to do something like what Joel wants, we have to have file naming
 conventions for operator and cast objects.  So we can't just leave them
 out of the conversation (or if we do, we shouldn't be surprised when the
 ensuing design sucks).

 I guess what we're saying is that at this point we can pick non user
 friendly naming rules, like pg_operator/oid.sql or something like
 that, OID based. Impacted users might as well learn about extensions.

I think that would defeat some of the human-readability goals that
people have for this feature, not to mention that it would lose the
ability to do diff -r between a dump produced on cluster A and a dump
produced on cluster B.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-06 Thread Marc Mamin
Hello,

Is it imaginable to additionally generate an index file that map the
half friendly file names to a cleartext object signature ?

This would allow user to possibly postprocess the output while merging
overloaded functions to single files or renaming the files according to
their needs and preferences...

best regards,

Marc Mamin



 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Freitag, 6. Juli 2012 15:02
 To: Dimitri Fontaine
 Cc: Tom Lane; Christopher Browne; Pg Hackers
 Subject: Re: [HACKERS] Schema version management
 
 On Fri, Jul 6, 2012 at 8:23 AM, Dimitri Fontaine
 dimi...@2ndquadrant.fr wrote:
  Tom Lane t...@sss.pgh.pa.us writes:
  This argument seems a bit irrelevant to me.  pg_dump doesn't get to
 pick
  and choose what will be in the database it's told to dump.  If
we're
 
  Sure.
 
  going to do something like what Joel wants, we have to have file
 naming
  conventions for operator and cast objects.  So we can't just leave
 them
  out of the conversation (or if we do, we shouldn't be surprised
when
 the
  ensuing design sucks).
 
  I guess what we're saying is that at this point we can pick non user
  friendly naming rules, like pg_operator/oid.sql or something like
  that, OID based. Impacted users might as well learn about
extensions.
 
 I think that would defeat some of the human-readability goals that
 people have for this feature, not to mention that it would lose the
 ability to do diff -r between a dump produced on cluster A and a dump
 produced on cluster B.
 
 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-06 Thread Peter Eisentraut
On tor, 2012-07-05 at 23:52 +0200, Dimitri Fontaine wrote:
 I would argue like Álvaro that when dealing with operators and casts
 you're probably writing an extension already, and we're providing
 another way to deal with that.

I have code in the wild that defines new operators and casts and has no
C code and is not in an extension and has no business being in an
extension.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 2:38 AM, Robert Haas robertmh...@gmail.com wrote:

 My vote is - when there's an overloaded function, put each version in
 its own file.  And name the files something like
 functionname_something.sql.  And just document that something may not
 be entirely stable.


I would agree that's better if the dump order isn't deterministic.

However, it looks like an easy fix to make the dump order deterministic:
http://archives.postgresql.org/pgsql-hackers/2012-07/msg00232.php

If the dump order is deterministic, I think its cleaner to put all versions
in the same file.

Benefits:
+ Pretty looking filename
+ Same file structure for all object types, no special exception for
functions


Re: [HACKERS] Schema version management

2012-07-05 Thread Gurjeet Singh
On Thu, Jul 5, 2012 at 3:15 AM, Joel Jacobson j...@trustly.com wrote:

 On Thu, Jul 5, 2012 at 2:38 AM, Robert Haas robertmh...@gmail.com wrote:

 My vote is - when there's an overloaded function, put each version in
 its own file.  And name the files something like
 functionname_something.sql.  And just document that something may not
 be entirely stable.


 I would agree that's better if the dump order isn't deterministic.

 However, it looks like an easy fix to make the dump order deterministic:
 http://archives.postgresql.org/pgsql-hackers/2012-07/msg00232.php

 If the dump order is deterministic, I think its cleaner to put all
 versions in the same file.

 Benefits:
 + Pretty looking filename
 + Same file structure for all object types, no special exception for
 functions


I think there's a merit to keeping all overloaded variations of a function
in a single file, apart from the simplicity and benefits noted above. A
change in one variation of the function may also be applicable to other
variations, say in bug-fixes or enhancements. So keeping all variations in
one file would make sense, since it is logically one object.

Best regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Schema version management

2012-07-05 Thread Andrew Dunstan
On Thu, Jul 5, 2012 at 4:04 AM, Gurjeet Singh singh.gurj...@gmail.comwrote:

 On Thu, Jul 5, 2012 at 3:15 AM, Joel Jacobson j...@trustly.com wrote:

 On Thu, Jul 5, 2012 at 2:38 AM, Robert Haas robertmh...@gmail.comwrote:

 My vote is - when there's an overloaded function, put each version in
 its own file.  And name the files something like
 functionname_something.sql.  And just document that something may not
 be entirely stable.


 I would agree that's better if the dump order isn't deterministic.

 However, it looks like an easy fix to make the dump order deterministic:
 http://archives.postgresql.org/pgsql-hackers/2012-07/msg00232.php

 If the dump order is deterministic, I think its cleaner to put all
 versions in the same file.

 Benefits:
 + Pretty looking filename
 + Same file structure for all object types, no special exception for
 functions


 I think there's a merit to keeping all overloaded variations of a function
 in a single file, apart from the simplicity and benefits noted above. A
 change in one variation of the function may also be applicable to other
 variations, say in bug-fixes or enhancements. So keeping all variations in
 one file would make sense, since it is logically one object.



No they are not necessarily one logical unit. You could have a bunch of
functions called, say, equal which have pretty much nothing to do with
each other, since they refer to different types.

+1 from me for putting one function definition per file.

cheers

andrew


Re: [HACKERS] Schema version management

2012-07-05 Thread Michael Glaesemann

On Jul 5, 2012, at 9:21, Andrew Dunstan wrote:

 No they are not necessarily one logical unit. You could have a bunch of
 functions called, say, equal which have pretty much nothing to do with
 each other, since they refer to different types.
 
 +1 from me for putting one function definition per file.

+1. It might make sense to include some sort of argument type information. The 
function signature is
really its identifier. The function name is only part of it.

Michael Glaesemann
grzm seespotcode net




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Vik Reykja
On Thu, Jul 5, 2012 at 3:32 PM, Michael Glaesemann g...@seespotcode.netwrote:


 On Jul 5, 2012, at 9:21, Andrew Dunstan wrote:

  No they are not necessarily one logical unit. You could have a bunch of
  functions called, say, equal which have pretty much nothing to do with
  each other, since they refer to different types.
 
  +1 from me for putting one function definition per file.

 +1. It might make sense to include some sort of argument type information.
 The function signature is
 really its identifier. The function name is only part of it.


I'll go against the flow here.  I would prefer to have all overloaded
functions in the same file.


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
Maybe it could be made an option to pg_dump?

Some users and their systems might not even have overloaded functions,
and these users will of course prefer a nice looking filename, i.e. all
functions
having the same name kept in the same file. Which for them will mean only
one function per file anyway.

pg_dump --split --overloaded-functions-to-same-file

Other users and their systems might have a lot of overloaded functions,
like the equal() example mentioned, they will of course prefer to keep
all functions in separate files.

pg_dump --split --overloaded-functions-to-separate-files

Then, one can discuss which one should be the default option for --split,
I would prefer the same file variant, and think most other users would too,
except for users with a lot of overloaded functions.


Re: [HACKERS] Schema version management

2012-07-05 Thread Tom Lane
Joel Jacobson j...@trustly.com writes:
 Maybe it could be made an option to pg_dump?

Ick.  Then we have to deal with all the downsides of *both* methods.

pg_dump is already a bloated, nearly unmaintainable mess.  The very
last thing it needs is even more options.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ick.  Then we have to deal with all the downsides of *both* methods.

 pg_dump is already a bloated, nearly unmaintainable mess.  The very
 last thing it needs is even more options.


When you say bloated, are you referring to the code or the command line
interface?

If you are referring to the code, I don't think that's a good argument
against implementing new good features.
The important ratio is the value of a feature compared to the increased
complexity.
In this case, it's very simple to implement both the --split option and the
fixing of dump order.
I'm not even a C coder and managed to implement it within less of an hour
effective coding.
We are talking ~100 lines of code, with comments and everything.

If you are referring to the command line interface and think it is bloated,
maybe the options should be hidden in the normal --help.
We could create a new --help-advanced text, where we could put these
options, and all other existing less common options.
I think this is a quite common and good way to handle the situation for
UNIX command line tools.


Re: [HACKERS] Schema version management

2012-07-05 Thread Alvaro Herrera

Excerpts from Tom Lane's message of jue jul 05 10:46:52 -0400 2012:
 Joel Jacobson j...@trustly.com writes:
  Maybe it could be made an option to pg_dump?
 
 Ick.  Then we have to deal with all the downsides of *both* methods.
 
 pg_dump is already a bloated, nearly unmaintainable mess.  The very
 last thing it needs is even more options.

Agreed.

However I am also against what seems to be the flow.  Normally, you
don't write overloaded plpgsql functions such as equal.  Case in
point, the equality functions in core have funny names like int4eq and
so on.  Instead, at least in my experience, the overloaded functions
people seem to have in their databases are like do_stuff_to_foobars()
and you have one version for foos and another one for bars.

If you're doing lots of equality functions, surely it would make more
sense to package them up as an extension anyway along with all the other
thingies you need for the type you're supposedly writing.  So it's a
completely different market than what we're aiming at here.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 5:17 PM, Alvaro Herrera
alvhe...@commandprompt.comwrote:

 Agreed.

 However I am also against what seems to be the flow.  Normally, you
 don't write overloaded plpgsql functions such as equal.  Case in
 point, the equality functions in core have funny names like int4eq and
 so on.  Instead, at least in my experience, the overloaded functions
 people seem to have in their databases are like do_stuff_to_foobars()
 and you have one version for foos and another one for bars.

 If you're doing lots of equality functions, surely it would make more
 sense to package them up as an extension anyway along with all the other
 thingies you need for the type you're supposedly writing.  So it's a
 completely different market than what we're aiming at here.


True, very true, I didn't think about that, you are right, I fully agree.
My vote is therefore on the put all overloaded functions in the same file
variant.


Re: [HACKERS] Schema version management

2012-07-05 Thread Michael Glaesemann

On Jul 5, 2012, at 11:17, Alvaro Herrera wrote:

 
 Excerpts from Tom Lane's message of jue jul 05 10:46:52 -0400 2012:
 Joel Jacobson j...@trustly.com writes:
 Maybe it could be made an option to pg_dump?
 
 Ick.  Then we have to deal with all the downsides of *both* methods.
 
 pg_dump is already a bloated, nearly unmaintainable mess.  The very
 last thing it needs is even more options.
 
 Agreed.
 
 However I am also against what seems to be the flow.  Normally, you
 don't write overloaded plpgsql functions such as equal.

I often write functions that perform fetches based on different criteria.
For example, 

-- returns count of all orders for the given customer
int function order_count(in_customer_name text)

-- returns count of all orders for the given customer since the given timestamp
int function order_count(in_customer_name text, in_since timestamp with time 
zone)

-- returns count of orders for the given customer during a given interval
int function order_count(in_customer_name text, in_from timestamp with time 
zone, in_through timestamp with time zone) 


Or, I'll write overloaded functions, one of which provides default values.

-- returns the set of members whose birthday is today. Calls 
birthday_members(CURRENT_DATE)
setof record function birthday_members()

-- returns the set of members whose birthday is on the given date, which makes 
testing a lot easier
setof record function birthday_members(in_date DATE)

Some may disagree that this is a proper usage of function overloading.
Some may even argue that function names shouldn't be overloaded at all.
However, I find this usage of function name overloading useful, especially
for keeping function names relatively short.

If we're dumping objects (tables, views, functions, what-have-you) into 
separate files,
each of these functions is a separate object and should be in its own file.

Michael Glaesemann
grzm seespotcode net




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Alvaro Herrera

Excerpts from Michael Glaesemann's message of jue jul 05 11:36:51 -0400 2012:
 
 On Jul 5, 2012, at 11:17, Alvaro Herrera wrote:

  However I am also against what seems to be the flow.  Normally, you
  don't write overloaded plpgsql functions such as equal.
 
 I often write functions that perform fetches based on different criteria.
 For example, 
 
 -- returns count of all orders for the given customer
 int function order_count(in_customer_name text)
 
 -- returns count of all orders for the given customer since the given 
 timestamp
 int function order_count(in_customer_name text, in_since timestamp with time 
 zone)
 
 -- returns count of orders for the given customer during a given interval
 int function order_count(in_customer_name text, in_from timestamp with time 
 zone, in_through timestamp with time zone) 

Isn't this a perfect example of stuff that, since it does much the same
thing, should be in the same file so that you remember to fix them all
together if you find a bug in one?

 Or, I'll write overloaded functions, one of which provides default values.
 
 -- returns the set of members whose birthday is today. Calls 
 birthday_members(CURRENT_DATE)
 setof record function birthday_members()
 
 -- returns the set of members whose birthday is on the given date, which 
 makes testing a lot easier
 setof record function birthday_members(in_date DATE)

Same.  This seems particularly the case if one implementation calls
another, more general one.

 Some may disagree that this is a proper usage of function overloading.
 Some may even argue that function names shouldn't be overloaded at all.
 However, I find this usage of function name overloading useful, especially
 for keeping function names relatively short.

I completely agree.

 If we're dumping objects (tables, views, functions, what-have-you) into 
 separate files,
 each of these functions is a separate object and should be in its own file.

Clearly there is no consensus here.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Michael Glaesemann

On Jul 5, 2012, at 11:52, Alvaro Herrera wrote:

 Isn't this a perfect example of stuff that, since it does much the same
 thing, should be in the same file so that you remember to fix them all
 together if you find a bug in one?

That's what tests are for.

Michael Glaesemann
grzm seespotcode net




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Michael Glaesemann's message of jue jul 05 11:36:51 -0400 2012:
 If we're dumping objects (tables, views, functions, what-have-you) into 
 separate files,
 each of these functions is a separate object and should be in its own file.

 Clearly there is no consensus here.

FWIW, I'm attracted to the all-similarly-named-functions-together
method, mainly because it dodges the problem of how to encode a
function's argument list into a filename.  However, we're being
short-sighted to only think of functions here.  What about operators?
Or casts?  Those don't have simple names either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 5:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 FWIW, I'm attracted to the all-similarly-named-functions-together
 method, mainly because it dodges the problem of how to encode a
 function's argument list into a filename.  However, we're being
 short-sighted to only think of functions here.  What about operators?
 Or casts?  Those don't have simple names either.


Someone suggested to urlencode them. I think that's a quite good solution.

Personally, I don't have any user-defined operators or casts. Don't know
how common it is in general, but it must of course work for these as well.


Re: [HACKERS] Schema version management

2012-07-05 Thread Tom Lane
Joel Jacobson j...@trustly.com writes:
 On Thu, Jul 5, 2012 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 pg_dump is already a bloated, nearly unmaintainable mess.  The very
 last thing it needs is even more options.

 If you are referring to the code, I don't think that's a good argument
 against implementing new good features.
 The important ratio is the value of a feature compared to the increased
 complexity.

Well, to be perfectly frank, I already doubt that this entire feature
passes the complexity-versus-value test, because pg_dump is not a
substitute for an SCM --- people who have got enough functions to need
this sort of thing need to be keeping them somewhere else than in dump
files.  Complicating things more by supporting multiple ways of doing it
will make that worse.  I think you need to pick one design and stick
with it, not try to paint the bikeshed every color suggested by anybody.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Christopher Browne
On Thu, Jul 5, 2012 at 11:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Michael Glaesemann's message of jue jul 05 11:36:51 -0400 2012:
 If we're dumping objects (tables, views, functions, what-have-you) into 
 separate files,
 each of these functions is a separate object and should be in its own file.

 Clearly there is no consensus here.

 FWIW, I'm attracted to the all-similarly-named-functions-together
 method, mainly because it dodges the problem of how to encode a
 function's argument list into a filename.  However, we're being
 short-sighted to only think of functions here.  What about operators?
 Or casts?  Those don't have simple names either.

If you stow them all together, that still leaves a question as to
whether or not they get stored in a deterministic order.

I was recently working on something of the same issue as part of a
schema differencing tool.  It was pointedly *not* sufficient to use
the internal name (e.g. - information_schema.routines.specific_name),
as I wanted to compare things between databases, and it's pretty
certain that oids will differ.

I wound up expanding the function arguments and using function + args
as the name.  That leads to a risk of rather long names for functions,
but there aren't many other ways possible.

Note that pg_autodoc http://www.rbt.ca/autodoc/ takes a similar
approach; it attaches function labels based on function + args.

Here's an expanded example in the Slony docs:

http://slony.info/documentation/2.1/function.ddlscript-prepare-int-p-only-on-node-integer-p-set-id-integer.html

I wouldn't mind stowing functions together in one file, and I'd
actually not get too bent out of shape if the order was somewhat
nondeterministic.  But something like the autodoc naming seems like
the unambiguous answer.  Long, but unambiguous...
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 6:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Well, to be perfectly frank, I already doubt that this entire feature
 passes the complexity-versus-value test, because pg_dump is not a
 substitute for an SCM --- people who have got enough functions to need
 this sort of thing need to be keeping them somewhere else than in dump
 files.  Complicating things more by supporting multiple ways of doing it
 will make that worse.  I think you need to pick one design and stick
 with it, not try to paint the bikeshed every color suggested by anybody.


I agree it should be one option only, and again I think the one file
variant is best.

This is indeed not a substitute for an SCM, but a nice complement.

Personally, I use this feature already to commit the schema for all versions
of my databases (production, test, development) into a git repo every
minute.
It only commits if something has changed.

This makes it super easy to compare the schema of the actual production
database
between different points in time.

This would not be possible if only manually committing stuff to the normal
git repo,
where I also have all the functions, which I modify when developing and
testing.

pg_dump - git means you can be 100% certain version X of the schema was
active in the production database at date/time T.


Re: [HACKERS] Schema version management

2012-07-05 Thread Alvaro Herrera

Excerpts from Christopher Browne's message of jue jul 05 12:10:09 -0400 2012:

 I wound up expanding the function arguments and using function + args
 as the name.  That leads to a risk of rather long names for functions,
 but there aren't many other ways possible.

Well, maybe not many, but you don't need many, only some.  You could
stringify the list of arguments and use a hash of the string.  That's
also unambiguous and the length is constrained, regardless of the number
of args.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread David E. Wheeler
On Jul 5, 2012, at 3:21 PM, Andrew Dunstan wrote:

 No they are not necessarily one logical unit. You could have a bunch of
 functions called, say, equal which have pretty much nothing to do with
 each other, since they refer to different types.
 
 +1 from me for putting one function definition per file.

+1 for an option (I prefer one file for my projects, but might need multiple 
files for other projects).

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Aidan Van Dyk
On Thu, Jul 5, 2012 at 12:57 PM, David E. Wheeler da...@justatheory.com wrote:
 On Jul 5, 2012, at 3:21 PM, Andrew Dunstan wrote:

 No they are not necessarily one logical unit. You could have a bunch of
 functions called, say, equal which have pretty much nothing to do with
 each other, since they refer to different types.

 +1 from me for putting one function definition per file.

 +1 for an option (I prefer one file for my projects, but might need multiple 
 files for other projects).

-1

I'd rather have the few overloaded-functions in one file (hopefully
with deterministic ordering) and a sane, simple filename, than have
every function in every database in a separate file with some strange
mess in the filename that makes me cringe every time I see it.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Josh Berkus

 I'd rather have the few overloaded-functions in one file (hopefully
 with deterministic ordering) and a sane, simple filename, than have
 every function in every database in a separate file with some strange
 mess in the filename that makes me cringe every time I see it.

Having tried it both ways, for an existing production project, I agree
with Adrian.  One file, multiple functions.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 However I am also against what seems to be the flow.  Normally, you
 don't write overloaded plpgsql functions such as equal.  Case in
 point, the equality functions in core have funny names like int4eq and
 so on.  Instead, at least in my experience, the overloaded functions
 people seem to have in their databases are like do_stuff_to_foobars()
 and you have one version for foos and another one for bars.

+1

I too want to have my overloaded functions all in the same file, as much
as to have made that the only behavior in getddl.py:

  https://github.com/dimitri/getddl

 If you're doing lots of equality functions, surely it would make more
 sense to package them up as an extension anyway along with all the other
 thingies you need for the type you're supposedly writing.  So it's a
 completely different market than what we're aiming at here.

+1

Tom Lane t...@sss.pgh.pa.us writes:
 FWIW, I'm attracted to the all-similarly-named-functions-together
 method, mainly because it dodges the problem of how to encode a
 function's argument list into a filename.  However, we're being
 short-sighted to only think of functions here.  What about operators?
 Or casts?  Those don't have simple names either.

I would argue like Álvaro that when dealing with operators and casts
you're probably writing an extension already, and we're providing
another way to deal with that.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Christopher Browne
On Thu, Jul 5, 2012 at 5:52 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 However I am also against what seems to be the flow.  Normally, you
 don't write overloaded plpgsql functions such as equal.  Case in
 point, the equality functions in core have funny names like int4eq and
 so on.  Instead, at least in my experience, the overloaded functions
 people seem to have in their databases are like do_stuff_to_foobars()
 and you have one version for foos and another one for bars.

 +1

 I too want to have my overloaded functions all in the same file, as much
 as to have made that the only behavior in getddl.py:

That seems pretty appropriate to me.

The converse makes my head hurt...

If I have a bunch of overloaded functions, whose definitions *aren't*
really related, are we competing for the obfuscated PostgreSQL
contest?

In practice, that sounds like something I'd want to add to my list of
fire people that do this! Bad Practices.

 If you're doing lots of equality functions, surely it would make more
 sense to package them up as an extension anyway along with all the other
 thingies you need for the type you're supposedly writing.  So it's a
 completely different market than what we're aiming at here.

 +1
+1

 Tom Lane t...@sss.pgh.pa.us writes:
 FWIW, I'm attracted to the all-similarly-named-functions-together
 method, mainly because it dodges the problem of how to encode a
 function's argument list into a filename.  However, we're being
 short-sighted to only think of functions here.  What about operators?
 Or casts?  Those don't have simple names either.

 I would argue like Álvaro that when dealing with operators and casts
 you're probably writing an extension already, and we're providing
 another way to deal with that.

Indeed.  Is this something we ought to document as a recommendation?
It's not exactly reference material, but if it's a good practice,
perhaps it should be in the manuals somewhere...
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Tom Lane
Christopher Browne cbbro...@gmail.com writes:
 On Thu, Jul 5, 2012 at 5:52 PM, Dimitri Fontaine dimi...@2ndquadrant.fr 
 wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 FWIW, I'm attracted to the all-similarly-named-functions-together
 method, mainly because it dodges the problem of how to encode a
 function's argument list into a filename.  However, we're being
 short-sighted to only think of functions here.  What about operators?
 Or casts?  Those don't have simple names either.

 I would argue like Álvaro that when dealing with operators and casts
 you're probably writing an extension already, and we're providing
 another way to deal with that.

 Indeed.  Is this something we ought to document as a recommendation?

This argument seems a bit irrelevant to me.  pg_dump doesn't get to pick
and choose what will be in the database it's told to dump.  If we're
going to do something like what Joel wants, we have to have file naming
conventions for operator and cast objects.  So we can't just leave them
out of the conversation (or if we do, we shouldn't be surprised when the
ensuing design sucks).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-04 Thread Robert Haas
On Wed, Jul 4, 2012 at 9:02 AM, Joel Jacobson j...@trustly.com wrote:
 On Tue, Jul 3, 2012 at 7:49 PM, Peter Eisentraut pete...@gmx.net wrote:

 I think this idea has merit.  Prepare a patch and put it into the next
 commit fest.

 Glad to hear, I'm on it!


 I see the problem that since the dump order is in general not
 deterministic, this will cause random reordering in your master file
 that includes all the individual files.


 Then again, making the dump order deterministic is a problem that can be
 solved (I suppose), so maybe starting there would be a good step.  But
 it will require a small amount of in-depth pg_dump hacking.


 I just made a test, where I created objects in different order and compared
 the dumps.
 It appears pg_dump dumps objects in alphabetically sorted order.
 This works fine for most objects, but not for overloaded functions, in which
 case
 they are dumped in oid order.

 Are there any other cases than overloaded functions, where the dump order
 isn't deterministic?

 While waiting for your reply, I'll be working on fixing the problem with
 overloaded functions.

My vote is - when there's an overloaded function, put each version in
its own file.  And name the files something like
functionname_something.sql.  And just document that something may not
be entirely stable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-04 Thread Joel Jacobson
On Tue, Jul 3, 2012 at 7:49 PM, Peter Eisentraut pete...@gmx.net wrote:

 I think this idea has merit.  Prepare a patch and put it into the next
 commit fest.


Glad to hear, I'm on it!


 I see the problem that since the dump order is in general not
 deterministic, this will cause random reordering in your master file
 that includes all the individual files.


 Then again, making the dump order deterministic is a problem that can be
 solved (I suppose), so maybe starting there would be a good step.  But
 it will require a small amount of in-depth pg_dump hacking.


I just made a test, where I created objects in different order and compared
the dumps.
It appears pg_dump dumps objects in alphabetically sorted order.
This works fine for most objects, but not for overloaded functions, in
which case
they are dumped in oid order.

Are there any other cases than overloaded functions, where the dump order
isn't deterministic?

While waiting for your reply, I'll be working on fixing the problem with
overloaded functions.


Re: [HACKERS] Schema version management

2012-07-03 Thread Peter Eisentraut
On ons, 2012-06-27 at 10:02 +0200, Joel Jacobson wrote:
 Robert, thank you for keeping this thread alive.

 Hopefully some more will join the discussion.

 I'm still hopeful the community can manage to agree upon acceptable
 tradeoffs and work-arounds to make this possible.

I think this idea has merit.  Prepare a patch and put it into the next
commit fest.
 
 I think the benefits clearly outweighs the minor issues of filenames,
 dumping order, etc.

I see the problem that since the dump order is in general not
deterministic, this will cause random reordering in your master file
that includes all the individual files.

Then again, making the dump order deterministic is a problem that can be
solved (I suppose), so maybe starting there would be a good step.  But
it will require a small amount of in-depth pg_dump hacking.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-06-27 Thread Joel Jacobson
Robert, thank you for keeping this thread alive.

Hopefully some more will join the discussion.

I'm still hopeful the community can manage to agree upon acceptable
tradeoffs and work-arounds to make this possible.

I think the benefits clearly outweighs the minor issues of filenames,
dumping order, etc.

On Tue, Jun 26, 2012 at 6:04 PM, Robert Haas robertmh...@gmail.com wrote:

 I don't think either of these problems ought to be a complete
 show-stopper.  It seems to me that the trade-off is that when object
 names are long, contain special characters, or are overloaded, we'll
 have to munge the names in some way to prevent collisions.  That could
 mean that the names are not 100% stable, which would possibly produce
 some annoyance if you're using a VCS to track changes, but maybe
 that's an acceptable trade-off, because it shouldn't happen very
 often.  If we could guararantee that identifiers less than 64
 characters which are not overloaded and contain no special characters
 requiring quoting end up in an eponymous file, I think that would be
 good enough to make most of our users pretty happy.  In other cases, I
 think the point would just be to make it work (with a funny name)
 rather than fail.


I agree. It's not a problem if the filename is not identical to the name of
the object, as long as the same name generates the same filename on
all architectures. Url escaping would work, but converting all non-ascii
characters to ascii would be nicer, and dropping any problematic characters,
or replacing them with _ or any other suitable character.

For the small fraction of users how have managed to find a good reason
to name a function this/is\a/good.name/of/a\function.. the filename
of such a function would be this_is_a_good_name_of_a_function.

As long as the objects are dumped in the same order, there will be no
merge problems when two developers commit changes of the same
file. I think pg_dump does a reasonable job already making sure the order is
always the same. How big is the problem, really?

It would of course be a little easier to keep track of changes and do
merging
if all overloaded functions would be kept in separate files, but I see that
as a
minor feature request. As long as all objects with the same name are kept in
separate files, that's good enough for my needs, and I have _a lot_ of
functions,
whereof quite a few are overloaded.




  \i /home/postgres/database/gluepay-split/aml/FUNCTION/get_linkid.sql
  \i /home/postgres/database/gluepay-split/aml/FUNCTION/set_address.sql

 It would be better to use \ir here rather than hard-code path names, I
 think.  Then you'd only need to require that all the files be in the
 same directory, rather than requiring them to be at a certain
 hard-coded place in the filesystem.


I fully agree!
I didn't know about the \ir feature.

Best regards,

Joel Jacobson


Re: [HACKERS] Schema version management

2012-06-26 Thread Robert Haas
On Tue, May 22, 2012 at 11:31 PM, Joel Jacobson j...@trustly.com wrote:
 This is true, which means some users won't be able to use the feature,
 because they are using an ancient OS or have function names with slashes,
 hm, is it even possible to have function names with slashes?

Sure.  If you quote the function name, you can put anything you want
in there.  Note that Windows disallows a whole bunch of special
characters in filenames, while UNIX-like systems tend to disallow only
slash.

 I suppose you have a lot more experience of what postgres installations exists
 in the world. Do you think it's common databases have non-ascii problematic
 characters in object names?

 Is it a project policy all features of all standard tools must be
 useful for all users
 on all platforms on all databases? Or is it acceptable if some features are 
 only
 useable for, say, 90% of the users?

There are cases where we permit features that only work on some
platforms, but it's rare.  Usually, we do this only when the platform
lacks some API that exists elsewhere.  For example, collations and
prefetching are not supported on Windows because the UNIX APIs we use
don't exist there.

In this case, it seems like you could work around the problem by, say,
URL-escaping any characters that can't be used in an unquoted
identifier.  Of course that might make the file name long enough to
hit the platform-specific file name limit.  Not sure what to do about
that.  The basic idea you're proposing here has been proposed a number
of times before, but it's always fallen down over questions of (1)
what do do with very long object names or those containing special
characters and (2) objects (like functions) for which schema+name is
not a unique identifier.

I don't think either of these problems ought to be a complete
show-stopper.  It seems to me that the trade-off is that when object
names are long, contain special characters, or are overloaded, we'll
have to munge the names in some way to prevent collisions.  That could
mean that the names are not 100% stable, which would possibly produce
some annoyance if you're using a VCS to track changes, but maybe
that's an acceptable trade-off, because it shouldn't happen very
often.  If we could guararantee that identifiers less than 64
characters which are not overloaded and contain no special characters
requiring quoting end up in an eponymous file, I think that would be
good enough to make most of our users pretty happy.  In other cases, I
think the point would just be to make it work (with a funny name)
rather than fail.

 \i /home/postgres/database/gluepay-split/public/TYPE/dblink_pkey_results.sql
 \i /home/postgres/database/gluepay-split/public/TYPE/r_matchedwithdrawal.sql
 \i 
 /home/postgres/database/gluepay-split/public/TYPE/r_unapprovedwithdrawal.sql
 \i 
 /home/postgres/database/gluepay-split/public/TYPE/ukaccountvalidationchecktype.sql
 \i /home/postgres/database/gluepay-split/aml/FUNCTION/check_name.sql
 \i /home/postgres/database/gluepay-split/aml/FUNCTION/describe_entityid.sql
 \i /home/postgres/database/gluepay-split/aml/FUNCTION/get_linkid.sql
 \i /home/postgres/database/gluepay-split/aml/FUNCTION/set_address.sql
 -- ... all the objects ..
 \i 
 /home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workerid_fkey.sql
 \i 
 /home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workerstatusid_fkey.sql
 \i 
 /home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workertypeid_fkey.sql

It would be better to use \ir here rather than hard-code path names, I
think.  Then you'd only need to require that all the files be in the
same directory, rather than requiring them to be at a certain
hard-coded place in the filesystem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-06-01 Thread Kevin Grittner
Joel Jacobson  wrote:
 
 hm, is it even possible to have function names with slashes?
 
test=# create function is/it\even
possible?() returns void language plpgsql as $$begin end;$$;
CREATE FUNCTION
test=# select is/it\even
possible?();
 is/it\even+
  possible?   
--
 
(1 row)
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-22 Thread Daniel Farina
On Mon, May 21, 2012 at 5:25 PM, Joel Jacobson j...@trustly.com wrote:
 If one want to reuse the splitting to files-code of the directory
 format, maybe the existing option -F d could be tweaked to output in
 both a a machine-readable format (current way), and also a
 human-friendly tree of files and content (like suggested by my patch).

 I wonder what the option would be called then, having two chars
 options is not an option I guess, maybe -F t for tree instead of
 directory, as the -F d option only dumps to a single directory and
 not a tree-structure?

Is there a reason why the current directory format could not be
adjusted to become more human-readable friendly for mechanical
reasons?  I realize there is a backwards compatibility problem, but it
may be better than bloating a new option.

Andrew's approach of reading the TOC also be good...as so pg_dump can
avoid serving the additional master of schema versioning and
development usability in addition to dumping.  The TOC is the closest
thing we have to the library-ification of pg_dump in the near-term.
But I don't see how making the directory output format more
human-friendly could be seen as a bad thing overall, except in the
notable axis of implementation complexity.  Silly issues like naming
files on different platforms, case sensitivity, and file length
restrictions may rear their ugly head.

I think about this because in addition to the data types and operators
defined in the development process, there are often small tables that
need to be loaded with content and version controlled as well, rather
like userland-equivalents pg_enum entries.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-22 Thread Joel Jacobson
On Wed, May 23, 2012 at 3:24 AM, Daniel Farina dan...@heroku.com wrote:
 Is there a reason why the current directory format could not be
 adjusted to become more human-readable friendly for mechanical
 reasons?  I realize there is a backwards compatibility problem, but it
 may be better than bloating a new option.

I like your idea, then the format would be directory, while the option
would be something like, --human-friendly?

Currently, the directory format only dumps the data of tables
into separate files. Everything else goes into the toc.dat file.
To make it work, also the stuff written to the toc.dat file must
be written to separate files.

 But I don't see how making the directory output format more
 human-friendly could be seen as a bad thing overall, except in the
 notable axis of implementation complexity.  Silly issues like naming
 files on different platforms, case sensitivity, and file length
 restrictions may rear their ugly head.

If the entire function identity arguments would be included in the filename,
two dumps of the same schema in two different databases
would be guaranteed to produce the same dump.

This would render some very long filenames for functions with many arguments,
but this problem could at least be reduced by using the shorter aliases for each
data type, as varchar instead of character varying and timestamptz
instead of timestamp with time zone, etc.

http://www.postgresql.org/docs/devel/static/datatype.html#DATATYPE-TABLE

Also, to get even more space, as the name of the function can be long too,
the function name could be made a directory, and the different overloaded types
different files, e.g:

/public/FUNCTION/myfunc/int.sql
/public/FUNCTION/myfunc/int_timestamptz.sql

And functions with no arguments are written to a single file
(suffic .sql to avoid conflict with eventual directory name for function):

/public/FUNCTION/myfunc.sql

 I think about this because in addition to the data types and operators
 defined in the development process, there are often small tables that
 need to be loaded with content and version controlled as well, rather
 like userland-equivalents pg_enum entries.

Is there a term for such tables? I use the term lookup tables, but perhaps
there is a better one?

In my schema, they typically maps statusids, stateids,
etc to human friendly names.

E.g., if Orders is a huge table for all orders, I might have a
OrderStatuses table to
lookup all the OrderStatusID columns in Orders.
Orders.OrderStatusID -fk- OrderStatuses.OrderStatusID
OrderStatuses.Name is unqiue and contains the human friendly name of the status.

These small lookup tables also needs to be version controlled of course.

This is a tricky one though, because you might have small tables with base data,
but with references to other huge tables, which you don't want to
include in your
automatically version controlled schema dump.

I solved this problem by creating a quite complex recursive plpgsql function,
resolving all dependencies and joining only the rows from each table required,
allowing you to specify a regex matching a list of tables, which in
turn resolves
to all tables they have references to, and dumps these tables too, but only the
required rows. The result is a dump of each such table into a separate file,
in a restorable order not causing any dependency problems.
Then I have a similar function to do the restoring.

I use this approach to build a restorable clean test database of any version of
the system, may it be the production or some developer's local version of it.

And also, not to forget, to make it work all the sequences also needs
to be restarted
to the same values as in the original database after the dump is restored.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-22 Thread Tom Lane
Joel Jacobson j...@trustly.com writes:
 If the entire function identity arguments would be included in the filename,
 two dumps of the same schema in two different databases
 would be guaranteed to produce the same dump.

 This would render some very long filenames for functions with many arguments,

Thus, not implausibly, causing the dump to fail entirely on some
filesystems.  Case sensitivity, encoding issues, and special characters
in names (eg slashes or backslashes, depending on platform) are
additional pain points.  This does not sound like a good plan from here.

Taking a step or two back, it seems to me that the thrust of your
proposal is essentially to throw away all dump ordering information,
which does not seem like a particularly good idea either.  It certainly
will not lead to a dump that can be restored reliably.  If the use-case
for this is database comparisons, I think we'd be a lot better off to
write a postprocessing tool for regular dumps to perform such
comparisons, rather than whacking pg_dump around to the point where it's
unable to perform its primary function.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-22 Thread Joel Jacobson
On Wed, May 23, 2012 at 9:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Thus, not implausibly, causing the dump to fail entirely on some
 filesystems.  Case sensitivity, encoding issues, and special characters
 in names (eg slashes or backslashes, depending on platform) are
 additional pain points.  This does not sound like a good plan from here.

This is true, which means some users won't be able to use the feature,
because they are using an ancient OS or have function names with slashes,
hm, is it even possible to have function names with slashes?

The maximum length of tables, functions etc in postgres is 63 characters.
A function in postgres can have at most 100 arguments.
The absolute majority of users run operating systems allowing
at least 255 characters, http://en.wikipedia.org/wiki/Comparison_of_file_systems

I suppose you have a lot more experience of what postgres installations exists
in the world. Do you think it's common databases have non-ascii problematic
characters in object names?

Is it a project policy all features of all standard tools must be
useful for all users
on all platforms on all databases? Or is it acceptable if some features are only
useable for, say, 90% of the users?

 Taking a step or two back, it seems to me that the thrust of your
 proposal is essentially to throw away all dump ordering information,
 which does not seem like a particularly good idea either.  It certainly
 will not lead to a dump that can be restored reliably.  If the use-case
 for this is database comparisons, I think we'd be a lot better off to
 write a postprocessing tool for regular dumps to perform such
 comparisons, rather than whacking pg_dump around to the point where it's
 unable to perform its primary function.

Not at all, the ordering information is not thrown away, it is preserved
in the dump file specified by the -f option, from which each split file
is included using \i


Example, this is an extract of the -f dump file in my database:

--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'SQL_ASCII';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

--
-- Name: pgx_diag; Type: SCHEMA; Schema: -; Owner: postgres
--

CREATE SCHEMA pgx_diag;


ALTER SCHEMA pgx_diag OWNER TO postgres;

-- ... some more schemas, languages etc ...
-- ... and then all the included files:

\i /home/postgres/database/gluepay-split/public/TYPE/dblink_pkey_results.sql
\i /home/postgres/database/gluepay-split/public/TYPE/r_matchedwithdrawal.sql
\i /home/postgres/database/gluepay-split/public/TYPE/r_unapprovedwithdrawal.sql
\i 
/home/postgres/database/gluepay-split/public/TYPE/ukaccountvalidationchecktype.sql
\i /home/postgres/database/gluepay-split/aml/FUNCTION/check_name.sql
\i /home/postgres/database/gluepay-split/aml/FUNCTION/describe_entityid.sql
\i /home/postgres/database/gluepay-split/aml/FUNCTION/get_linkid.sql
\i /home/postgres/database/gluepay-split/aml/FUNCTION/set_address.sql
-- ... all the objects ..
\i 
/home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workerid_fkey.sql
\i 
/home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workerstatusid_fkey.sql
\i 
/home/postgres/database/gluepay-split/public/FK_CONSTRAINT/workershistory_workertypeid_fkey.sql


-- .. and after all the included files comes permissions and stuff:

--
-- Name: public; Type: ACL; Schema: -; Owner: postgres
--

REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-22 Thread Joel Jacobson
On the topic on fixing pg_dump to dump in a predictable order, can
someone please update me on the current state of the problem?

I've read though pg_dump_sort.c, and note objects are first sorted in
type/name-based ordering, then topologically sorted in a way which
minimize unnecessary rearrangement.

How come this not always generates a predictable order? Any ideas on
how to fix the problem? If someone gives me a hint I might make an
effort trying to implement the idea.

If pg_dump would dump in a predictable order, it would make sense to
dump all overloaded versions of functions sharing the same name in the
same file.

Then it would be _guaranteed_ two different databases committing their
schema to a shared VCS commit exactly the same files if the schema is
the same, which is not guaranteed unless the dump order is
predictable.

Having thought about it, I agree the idea with arguments in filenames
is, probably possible, but suboptimal.
Much better writing all overloaded functions to the same file and
fixing the predictable dump order problem.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-21 Thread Benedikt Grundmann
On Mon, May 21, 2012 at 5:03 AM, Joel Jacobson j...@trustly.com wrote:

 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02301.php
 The initial feedback was on the usage of OIDs as file names.
 This was indeed a bad idea and was changed, see
 http://archives.postgresql.org/pgsql-hackers/2010-12/msg02318.php
 Gurjeet Singh pointed out the problem with functions sharing the same
 name but having different arguments.
 As of now, it's not certain they will always be dumped into the same files.
 This is a valid point, and needs to be solved in an elegant way.
 The arguments needs to be made part of the path somehow.

This is interesting at Jane Street we actually have a small tool
that parses the output of pg_dump.  (Well applies a set of regular
expressions plus a little bit guesswork).  We use this to do three things
all of which I would love to see supported by postgres tool chain proper:

1) split the output into one file per thing (basically as per this
   thread) each file named type_name_running-integer for use
   with a VCS.  So if we have an overloaded function foo we end up with
   several function_foo_1.sql function_foo_2.sql ...  The order of the
   enumeration is just the order the functions occurred in the pg_dump
   which seems to be stable and therefore good enough.

2) extract a patch.  You give the tool the name of one or more roots
   (e.g. a table or set of tables you want to modify).  It finds all
   things that depend on it (well sort of just turn the body of each
   definition into a list of words and a depends on b if the name of b
   occurrs in a).  Do a topological sort (if there are cycles because
   of the hack dependency check break them but continue and produce a
   warning). Output a file that first drops the definitions in inverse
   dependency order and then recreates them (in dependency order).
   The file starts with a begin but does NOT end with a commit so you
   are forced to enter it yourself.

   This tool is fantastic if you have a big set of plpgsql functions as
   it is otherwise hard to make sure that you have modified all places
   when refactoring, changing a column, etc...

3) Find all leaves.  E.g. do the topsort on the whole pg_dump and list
   the names of all things nothing depends on.  This is mostly useful if
   you want to make sure that you are not accumulating cruft that isn't
   used by anything.  Of course you separately need a list or knowledge
   about the entry points of your application(s).

Cheers,

Bene

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-21 Thread Daniel Farina
On Sun, May 20, 2012 at 9:03 PM, Joel Jacobson j...@trustly.com wrote:
 On Mon, May 21, 2012 at 10:06 AM, Daniel Farina dan...@heroku.com wrote:
 Also, now that I look more carefully, there was a lot of conversation
 about this patch; it seems like what you are doing now is reporting
 its successful use, and I did not understand that by reading the
 abstract of your email.  And, beyond that, do we have a summary of the
 open questions that prevented it from being committed?

 Good idea. Here is an attempt to a summary:

Thank you, that's very informative.  I'd like to reiterate one
question, though, which is something like:

How do you feel that the since-committed directory-output/input
support in pg_dump/pg_restore could or should influence your patch, if
at all?

It seems like now that there is support for spitting out a bunch of
files in a directory for pg_dump that is now going to be supported for
a long time that a new feature like yours might be more cohesive if it
somehow played with that.  I must confess I haven't read the patch in
detail, especially if it has been updated, but back then there was no
multi-file output mode from pg_dump, and now there is one.  My naive
understanding is this would be adding a second one as-is, but I wonder
if that is strictly necessary to fulfill the use case.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-21 Thread Joel Jacobson
On Tue, May 22, 2012 at 3:30 AM, Daniel Farina dan...@heroku.com wrote:
 Thank you, that's very informative.  I'd like to reiterate one
 question, though, which is something like:

 How do you feel that the since-committed directory-output/input
 support in pg_dump/pg_restore could or should influence your patch, if
 at all?

The directory format fulfills a different purpose. The tables are
split into files, where each file name gets a number. Functions are
not split into files, they are defined in the table of content file,
toc.dat.

Example:

joel@Joel-Jacobsons-MacBook-Pro ~ $ pg_dump -F d -f /Users/joel/test
joel@Joel-Jacobsons-MacBook-Pro ~ $ ls -la test
total 24
drwx--5 joel  staff   170 May 22 07:16 .
drwx--+ 130 joel  staff  4488 May 22 07:16 ..
-rw-r--r--1 joel  staff38 May 22 07:16 2116.dat.gz
-rw-r--r--1 joel  staff39 May 22 07:16 2117.dat.gz
-rw-r--r--1 joel  staff  2265 May 22 07:16 toc.dat

This is a good feature for its purpose, but doesn't provide a solution
for the schema version management problem.

 It seems like now that there is support for spitting out a bunch of
 files in a directory for pg_dump that is now going to be supported for
 a long time that a new feature like yours might be more cohesive if it
 somehow played with that.  I must confess I haven't read the patch in
 detail, especially if it has been updated, but back then there was no
 multi-file output mode from pg_dump, and now there is one.  My naive
 understanding is this would be adding a second one as-is, but I wonder
 if that is strictly necessary to fulfill the use case.

If one want to reuse the splitting to files-code of the directory
format, maybe the existing option -F d could be tweaked to output in
both a a machine-readable format (current way), and also a
human-friendly tree of files and content (like suggested by my patch).

I wonder what the option would be called then, having two chars
options is not an option I guess, maybe -F t for tree instead of
directory, as the -F d option only dumps to a single directory and
not a tree-structure?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-21 Thread Andrew Dunstan



On 05/21/2012 08:25 PM, Joel Jacobson wrote:

On Tue, May 22, 2012 at 3:30 AM, Daniel Farinadan...@heroku.com  wrote:

Thank you, that's very informative.  I'd like to reiterate one
question, though, which is something like:

How do you feel that the since-committed directory-output/input
support in pg_dump/pg_restore could or should influence your patch, if
at all?

The directory format fulfills a different purpose. The tables are
split into files, where each file name gets a number. Functions are
not split into files, they are defined in the table of content file,
toc.dat.

Example:

joel@Joel-Jacobsons-MacBook-Pro ~ $ pg_dump -F d -f /Users/joel/test
joel@Joel-Jacobsons-MacBook-Pro ~ $ ls -la test
total 24
drwx--5 joel  staff   170 May 22 07:16 .
drwx--+ 130 joel  staff  4488 May 22 07:16 ..
-rw-r--r--1 joel  staff38 May 22 07:16 2116.dat.gz
-rw-r--r--1 joel  staff39 May 22 07:16 2117.dat.gz
-rw-r--r--1 joel  staff  2265 May 22 07:16 toc.dat

This is a good feature for its purpose, but doesn't provide a solution
for the schema version management problem.


It seems like now that there is support for spitting out a bunch of
files in a directory for pg_dump that is now going to be supported for
a long time that a new feature like yours might be more cohesive if it
somehow played with that.  I must confess I haven't read the patch in
detail, especially if it has been updated, but back then there was no
multi-file output mode from pg_dump, and now there is one.  My naive
understanding is this would be adding a second one as-is, but I wonder
if that is strictly necessary to fulfill the use case.

If one want to reuse the splitting to files-code of the directory
format, maybe the existing option -F d could be tweaked to output in
both a a machine-readable format (current way), and also a
human-friendly tree of files and content (like suggested by my patch).

I wonder what the option would be called then, having two chars
options is not an option I guess, maybe -F t for tree instead of
directory, as the -F d option only dumps to a single directory and
not a tree-structure?




I have a little utility to write out the data in the TOC as separate 
files. It works with both a custom format dump, and the toc.dat file 
from a directory format or unpacked tar format dump.


It was originally written as a debugging aid, but  can be used to some 
extent for schema management as well. It's far from complete, but still 
might be useful. See https://github.com/adunstan/DumpToc


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-20 Thread Daniel Farina
On Sun, May 20, 2012 at 12:41 PM, Joel Jacobson j...@trustly.com wrote:
 Hi,

 I just read a very interesting post about schema version management.

 Quote: You could set it up so that every developer gets their own
 test database, sets up the schema there, takes a dump, and checks that
 in. There are going to be problems with that, including that dumps
 produced by pg_dump are ugly and optimized for restoring, not for
 developing with, and they don't have a deterministic output order. (
 http://petereisentraut.blogspot.com/2012/05/my-anti-take-on-database-schema-version.html
 )

I think you are absolutely right, but I'm not sure if teaching pg_dump
a new option is the best idea.  It's a pretty complex program as-is.
I've also heard some people who really wish pg knew how to self-dump
for valid reasons.

It sounds like some of the catalog wrangling and cycle-breaking
properties of pg_dump could benefit from being exposed stand-alone,
but unfortunately that's not a simple task, especially if you want to
do The Right Thing and have pg_dump link that code, given pg_dump's
criticality.

pg_extractor is a new/alternative take on the database copying
problem, maybe you could have a look at that?

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-20 Thread Joel Jacobson
On Mon, May 21, 2012 at 8:08 AM, Daniel Farina dan...@heroku.com wrote:
 I think you are absolutely right, but I'm not sure if teaching pg_dump
 a new option is the best idea.  It's a pretty complex program as-is.
 I've also heard some people who really wish pg knew how to self-dump
 for valid reasons.

Complex program? Yes, pg_dump it is extremely complex, I wouldn't want
to touch any of the code. A rewrite is probably close to impossible.

Complex patch? No. It's 102 lines of code and doesn't change any of
the existing code in pg_dump, it simply adds some lines writing out
the objects to separate files. Have a look at the patch, it's super
simple.

 It sounds like some of the catalog wrangling and cycle-breaking
 properties of pg_dump could benefit from being exposed stand-alone,
 but unfortunately that's not a simple task, especially if you want to
 do The Right Thing and have pg_dump link that code, given pg_dump's
 criticality.

I agree it's not a simple task, and it's probably not something anyone
will fix in the near future.
The --split option doesn't aim to solve this problem either. That's a
different problem, and it's not a problem I have.

 pg_extractor is a new/alternative take on the database copying
 problem, maybe you could have a look at that?

It's just sad realizing people need to some up with hacks and
work-arounds to solve a obvious real-life problem, easily fixed inside
pg_dump with 102 lines of drop-dead simple code, not touching any of
the logics or flows in pg_dump.

I can't even image how many hours coders have wasted hacking together
tools like pg_extractor just to circumvent the stupid fact pg_dump
can't do this natively.

The pg_extractor is way more complex than my suggested patch, it's 974
lines of perl codes, as opposed to 102 lines of simple code in the
patch.
The pg_extractor also does a lot more than simply splitting objects
into separate files, like executing svn commands.

The splitting of objects into separate files should clearly be the
responsibility of pg_dump.
It would allow you to easily version control the schema files your
self with any version control software system, such as svn, git, etc.

I'm sure pg_extractor does it best to achieve the objective, but even
if it does, I would never trust it for production usage, version
controlling your production schema is far too important to trust any
tool not part of the mainline distribution of postgres. And personally
I don't have any problem, I've been using the --split option for two
years, I just feel sorry for the rest of the postgres community,
unaware of how to solve this problem, having to hack together their
own little tools, or be lucky finding some existing hack.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-20 Thread Daniel Farina
On Sun, May 20, 2012 at 7:36 PM, Joel Jacobson j...@trustly.com wrote:
 On Mon, May 21, 2012 at 8:08 AM, Daniel Farina dan...@heroku.com wrote:
 I think you are absolutely right, but I'm not sure if teaching pg_dump
 a new option is the best idea.  It's a pretty complex program as-is.
 I've also heard some people who really wish pg knew how to self-dump
 for valid reasons.

 Complex program? Yes, pg_dump it is extremely complex, I wouldn't want
 to touch any of the code. A rewrite is probably close to impossible.

I wouldn't be so sure about that...

 Complex patch? No. It's 102 lines of code and doesn't change any of
 the existing code in pg_dump, it simply adds some lines writing out
 the objects to separate files. Have a look at the patch, it's super
 simple.

Ah. I did not know there was a patch already out there -- I did not
somehow get that , as it then can be audited in its precise functionality.

 It sounds like some of the catalog wrangling and cycle-breaking
 properties of pg_dump could benefit from being exposed stand-alone,
 but unfortunately that's not a simple task, especially if you want to
 do The Right Thing and have pg_dump link that code, given pg_dump's
 criticality.

 It's just sad realizing people need to some up with hacks and
 work-arounds to solve a obvious real-life problem, easily fixed inside
 pg_dump with 102 lines of drop-dead simple code, not touching any of
 the logics or flows in pg_dump.

 I can't even image how many hours coders have wasted hacking together
 tools like pg_extractor just to circumvent the stupid fact pg_dump
 can't do this natively.

My next question would be how this might relate to the directory dump
format.  For example, is it an embellishment of that?  It seems at
fist glance that whatever this patch might be a cousin of that
feature.  Or, is it superseded? The documentation is clear that tables
are given their own files, but doesn't say much about how other schema
objects are stored, so they may or may not be useful to your needs.

Also, now that I look more carefully, there was a lot of conversation
about this patch; it seems like what you are doing now is reporting
its successful use, and I did not understand that by reading the
abstract of your email.  And, beyond that, do we have a summary of the
open questions that prevented it from being committed?

 I'm sure pg_extractor does it best to achieve the objective, but even
 if it does, I would never trust it for production usage, version
 controlling your production schema is far too important to trust any
 tool not part of the mainline distribution of postgres. And personally
 I don't have any problem, I've been using the --split option for two
 years, I just feel sorry for the rest of the postgres community,
 unaware of how to solve this problem, having to hack together their
 own little tools, or be lucky finding some existing hack.

My thinking is that confidence would be increased if there was a piece
of code that handled a lot of the catalog munging et al that is part
of pg_dump that *is* maintained by postgres so other projects can more
convincingly add a correct veneer.

As a meta-comment, all I did was ask some polite questions.  You could
have politely disqualified pg_extractor and spared some of the
language without having gotten anything less done.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-05-20 Thread Joel Jacobson
On Mon, May 21, 2012 at 10:06 AM, Daniel Farina dan...@heroku.com wrote:
 Also, now that I look more carefully, there was a lot of conversation
 about this patch; it seems like what you are doing now is reporting
 its successful use, and I did not understand that by reading the
 abstract of your email.  And, beyond that, do we have a summary of the
 open questions that prevented it from being committed?

Good idea. Here is an attempt to a summary:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02301.php
The initial feedback was on the usage of OIDs as file names.
This was indeed a bad idea and was changed, see
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02314.php
Tom Lane also pointed out it doesn't solve the randomly different
ordering of rows within a table-problem.
The rows within a table are not part of the schema. The patch doesn't
attempt to solve that problem.

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02318.php
Gurjeet Singh pointed out the problem with functions sharing the same
name but having different arguments.
As of now, it's not certain they will always be dumped into the same files.
This is a valid point, and needs to be solved in an elegant way.
The arguments needs to be made part of the path somehow.

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02327.php
Another idea Gurjeet had was to hash the object identifier and use
that in the file's name.
Not a bad idea, would look nicer if functions have very many arguments.
Perhaps /[schema]/[object type]/[name]/[hash of arguments].sql

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02329.php
David Wilson suggested placing all overloaded functions within the same file.
Then, assuming you order them deterministically within that file, we
sidestep the
file naming issue and maintain useful diff capabilities, since a diff of the
function's file will show additions or removals of various overloaded
versions.
This would be a good idea, but falls on pg_dump not outputting the
functions in a deterministic order.

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02496.php
Robert Treat: I've both enjoyed reading this thread and seeing this wheel
reinvented yet again, and wholeheartedly +1 the idea of building this
directly into pg_dump. (The only thing better would be to make everything
thing sql callable, but that's a problem for another day).

 My thinking is that confidence would be increased if there was a piece
 of code that handled a lot of the catalog munging et al that is part
 of pg_dump that *is* maintained by postgres so other projects can more
 convincingly add a correct veneer.

I totally agree, the most easy place to handle it is within pg_dump.

 As a meta-comment, all I did was ask some polite questions.  You could
 have politely disqualified pg_extractor and spared some of the
 language without having gotten anything less done.

I very much appreciated your answer, and I did in no way mean to be impolite.

Best regards,

Joel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers