Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-21 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, Jan 20, 2011 at 4:40 PM, Simone Aiken
 sai...@quietlycompetent.com wrote:
  After playing with this in benchmarks and researching the weird results I
  got I'm going to advise dropping the todo for now unless something happens
  to change how postgres handles clustering.
 
 I agree, let's remove it.
 
 That having been said, analyzing TODO items to figure out which ones
 are worthless is a useful thing to do, so please feel free to keep at
 it.

OK, removed.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-20 Thread Robert Haas
On Wed, Jan 19, 2011 at 4:27 PM, Simone Aiken sai...@ulfheim.net wrote:
 In my experience size increases related to documentation are almost always
 worth it.  So I'm prejudiced right out of the gate.  I was wondering if
 every pg_ table gets copied out to every database ..  if there is already a
 mechanism for not replicating all of them we could utilize views or
 re-writes rules to merge a single copy of catalog comments in a separate
 table with each deployed database's pg_descriptions.

All of them get copied, except for a handful of so-called shared
catalogs.  Changing that would be difficult.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-20 Thread Simone Aiken
After playing with this in benchmarks and researching the weird results I
got I'm going to advise dropping the todo for now unless something happens
to change how postgres handles clustering.  You guys probably already
grokked this so I am just recording it for the list archives.  
  

The primary factor here is that postgres doesn't maintain clustered indexes.
Clustering is a one-time operation that clusters the table at this current
point in time.  Basically, there really isn't any such thing in postgres as
a clustered index.  There is an operation - Cluster - which takes an index
and a table as input and re-orders the table according to the index.   But
it is borderline fiction to call the index used clustered because the next
row inserted will pop in at the end of the table instead of slipping into
the middle of the table per the desired ordering.  

All the pg_table cluster candidates are candidates because they have a row
per table column and we expect that a query will want to get several of
these rows at once.  These rows are naturally clustered because the scripts
that create them insert their information into the catalog contiguously.
When you create a catalog table the pg_attribute rows for its columns are
inserted together.  When you then create all its triggers they too are put
into pg_triggers one after the other.  So calling the Cluster operation
after dbinit doesn't help anything.

Over time table alterations can fragment this information.   If a user loads
a bunch of tables, then alters them over time the columns added later on
will have their metadata stored separately from the columns created
originally. 

Which gets us to the down and dirty of how the Cluster function works.  It
puts an access exclusive lock on the entire table - blocking all attempts to
read and write to the table - creates a copy of the table in the desired
order, drops the original, and renames the copy.  Doing this to a catalog
table that is relevant to queries pretty much brings everything else in the
database to a halt while the system table is locked up.  And the brute force
logic makes this time consuming even if the table is perfectly ordered
already.  Additionally, snapshots taken of the table during the Cluster
operation make the table appear to be empty which introduces the possibility
of system table corruption if transactions are run concurrently with a
Cluster operation.

So basically, the Cluster operation in its current form is not something you
want running automatically on a bunch of system table as it is currently
implemented.  It gives your system the hiccups.  You would only want to run
it manually during downtime.  And you can do that just as easily with or
without any preparation during dbinit.


Thanks everyone,

-Simone Aiken





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-20 Thread Robert Haas
On Thu, Jan 20, 2011 at 4:40 PM, Simone Aiken
sai...@quietlycompetent.com wrote:
 After playing with this in benchmarks and researching the weird results I
 got I'm going to advise dropping the todo for now unless something happens
 to change how postgres handles clustering.

I agree, let's remove it.

That having been said, analyzing TODO items to figure out which ones
are worthless is a useful thing to do, so please feel free to keep at
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Simone Aiken

 Robert
 
 I think the first 
 thing to do would be to try to come up with a reproducible test case 
 where clustering the tables improves performance.  


On that note, is there any standard way you guys do benchmarks?  


 Bruce

I think CLUSTER is a win when you are looking up multiple rows in the same
table, either using a non-unique index or a range search.  What places do
such lookups?  Having them all in adjacent pages would be a win ---
single-row lookups are usually not.


Mostly the tables that track column level data.  Typically you will want to
grab rows for multiple columns for a given table at once so it would be
helpful to have them be contiguous on disk. 

I could design a benchmark to display this by building a thousand tables one
column at a time using 'alter add column' to scatter the catalog rows for
the tables across many blocks.  So they'll be a range with column 1 for each
table and column 2 for each table and column three for each table.  Then
fill a couple data tables with a lot of data and set some noise makers to
loop through them over and over with full table scans ... filling up cache
with unrelated data and hopefully ageing out the cache of the pg_tables.
Then do some benchmark index lookup queries to see the retrieval time before
and after clustering the pg_ctalog tables to record a difference.

If the criteria is doesn't hurt anything and helps a little I think this
passes.  Esp since clusters aren't maintained automatically so adding them
has no negative impact on insert or update.  It'd just be a nice thing to do
if you know it can be done that doesn't harm anyone who doesn't know.


-Simone Aiken





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Robert Haas
On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken
sai...@quietlycompetent.com wrote:
 Pages like this one have column comments for the system tables:

 http://www.psql.it/manuale/8.3/catalog-pg-attribute.html

Oh, I see.  I don't think we want to go there.  We'd need some kind of
system for keeping the two places in sync.  And there'd be no easy way
to upgrade the in-database descriptions when we upgraded to a newer
minor release, supposing they'd changed in the meantime.  And some of
the descriptions are quite long, so they wouldn't fit nicely in the
amount of space you typically have available when you run \d+.  And it
would enlarge the size of an empty database by however much was
required to store all those comments, which could be an issue for
PostgreSQL instances that have many small databases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken
 sai...@quietlycompetent.com wrote:
 Pages like this one have column comments for the system tables:
 
 http://www.psql.it/manuale/8.3/catalog-pg-attribute.html

 Oh, I see.  I don't think we want to go there.  We'd need some kind of
 system for keeping the two places in sync.

I seem to recall some muttering about teaching genbki to extract such
comments from the SGML sources or perhaps the C header files.  I tend to
agree though that it would be a lot more work than it's worth.  And as
you say, pg_description entries aren't free.

Which brings up another point though.  I have a personal TODO item to
make the comments for operator support functions more consistent:
http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
Should we consider removing those comments altogether, instead?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Alvaro Herrera
Excerpts from Robert Haas's message of mié ene 19 15:25:00 -0300 2011:

 Oh, I see.  I don't think we want to go there.  We'd need some kind of
 system for keeping the two places in sync.

Maybe autogenerate both the .sgml and the postgres.description files
from a single source.

 And there'd be no easy way
 to upgrade the in-database descriptions when we upgraded to a newer
 minor release, supposing they'd changed in the meantime.

I wouldn't worry about this issue.  We don't do many catalog changes in
minor releases anyway.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 18, 2011 at 6:49 PM, Simone Aiken
 sai...@quietlycompetent.com wrote:
 Pages like this one have column comments for the system tables:

 http://www.psql.it/manuale/8.3/catalog-pg-attribute.html

 Oh, I see.  I don't think we want to go there.  We'd need some kind of
 system for keeping the two places in sync.

 I seem to recall some muttering about teaching genbki to extract such
 comments from the SGML sources or perhaps the C header files.  I tend to
 agree though that it would be a lot more work than it's worth.  And as
 you say, pg_description entries aren't free.

 Which brings up another point though.  I have a personal TODO item to
 make the comments for operator support functions more consistent:
 http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
 Should we consider removing those comments altogether, instead?

I could go either way on that.  Most of those comments are pretty
short, aren't they?  How much storage are they really costing us?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Which brings up another point though. I have a personal TODO item to
 make the comments for operator support functions more consistent:
 http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
 Should we consider removing those comments altogether, instead?

 I could go either way on that.  Most of those comments are pretty
 short, aren't they?  How much storage are they really costing us?

Well, on my machine pg_description is about 210K (per database) as of
HEAD.  90% of its contents are pg_proc entries, though I have no good
fix on how much of that is for internal-use-only functions.  A very
rough estimate from counting pg_proc and pg_operator entries suggests
that the answer might be about a third.  So if we do what was said in
the above-cited thread, ie move existing comments to pg_operator and
add boilerplate ones to pg_proc, we probably would pay 100K for it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Robert Haas
On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Which brings up another point though. I have a personal TODO item to
 make the comments for operator support functions more consistent:
 http://archives.postgresql.org/message-id/21407.1287157...@sss.pgh.pa.us
 Should we consider removing those comments altogether, instead?

 I could go either way on that.  Most of those comments are pretty
 short, aren't they?  How much storage are they really costing us?

 Well, on my machine pg_description is about 210K (per database) as of
 HEAD.  90% of its contents are pg_proc entries, though I have no good
 fix on how much of that is for internal-use-only functions.  A very
 rough estimate from counting pg_proc and pg_operator entries suggests
 that the answer might be about a third.  So if we do what was said in
 the above-cited thread, ie move existing comments to pg_operator and
 add boilerplate ones to pg_proc, we probably would pay 100K for it.

I guess that's not enormously expensive, but it's not insignificant
either.  On my machine, a template database is 5.5MB.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jan 19, 2011 at 3:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, on my machine pg_description is about 210K (per database) as of
 HEAD.  90% of its contents are pg_proc entries, though I have no good
 fix on how much of that is for internal-use-only functions.  A very
 rough estimate from counting pg_proc and pg_operator entries suggests
 that the answer might be about a third.  So if we do what was said in
 the above-cited thread, ie move existing comments to pg_operator and
 add boilerplate ones to pg_proc, we probably would pay 100K for it.

 I guess that's not enormously expensive, but it's not insignificant
 either.  On my machine, a template database is 5.5MB.

The implementation I was thinking about was to have initdb run a SQL
command that would do something like

INSERT INTO pg_description
  SELECT oprcode, 'pg_proc'::regclass, 0, 'implementation of ' || oprname
  FROM pg_operator
  WHERE theres-not-already-a-description-of-the-oprcode-function

So it would be minimal work to either provide or omit the boilerplate
descriptions.  I think we can postpone the decision till we have a
closer fix on the number of entries we're talking about.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-19 Thread Simone Aiken


I seem to recall some muttering about teaching genbki to extract such
comments from the SGML sources or perhaps the C header files.  I tend to
agree though that it would be a lot more work than it's worth.  And as you
say, pg_description entries aren't free.


I know I can't do all of the work, any submission requires review etc, but
it is worth it to me provided it does no harm to the codebase.

So the only outstanding question is the impact of increased size.

In my experience size increases related to documentation are almost always
worth it.  So I'm prejudiced right out of the gate.  I was wondering if
every pg_ table gets copied out to every database ..  if there is already a
mechanism for not replicating all of them we could utilize views or
re-writes rules to merge a single copy of catalog comments in a separate
table with each deployed database's pg_descriptions.  

If all catalog descriptions were handled this way it would actually decrease
the size of a deployed database ( by 210K? ) by absorbing the
pg_descriptions that are currently being duplicated.   Since users shouldn't
be messing with them anyway and they are purely for humans to refer to - not
computers to calculate explain plans with -  there shouldn't be anything
inherently wrong with moving static descriptions out of user space.  In
theory at least.  


-Simone Aiken




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Alvaro Herrera
Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011:
 
 Hello Postgres Hackers,
 
 In reference to this todo item about clustering system table indexes, 
   
 ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) 
 I have been studying the system tables to see which would benefit  from 
 clustering.  I have some index suggestions and a question if you have a 
 moment.

Wow, this is really old stuff.  I don't know if this is really of any
benefit, given that these catalogs are loaded into syscaches anyway.
Furthermore, if you cluster at initdb time, they will soon lose the
ordering, given that updates move tuples around and inserts put them
anywhere.  So you'd need the catalogs to be re-clustered once in a
while, and I don't see how you'd do that (except by asking the user to
do it, which doesn't sound so great).

I think you need some more discussion on the operational details before
engaging in the bootstrap bison stuff (unless you just want to play with
Bison for educational purposes, of course, which is always a good thing
to do).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Alvaro Herrera
Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011:
 
 Hello Postgres Hackers,

BTW whatever you do, don't start a new thread by replying to an existing
message and just changing the subject line.  It will mess up the
threading for some readers, and some might not even see your message.
Compose a fresh message instead.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011:

 Hello Postgres Hackers,

 In reference to this todo item about clustering system table indexes,
 ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php )
 I have been studying the system tables to see which would benefit  from
 clustering.  I have some index suggestions and a question if you have a
 moment.

 Wow, this is really old stuff.  I don't know if this is really of any
 benefit, given that these catalogs are loaded into syscaches anyway.
 Furthermore, if you cluster at initdb time, they will soon lose the
 ordering, given that updates move tuples around and inserts put them
 anywhere.  So you'd need the catalogs to be re-clustered once in a
 while, and I don't see how you'd do that (except by asking the user to
 do it, which doesn't sound so great).

The idea of the TODO seems to have been to set the default clustering
to something reasonable.  That doesn't necessarily seem like a bad
idea even if we can't automatically maintain the cluster order, but
there's some question in my mind whether we'd get any measurable
benefit from the clustering.  Even on a database with a gigantic
number of tables, it seems likely that the relevant system catalogs
will stay fully cached and, as you point out, the system caches will
further blunt the impact of any work in this area.  I think the first
thing to do would be to try to come up with a reproducible test case
where clustering the tables improves performance.  If we can't, that
might mean it's time to remove this TODO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Simone Aiken

On Jan 18, 2011, at 6:35 AM, Alvaro Herrera wrote:
 
 
 Wow, this is really old stuff.  I don't know if this is really of any
 benefit, given that these catalogs are loaded into syscaches anyway.


The benefit is educational primarily.  I was looking for a todo list item
that would expose me to the system tables.  Learning the data model
of a new system is always step 1 for me.  So that one was perfect as
it would have me study and consider each one to determine if there
was any benefit from clustering on its initial load into cache.  


 Furthermore, if you cluster at initdb time, they will soon lose the
 ordering, given that updates move tuples around and inserts put them
 anywhere.  So you'd need the catalogs to be re-clustered once in a
 while, and I don't see how you'd do that (except by asking the user to
 do it, which doesn't sound so great).


I did discover that last night.  I'm used to databases that keep up their
clustering.  One that falls apart over time is distinctly strange.  And the
way you guys do your re-clustering logic is overkill if just a few rows
are out of place.  On the upside, a call to mass re-clustering goes
and updates all the clustered indexes in the system and that includes
these tables.  Will have to study auto-vacuum as well to consider that.


  (unless you just want to play with
 Bison for educational purposes, of course, which is always a good thing
 to do).

Pretty much, yeah.  


- Simone Aiken







Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Simone Aiken


On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera alvhe...@commandprompt.com
wrote:
 Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011:

 Hello Postgres Hackers,

 In reference to this todo item about clustering system table indexes, 
 ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) 

 Wow, this is really old stuff.  I don't know if this is really of any 

If we can't, that might mean it's time to remove this TODO.

When I'm learning a new system I like to first learn how to use it,
second learn its data model, third start seriously looking at the code.
So that Todo is ideal for my learning method.  

If there is something else that would also involve studying all the system
tables it would also be great.  For example, I noticed we have column 
level comments on the web but not in the database itself.  This seems
silly.  Why not have the comments in the database and have the web
query the tables of template databases for the given versions?

That way \d+ pg_tablename would provide instant gratification for users.
And we all like our gratification to be instant.  They could be worked into
The .h files though as inserts to pg_description they wouldn't provide an
excuse to learn bison.

I'm open to other suggestions as well.

-Simone Aiken



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Bruce Momjian
Robert Haas wrote:
 On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011:
 
  Hello Postgres Hackers,
 
  In reference to this todo item about clustering system table indexes,
  ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php )
  I have been studying the system tables to see which would benefit ?from
  clustering. ?I have some index suggestions and a question if you have a
  moment.
 
  Wow, this is really old stuff. ?I don't know if this is really of any
  benefit, given that these catalogs are loaded into syscaches anyway.
  Furthermore, if you cluster at initdb time, they will soon lose the
  ordering, given that updates move tuples around and inserts put them
  anywhere. ?So you'd need the catalogs to be re-clustered once in a
  while, and I don't see how you'd do that (except by asking the user to
  do it, which doesn't sound so great).
 
 The idea of the TODO seems to have been to set the default clustering
 to something reasonable.  That doesn't necessarily seem like a bad
 idea even if we can't automatically maintain the cluster order, but
 there's some question in my mind whether we'd get any measurable
 benefit from the clustering.  Even on a database with a gigantic
 number of tables, it seems likely that the relevant system catalogs
 will stay fully cached and, as you point out, the system caches will
 further blunt the impact of any work in this area.  I think the first
 thing to do would be to try to come up with a reproducible test case
 where clustering the tables improves performance.  If we can't, that
 might mean it's time to remove this TODO.

I think CLUSTER is a win when you are looking up multiple rows in the
same table, either using a non-unique index or a range search.  What
places do such lookups?  Having them all in adjacent pages would be a
win --- single-row lookups are usually not.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 12:16 PM, Simone Aiken sai...@ulfheim.net wrote:
 When I'm learning a new system I like to first learn how to use it,
 second learn its data model, third start seriously looking at the code.
 So that Todo is ideal for my learning method.

Sure - my point is just that we usually have as a criteria for any
performance related patch that it actually does improve performance.
So, we'd need a test case.

 If there is something else that would also involve studying all the system
 tables it would also be great.  For example, I noticed we have column
 level comments on the web but not in the database itself.  This seems
 silly.  Why not have the comments in the database and have the web
 query the tables of template databases for the given versions?

Uh... I don't know what this means.

 I'm open to other suggestions as well.

Here are a few TODO items that look relatively easy to me (they may
not actually be easy when you dig in, of course):

Clear table counters on TRUNCATE
Allow the clearing of cluster-level statistics
Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME
Allow ALTER TABLE to change constraint deferrability and actions

Unfortunately we don't have a lot of easy TODOs.  People keep doing
the ones we think up...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-18 Thread Simone Aiken

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Tuesday, January 18, 2011 2:53 PM
To: Simone Aiken
Cc: Alvaro Herrera; pgsql-hackers
Subject: Re: [HACKERS] ToDo List Item - System Table Index Clustering


Sure - my point is just that we usually have as a criteria for any
performance related patch that it actually does improve performance.


Sorry wasn't arguing your point.   Conceding it actually. =)  
I wasn't explaining why I chose it anyway to contest your statements,
but as an invitation for you to point me towards something more useful 
that fit what I was looking for in a task. 



 Uh... I don't know what this means.


Pages like this one have column comments for the system tables:

http://www.psql.it/manuale/8.3/catalog-pg-attribute.html

But in my database when I look for comments they aren't there:

qcc= \d+ pg_attribute
  Table pg_catalog.pg_attribute
Column |   Type   | Modifiers | Description
---+--+---+-
 attrelid  | oid  | not null  |
 attname   | name | not null  |
 atttypid  | oid  | not null  |
 attstattarget | integer  | not null  |
 attlen| smallint | not null  |
 attnum| smallint | not null  |
 attndims  | integer  | not null  |
 attcacheoff   | integer  | not null  |
 atttypmod | integer  | not null  |
 attbyval  | boolean  | not null  |
 attstorage| char   | not null  |
 attalign  | char   | not null  |
 attnotnull| boolean  | not null  |
 atthasdef | boolean  | not null  |
 attisdropped  | boolean  | not null  |
 attislocal| boolean  | not null  |
 attinhcount   | integer  | not null  |


So I have to fire up a web browser and start googling to learn 
about the columns.  Putting them in pg_description would be 
more handy, no?


-Simone Aiken



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-17 Thread Simone Aiken

Followup on System Table Index clustering ToDo -

It looks like to implement this I need to do the following:

1 - Add statements to indexing.h to cluster the selected indexes.
A do-nothing define at the top to suppress warnings and then
lines below for perl to parse out.

#define DECLARE_CLUSTER_INDEX(table,index) ...
( add the defines under the index declarations ).

2 - Alter genbki.pl to produce the appropriate statements in 
postgres.bki when it reads the new lines in indexing.h.
Will hold them in memory until the end of the file so they
will come in after 'Build Indices' is called.

CLUSTER tablename USING indexname

3 - Initdb will pipe the commands in postgres.bki to the
postgres executable running in --boot mode. Code
will need to be added to bootparse.y to recognize
this new command and resolve it into a call to

cluster_rel( tabOID, indOID, 0, 0, -1, -1 );


Speak now before I learn Bison ... actually I should probably
learn Bison anyway.  After ProC other pre-compilation languages
can't be that bad.

Sound all right?

Thanks,

-Simone Aiken


On Jan 15, 2011, at 10:11 PM, Simone Aiken wrote:

 
 Hello Postgres Hackers,
 
 In reference to this todo item about clustering system table indexes, 
   
 ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) 
 I have been studying the system tables to see which would benefit  from 
 clustering.  I have some index suggestions and a question if you have a 
 moment.
 
 Cluster Candidates:
 
   pg_attribute:  Make the existing index ( attrelid, attnum ) clustered 
 to 
   order it by table and column.
   
   pg_attrdef:  Existing index ( adrelid, adnum ) clustered to order it
   by table and column.
 
   pg_constraint:  Existing index ( conrelid ) clustered to get table 
   constraints contiguous.
 
   pg_depend: Existing Index (refclassid, refobjid, refobjsubid) clustered
   to so that when the referenced object is changed its dependencies 
   arevcontiguous.
 
   pg_description: Make the existing index ( Objoid, classoid, objsubid ) 
   clustered to order it by entity, catalog, and optional column.  
   * reversing the first two columns makes more sense to me ... 
   catalog, object, column or since object implies catalog ( 
 right? ) 
   just dispensing with catalog altogether, but that would mean 
   creating a new index.
   
   pg_shdependent: Existing index (refclassid, refobjid) clustered for 
   same reason as pg_depend.
 
   pg_statistic: Existing index (starelid, staattnum) clustered to order 
   it by table and column.
 
   pg_trigger:  Make the existing index ( tgrelid, tgname ) clustered to 
   order it by table then name getting all the triggers on a table 
 together.
 
 Maybe Cluster:
 
   pg_rewrite: Not sure about this one ... The existing index ( ev_class,
   rulename ) seems logical to cluster to get all the rewrite rules for a
   given table contiguous but in the db's available to me virtually every
   table only has one rewrite rule.  
 
   pg_auth_members:  We could order it by role or by member of
   that role.  Not sure which would be more valuable.
 
 
 Stupid newbie question:
 
 
   is there a way to make queries on the system tables show me what 
   is actually there when I'm poking around?  So for example:
 
   Select * from pg_type limit 1;
 
   tells me that the typoutput is 'boolout'.  An english string rather 
 than 
   a number.  So even though the documentation says that column
   maps to pg_proc.oid I can't then write:
 
   Select * from pg_proc where oid = 'boolout';
 
   It would be very helpful if I wasn't learning the system but since I
   am I'd like to turn it off for now.  Fewer layers of abstraction.
 
 
 Thanks,
 
 Simone Aiken
 
 303-956-7188
 Quietly Competent Consulting
 
 
 
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



[HACKERS] ToDo List Item - System Table Index Clustering

2011-01-16 Thread Simone Aiken

Hello Postgres Hackers,

In reference to this todo item about clustering system table indexes,   
( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php ) 
I have been studying the system tables to see which would benefit  from 
clustering.  I have some index suggestions and a question if you have a 
moment.

Cluster Candidates:

pg_attribute:  Make the existing index ( attrelid, attnum ) clustered 
to 
order it by table and column.

pg_attrdef:  Existing index ( adrelid, adnum ) clustered to order it
by table and column.

pg_constraint:  Existing index ( conrelid ) clustered to get table 
constraints contiguous.

pg_depend: Existing Index (refclassid, refobjid, refobjsubid) clustered
to so that when the referenced object is changed its dependencies 
arevcontiguous.

pg_description: Make the existing index ( Objoid, classoid, objsubid ) 
clustered to order it by entity, catalog, and optional column.  
* reversing the first two columns makes more sense to me ... 
catalog, object, column or since object implies catalog ( 
right? ) 
just dispensing with catalog altogether, but that would mean 
creating a new index.

pg_shdependent: Existing index (refclassid, refobjid) clustered for 
same reason as pg_depend.

pg_statistic: Existing index (starelid, staattnum) clustered to order 
it by table and column.

pg_trigger:  Make the existing index ( tgrelid, tgname ) clustered to 
order it by table then name getting all the triggers on a table 
together.

Maybe Cluster:

pg_rewrite: Not sure about this one ... The existing index ( ev_class,
rulename ) seems logical to cluster to get all the rewrite rules for a
given table contiguous but in the db's available to me virtually every
table only has one rewrite rule.  

pg_auth_members:  We could order it by role or by member of
that role.  Not sure which would be more valuable.


Stupid newbie question:


is there a way to make queries on the system tables show me what 
is actually there when I'm poking around?  So for example:

Select * from pg_type limit 1;

tells me that the typoutput is 'boolout'.  An english string rather 
than 
a number.  So even though the documentation says that column
maps to pg_proc.oid I can't then write:

Select * from pg_proc where oid = 'boolout';

It would be very helpful if I wasn't learning the system but since I
am I'd like to turn it off for now.  Fewer layers of abstraction.


Thanks,

Simone Aiken

303-956-7188
Quietly Competent Consulting





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-16 Thread Nicolas Barbier
2011/1/16 Simone Aiken sai...@ulfheim.net:

        is there a way to make queries on the system tables show me what
        is actually there when I'm poking around?  So for example:

                Select * from pg_type limit 1;

        tells me that the typoutput is 'boolout'.  An english string rather 
 than
        a number.  So even though the documentation says that column
        maps to pg_proc.oid I can't then write:

                Select * from pg_proc where oid = 'boolout';

Type type of typoutput is regproc, which is really an oid with a
different output function. To get the numeric value, do:

Select typoutput::oid from pg_type limit 1;

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-16 Thread Tom Lane
Nicolas Barbier nicolas.barb...@gmail.com writes:
 2011/1/16 Simone Aiken sai...@ulfheim.net:
... So even though the documentation says that column
maps to pg_proc.oid I can't then write:
Select * from pg_proc where oid = 'boolout';

 Type type of typoutput is regproc, which is really an oid with a
 different output function. To get the numeric value, do:
 Select typoutput::oid from pg_type limit 1;

Also, you *can* go back the other way.  It's very common to write

   Select * from pg_proc where oid = 'boolout'::regproc

rather than looking up the OID first.  There are similar pseudotypes for
relation and operator names; see Object Identifier Types in the
manual.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo List Item - System Table Index Clustering

2011-01-16 Thread Simone Aiken


 Select typoutput::oid from pg_type limit 1;


 Also, you *can* go back the other way.  It's very common to write
 
   Select * from pg_proc where oid = 'boolout'::regproc
 
 rather than looking up the OID first.  


  see Object Identifier Types in the manual.


Many thanks to you both, that helps tremendously.   

- Simone Aiken



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers