I have a maintenance window coming up and using pg_upgrade to upgrade from 
9.2.X to 9.3.X.
As part of the window, I’d like to ‘cluster’ each table by its primary key.  
After doing so, I see amazing performance improvements (probably mostly because 
of index bloat - but possibly due to table fragmentation)

That being said, I have a single table that is blowing my window - 
at 140 million rows (28 gig in size with 75 gig worth of indexes), this bad boy 
is my white whale. There are 10 indexes (not including the primary key).  Yes - 
10 is a lot - but I’ve been monitoring their use (most are single column or 
partial indexes) and all are used.

That being said, I’ve been reading and experimenting in trying to get a cluster 
of this table (which re-indexes all 10/11 indexes) to complete in a reasonable 
amount of time.

There are lots of settings and ranges to chose from and while my experiments 
continue, I was looking to get some input.  Lowest I have gotten for clustering 
this table is just under 6 hours.  

I am familiar with pg_reorg and it’s sibling pg_repack - but they call the base 
postgresql reindex functions underneath - and I have learned by using ‘verbose’ 
that the actual clustering of the table is quick - it’s the reindexing that is 
slow (It’s doing each reindex sequentially instead of concurently)

PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2, 64-bit
500 gig of ram
2.7gig processors (48 cores)
Shared buffers set to 120gig
Maintenance work men set to 1gig
work men set to 500 meg

Things I have read/seen/been told to tweak…

fsync (set to off)
setting wal_level to minimal (to avoid wal logging of cluster activity)
bumping up maintenance work men (but I’ve also seen/read that uber high values 
cause disk based sorts which ultimately slow things down)
Tweaking checkpoint settings (although with wal_level set to minimal - I don’t 
think it comes into play)

any good suggestions for lighting a fire under this process?

If worse comes to worse, I can vacuum full the table and reindex each index 
concurrently -   but it won’t give me the benefit of having the tuples ordered 
by their oft-grouped primary key.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to