Re: [GENERAL] Looking for settings/configuration for FASTEST reindex on idle system.

2014-01-16 Thread Jeff Janes
On Thu, Jan 9, 2014 at 2:03 PM, Jeff Amiel becauseimj...@yahoo.com wrote:

 I have a maintenance window coming up and using pg_upgrade to upgrade from
 9.2.X to 9.3.X.
 As part of the window, I’d like to ‘cluster’ each table by its primary
 key.  After doing so, I see amazing performance improvements (probably
 mostly because of index bloat - but possibly due to table fragmentation)


Since you seem to have a test environment where you can so such things, you
can try first reindexing, and then clustering, to so which step is the
important one to get the performance improvement.  If it is the reindexing
that is really the key, you could get that benefit outside of the
maintenance window by building a new index with the same columns
concurrently, then dropping the old one, doing it one index at a time.  If
the cluster is the key part, that is hard to move outside a maintenance
window.



 That being said, I have a single table that is blowing my window -
 at 140 million rows (28 gig in size with 75 gig worth of indexes), this
 bad boy is my white whale. There are 10 indexes (not including the primary
 key).  Yes - 10 is a lot - but I’ve been monitoring their use (most are
 single column or partial indexes) and all are used.

 That being said, I’ve been reading and experimenting in trying to get a
 cluster of this table (which re-indexes all 10/11 indexes) to complete in a
 reasonable amount of time.

 There are lots of settings and ranges to chose from and while my
 experiments continue, I was looking to get some input.  Lowest I have
 gotten for clustering this table is just under 6 hours.

 I am familiar with pg_reorg and it’s sibling pg_repack - but they call the
 base postgresql reindex functions underneath - and I have learned by using
 ‘verbose’ that the actual clustering of the table is quick - it’s the
 reindexing that is slow (It’s doing each reindex sequentially instead of
 concurently)

 PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2,
 64-bit
 500 gig of ram
 2.7gig processors (48 cores)
 Shared buffers set to 120gig
 Maintenance work men set to 1gig
 work men set to 500 meg

 Things I have read/seen/been told to tweak…

 fsync (set to off)


I see little point in that.  You are putting your data at serious risk, and
bulk index creation shouldn't be doing much fsyncing anyway.


 setting wal_level to minimal (to avoid wal logging of cluster activity)


That can be a big win in some cases, if it is compatible with your backup
policy.  But I don't know if cluster is actually one of those cases.


 bumping up maintenance work men (but I’ve also seen/read that uber high
 values cause disk based sorts which ultimately slow things down)


I don't understand that advice.  If it is big enough, you can *avoid* disk
bases sorts by doing them in RAM, which should be faster, provided you
actually have enough RAM so that you don't swap.  However, there are a
variety of things which limit how much memory a sort could actually use
(most of which will be removed in 9.4) so you might not get in-RAM sorts no
matter how much you crank up maintenance_work_mem.  You can turn on
trace_sort to see whether your sorts are in RAM or on disk, and how much
memory they are using.



 Tweaking checkpoint settings (although with wal_level set to minimal - I
 don’t think it comes into play)


Measure it--how often does it cycle through checkpoints during your test
CLUSTER?



 any good suggestions for lighting a fire under this process?

 If worse comes to worse, I can vacuum full the table and reindex each
 index concurrently -   but it won’t give me the benefit of having the
 tuples ordered by their oft-grouped primary key.


If you vacuum full, it will rebuild the indexes for you automatically, so
after that there is no point in doing a manual reindex. You could drop them
manually before hand, and then build them manually afterward, but you can
do that whether the central activity is a CLUSTER or a VACUUM FULL.  If you
want to do this in parallel, then you need to keep in mind that
maintenance_work_mem needs to be small enough to fit all of the concurrent
builds, and your IO system needs to accomodate all of that traffic.

Cheers,

Jeff


Re: [GENERAL] Looking for settings/configuration for FASTEST reindex on idle system.

2014-01-09 Thread Jeff Amiel




On Thursday, January 9, 2014 4:03 PM, Jeff Amiel becauseimj...@yahoo.com 
wrote:
I am familiar with pg_reorg and it’s sibling pg_repack - but they call the base 
postgresql reindex functions underneath - and I have learned by using ‘verbose’ 
that the actual clustering of the table is quick - it’s the reindexing that is 
slow (It’s doing each reindex sequentially instead of concurently)


I'm second guessing this - I'm not sure I REALLY know how long the table 
cluster takes versus the index - 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Looking for settings/configuration for FASTEST reindex on idle system.

2014-01-09 Thread Sergey Konoplev
On Thu, Jan 9, 2014 at 2:03 PM, Jeff Amiel becauseimj...@yahoo.com wrote:
 I have a maintenance window coming up and using pg_upgrade to upgrade from 
 9.2.X to 9.3.X.
 As part of the window, I’d like to ‘cluster’ each table by its primary key.  
 After doing so, I see amazing performance improvements (probably mostly 
 because of index bloat - but possibly due to table fragmentation)

[...]

 fsync (set to off)
 setting wal_level to minimal (to avoid wal logging of cluster activity)
 bumping up maintenance work men (but I’ve also seen/read that uber high 
 values cause disk based sorts which ultimately slow things down)
 Tweaking checkpoint settings (although with wal_level set to minimal - I 
 don’t think it comes into play)

 any good suggestions for lighting a fire under this process?

Another idea is to drop all indexes except ones you are going to
cluster tables by, cluster the tables and then restore the dropped
indexes. Use xargs with -P to perform things in parallel, for example

cat indexdefs.sql | xargs -d '\n' -I {} -P 5 psql -e dbname -c '{}'

will start creating indexes from indexdefs.sql (assuming one index
definition in one line) by 5 in parallel. You can also cluster all
your tables a similar way by several tables in parallel.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general