On Thu, Jan 9, 2014 at 2:03 PM, Jeff Amiel <becauseimj...@yahoo.com> wrote:

> I have a maintenance window coming up and using pg_upgrade to upgrade from
> 9.2.X to 9.3.X.
> As part of the window, I’d like to ‘cluster’ each table by its primary
> key.  After doing so, I see amazing performance improvements (probably
> mostly because of index bloat - but possibly due to table fragmentation)
>

Since you seem to have a test environment where you can do such things, you
can try first reindexing, and then clustering, to see which step is the
important one to get the performance improvement.  If it is the reindexing
that is really the key, you could get that benefit outside of the
maintenance window by building a new index with the same columns
concurrently, then dropping the old one, doing it one index at a time.  If
the cluster is the key part, that is hard to move outside a maintenance
window.
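The concurrent swap can be sketched roughly like this (table and index names
here are placeholders, not from the original post):

```sql
-- Build a replacement index without blocking writes.
CREATE INDEX CONCURRENTLY bigtable_col_idx_new ON bigtable (col);

-- Once the new index is valid, drop the bloated original and take over
-- its name.  DROP INDEX CONCURRENTLY is available in 9.2+.
DROP INDEX CONCURRENTLY bigtable_col_idx;
ALTER INDEX bigtable_col_idx_new RENAME TO bigtable_col_idx;
```

Repeat one index at a time, and keep an eye on pg_stat_activity for
long-running transactions that could stall the concurrent build.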



> That being said, I have a single table that is blowing my window -
> at 140 million rows (28 gig in size with 75 gig worth of indexes), this
> bad boy is my white whale. There are 10 indexes (not including the primary
> key).  Yes - 10 is a lot - but I’ve been monitoring their use (most are
> single column or partial indexes) and all are used.
>
> That being said, I’ve been reading and experimenting in trying to get a
> cluster of this table (which re-indexes all 10/11 indexes) to complete in a
> reasonable amount of time.
>
> There are lots of settings and ranges to chose from and while my
> experiments continue, I was looking to get some input.  Lowest I have
> gotten for clustering this table is just under 6 hours.
>
> I am familiar with pg_reorg and its sibling pg_repack - but they call the
> base postgresql reindex functions underneath - and I have learned by using
> ‘verbose’ that the actual clustering of the table is quick - it’s the
> reindexing that is slow (It’s doing each reindex sequentially instead of
> concurrently)
>
> PostgreSQL 9.3.2 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.5.2,
> 64-bit
> 500 gig of ram
> 2.7GHz processors (48 cores)
> Shared buffers set to 120gig
> maintenance_work_mem set to 1gig
> work_mem set to 500 meg
>
> Things I have read/seen/been told to tweak…
>
> fsync (set to off)
>

I see little point in that.  You are putting your data at serious risk, and
bulk index creation shouldn't be doing much fsyncing anyway.


> setting wal_level to minimal (to avoid wal logging of cluster activity)
>

That can be a big win in some cases, if it is compatible with your backup
policy.  But I don't know if cluster is actually one of those cases.
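If it does turn out to help, the relevant postgresql.conf settings look
roughly like this (restart required; on 9.3, minimal also requires archiving
and streaming replication to be off):

```
# postgresql.conf -- illustrative; incompatible with WAL archiving/replication
wal_level = minimal
archive_mode = off
max_wal_senders = 0
```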


> bumping up maintenance_work_mem (but I’ve also seen/read that uber high
> values cause disk based sorts which ultimately slow things down)
>

I don't understand that advice.  If it is big enough, you can *avoid*
disk-based sorts by doing them in RAM, which should be faster, provided you
actually have enough RAM so that you don't swap.  However, there are a
variety of things which limit how much memory a sort can actually use
(most of which will be removed in 9.4) so you might not get in-RAM sorts no
matter how much you crank up maintenance_work_mem.  You can turn on
trace_sort to see whether your sorts are in RAM or on disk, and how much
memory they are using.
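Something like this in a test session will show which kind of sort you are
getting (the index name is illustrative):

```sql
SET client_min_messages = log;    -- surface the trace in this session
SET trace_sort = on;              -- compiled in by default on 9.3
SET maintenance_work_mem = '1GB';
REINDEX INDEX bigtable_some_idx;
-- Watch the LOG output for "internal sort ended" (in RAM) versus
-- "external sort ended" (on disk), plus the memory actually used.
```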



> Tweaking checkpoint settings (although with wal_level set to minimal - I
> don’t think it comes into play)
>

Measure it--how often does it cycle through checkpoints during your test
CLUSTER?
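For example, note the counters in pg_stat_bgwriter before and after the run:

```sql
-- Run before and again after the test CLUSTER; the difference is the
-- number of checkpoints it triggered.
SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;
```

Turning on log_checkpoints is another easy way to see how often they fire.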


>
> any good suggestions for lighting a fire under this process?
>
> If worse comes to worst, I can vacuum full the table and reindex each
> index concurrently -   but it won’t give me the benefit of having the
> tuples ordered by their oft-grouped primary key.
>

If you vacuum full, it will rebuild the indexes for you automatically, so
after that there is no point in doing a manual reindex. You could drop them
manually beforehand, and then build them manually afterward, but you can
do that whether the central activity is a CLUSTER or a VACUUM FULL.  If you
want to do this in parallel, then you need to keep in mind that
maintenance_work_mem needs to be small enough to fit all of the concurrent
builds, and your IO system needs to accommodate all of that traffic.
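A rough shape for that (names are placeholders): drop the secondary
indexes, run the CLUSTER or VACUUM FULL, then start one psql session per
index:

```sql
-- In each of several concurrent sessions:
SET maintenance_work_mem = '1GB';  -- per session; the sum must fit in RAM
CREATE INDEX bigtable_status_idx ON bigtable (status);
```

With 48 cores you can afford many concurrent sessions; the question is
whether the IO system, not the CPUs, becomes the bottleneck first.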

Cheers,

Jeff
