On 2 Aug 2010, at 19:23, James Cloos wrote:

>>>>>> "AH" == Alban Hertroys <dal...@solfertje.student.utwente.nl> writes:
> 
> AH> I don't think the reindex is necessary at all. That's a corner-case
> AH> operation that you shouldn't need in normal database operation. I
> AH> don't see why dspam would need it, at best there'd be some
> AH> high-volume scenarios where this turns out to be a gain, but for
> AH> those cases you need custom solutions anyway.
> 
> High-volume is dependent on the size of the box.
> 
> At least up through pg8 index entries are not re-used.  They accumulate,
> increasing the size of the index files to the point that they can become
> too large for the shared_buffers.  When that occurs, performace dies.

The truth is more complicated than that.

See:
http://www.postgresql.org/docs/8.4/interactive/routine-reindex.html

And to some extent:
http://www.postgresql.org/docs/8.4/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS

> For any temporal data set -- and dspam certainly qualifies -- regular
> use of reindex is the recomendation one gets from the pg lists.  They
> will say that vacuum full is needed in only the most extremely temporal
> cases, but that regular reindexing is vital.

I don't think so actually.

Dspam tends to eventually delete everything older than a certain age if I 
understand the algorithms correctly, so those pages of the index will be empty 
after that age and they will be deleted. Not much index bloat there; does that 
warrant putting an exclusive lock on the index while reindexing - which may 
take some time?

Of course, for versions before PG 7.4 this doesn't hold true, but people really 
shouldn't be using those versions anyway.

> How often, though, should vary based on the size of the tables, the size
> of the purge and the size of the box.  And it probably should do vacuum,
> reindex, analyze rather than the current vacuum, analyze, reindex.


VACUUM ANALYSE (as one command it's only one parse of the tables) for the 
tables that see much change should suffice. It marks rows that got deleted (and 
an update is an insert+delete) for reclaiming and updates the table statistics 
so that the planner can make better choices.

On very large dspam installations it could be beneficial to run REINDEX every 
now and then, but as part of the nightly purge script is probably too frequent.

Of course most of this is conjecture without hard numbers to back this up. In 
my experience people who're experiencing performance problems often tune the 
wrong part of the problem, without even looking at output from, for example, 
EXPLAIN ANALYSE. That's like buying a new couch and then "fixing" the door 
frame because it doesn't fit through.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:1126,4c570998286212493918795!



------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user

Reply via email to