On 2 Aug 2010, at 19:23, James Cloos wrote: >>>>>> "AH" == Alban Hertroys <dal...@solfertje.student.utwente.nl> writes: > > AH> I don't think the reindex is necessary at all. That's a corner-case > AH> operation that you shouldn't need in normal database operation. I > AH> don't see why dspam would need it, at best there'd be some > AH> high-volume scenarios where this turns out to be a gain, but for > AH> those cases you need custom solutions anyway. > > High-volume is dependent on the size of the box. > > At least up through pg8 index entries are not re-used. They accumulate, > increasing the size of the index files to the point that they can become > too large for the shared_buffers. When that occurs, performace dies.
The truth is more complicated than that. See: http://www.postgresql.org/docs/8.4/interactive/routine-reindex.html And to some extent: http://www.postgresql.org/docs/8.4/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS > For any temporal data set -- and dspam certainly qualifies -- regular > use of reindex is the recomendation one gets from the pg lists. They > will say that vacuum full is needed in only the most extremely temporal > cases, but that regular reindexing is vital. I don't think so actually. Dspam tends to eventually delete everything older than a certain age if I understand the algorithms correctly, so those pages of the index will be empty after that age and they will be deleted. Not much index bloat there; does that warrant putting an exclusive lock on the index while reindexing - which may take some time? Of course, for versions before PG 7.4 this doesn't hold true, but people really shouldn't be using those versions anyway. > How often, though, should vary based on the size of the tables, the size > of the purge and the size of the box. And it probably should do vacuum, > reindex, analyze rather than the current vacuum, analyze, reindex. VACUUM ANALYSE (as one command it's only one parse of the tables) for the tables that see much change should suffice. It marks rows that got deleted (and an update is an insert+delete) for reclaiming and updates the table statistics so that the planner can make better choices. On very large dspam installations it could be beneficial to run REINDEX every now and then, but as part of the nightly purge script is probably too frequent. Of course most of this is conjecture without hard numbers to back this up. In my experience people who're experiencing performance problems often tune the wrong part of the problem, without even looking at output from, for example, EXPLAIN ANALYSE. That's like buying a new couch and then "fixing" the door frame because it doesn't fit through. Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:1126,4c570998286212493918795! ------------------------------------------------------------------------------ The Palm PDK Hot Apps Program offers developers who use the Plug-In Development Kit to bring their C/C++ apps to Palm for a share of $1 Million in cash or HP Products. Visit us here for more details: http://p.sf.net/sfu/dev2dev-palm _______________________________________________ Dspam-user mailing list Dspam-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspam-user