https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7097
Bug ID: 7097
Summary: Set fillfactor for PostgreSQL sample bayes_token and
awl tables
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Learner
Assignee: [email protected]
Reporter: [email protected]
In PostgreSQL, updating a row is essentially:
- mark the old version of the row as deleted;
- add a new version of the row.
When there is no empty or deleted space left in the current 8 kB page, the new
version has to be written to another page. Both pages are then saved to the
write-ahead log (think journal) and later to the data file.
Setting the fillfactor table attribute, available since PostgreSQL 8.2, to less
than 100 hints to Postgres that the table is updated often and makes it fill
each page only up to the set percentage when inserting data. Setting it to, say,
95 leaves space for a few row versions per page: about 7-8 out of 157 in
bayes_token and 3-4 out of 55 in awl, according to my tests.
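To see what the tables are currently set to, the storage options can be read
from the pg_class catalog. This is only a minimal sketch, assuming the default
table names from the sample schema files and PostgreSQL 8.2 or later (where the
reloptions column exists):

```sql
-- Show per-table storage options.
-- reloptions is NULL when nothing has been set explicitly; the table then
-- uses the default fillfactor of 100.
SELECT relname, reloptions
FROM pg_class
WHERE relname IN ('bayes_token', 'awl')
  AND relkind = 'r';
```

After the proposed change, the reloptions column should list fillfactor=95 for
both tables.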
This would make it more efficient, as a large percentage of writes to these
tables are updates (more than 99.5% on my server). It would need fewer seeks
and fewer index writes.
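The share of updates, and how many of them avoid index writes, can be estimated
from the statistics collector. A rough sketch, assuming PostgreSQL 8.3 or later
(where the n_tup_hot_upd counter exists) and the default table names; the
update_pct and hot_update_pct column aliases are only illustrative. An update
counts as "HOT" when the new row version fits on the same page and no indexed
column changes, which is the case free space from fillfactor is meant to help:

```sql
-- Per-table write statistics since the counters were last reset.
-- update_pct:     share of updates among all row writes
-- hot_update_pct: share of updates that stayed on the same page (HOT)
SELECT relname,
       n_tup_ins,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_upd
             / NULLIF(n_tup_ins + n_tup_upd + n_tup_del, 0), 1) AS update_pct,
       round(100.0 * n_tup_hot_upd
             / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname IN ('bayes_token', 'awl');
```

Comparing hot_update_pct before and after setting fillfactor, on otherwise
similar workloads, would give a first indication of the effect.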
The largest impact would be visible just after importing data with `sa-learn
--spam`, `sa-learn --ham` or `sa-learn --restore`. It would be less visible
during normal operation, as the table would partly tune itself by automatically
reusing space marked as deleted, though not as well as with fillfactor. Without
it, benchmarks of bayes stores, for example, are also not entirely fair to
Postgres.
Please add to bayes_pg.sql:
alter table bayes_token set (fillfactor=95);
And to awl_pg.sql:
alter table awl set (fillfactor=95);
These would generate an error on ancient Postgres versions older than 8.2,
which are unsupported by upstream. But it is a harmless error: the statement
would simply be ignored. As far as I remember, only RHEL/CentOS 5 still
supports PostgreSQL 8.1, and even there upgrading to 8.4 is a supported and
encouraged option.
Is there a standard benchmark for bayes stores to measure impact?