I am looking for comments on this since a final document of some sorts 
needs to be placed in the amavisd-new/docs/readme files to try to 
document what schema changes are needed for mysql partitioning.  Mark 
says we have about 2.6.4 coming up, and maybe this should be documented.
After reading my first draft of this, it all seems so simple, but it 
took a LONG time for me to get this right.

With the advent of mysql 5.1 and amavisd-new support for partition_tags, 
lookups, updates and quarantine cleanup have been greatly improved.

mysql had attempted to assist quarantine cleanup by setting up foreign 
keys, but testing shows that mysql with foreign keys took longer to 
delete/cleanup records than the separate operations.

Some installations have even used mysql for bayes/awl and  postgres for 
the rest since postgres didn't have the performance problems with 
foreign keys.
(Mark:  this will REALLY speed things up, and you can save all that ram 
you used in postgres for disk cache)

This first attempt (draft) will try to document the steps needed to
A) create a new, from scratch mysql/amavisd installation that supports 
partitions
B) give options/ maybe schema scripts that might take an existing 
amavisd 2.6x installation and add partitioning.
  the 'change' one is harder, since you also have to keep something with 
partition_tag 0 in it for a while, and changing primary keys is rough. 
you need to drop the index and add it again.

A lot of suggestions and assistance on this were given by Mark Martinec, 
Thomas Gelf  and Nathan M.

First, a discussion on two different partition schemes:
(see
<http://www.mail-archive.com/amavis-user@lists.sourceforge.net/msg14271.html>

Thomas Gelf:  (linear) and Nathan M's (staggered).

I selected a variation of Nathan M's schema since we allow our users to 
select up to 28 days of quarantine.
(so I needed 5 partitions, not 4 like Nathan's more on that later)
5 partitions are less than 9 as in Thomas's, so you have less files on 
the disk, (9+1*4 vs 5+1*4). and access and cleanup are still 
instantaneous, and it doesn't matter if you change from 7 days to 14 
days to 16 days to 24 days to 28 days while running.  In fact, in our 
service, some clients quarantine viruses for 2 days, attachments for 7 
and spam for 14.

This is what we are currently using
(note the addition of the year, I have a modified iso8601_year() routine 
I put into amavisd.conf to support an iso year/week.  Also note that the 
iso year is NOT the calender year, last or first week of the calender 
year sometimes.) 

CREATE TABLE maddr (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  partition_tag integer NOT NULL DEFAULT 0,
  email      varbinary(255),
  domain     varchar(255) NOT NULL,    -- only domain part of the email 
address
                                       -- with subdomain fields in reverse
  PRIMARY KEY (id,partition_tag),
  UNIQUE KEY maddr_idx_tag_email (partition_tag, email)
) ENGINE=InnoDB
PARTITION BY LIST (partition_tag)
(PARTITION p20091 VALUES IN 
(200901,200906,200911,200916,200921,200926,200931,200936,200941,200946,200951),
 PARTITION p20092 VALUES IN 
(200902,200907,200912,200917,200922,200927,200932,200937,200942,200947,200952),
 PARTITION p20093 VALUES IN 
(200903,200908,200913,200918,200923,200928,200933,200938,200943,200948,200953),
 PARTITION p20094 VALUES IN 
(200904,200909,200914,200919,200924,200929,200934,200939,200944,200949),
 PARTITION p20095 VALUES IN 
(200905,200910,200915,200920,200925,200930,200935,200940,200945,200950));

The basic premise is that even with 28 days max quarantine, no single 
partition will ever have more than two weeks of data.
Weekly cleanup is a matter of a cron or sql event entry at 11:45pm on 
Sunday night.  I truncate the partition that contains NEXT WEEKS data.
something like (if NEXT week, in 15 mins.. is in 
200904,200909,200914,200919,200924,200929,200934,200939,200944,200949)

ALTER TABLE maddr DROP PARTITION p20094;
ALTER TABLE maddr ADD PARTITION (PARTITION p20094 VALUES IN (  
200904,200909,200914,200919,200924,200929,200934,200939,200944,200949)

(mysql 5.4 is rumored to have a truncate partition)

(at 11:45pm, that means that if someone was logged on from 11:45 till 
midnight and wanted to get email quarantined 27 days, 23 hours and 45 
mins ago, he is out of luck)
So, in reality, our max quarantine is 27 days 23 hours and 45 mins :-)

if you NEVER allow a client to keep more than 21 days, you only need 4 
partitions, 14 days, 3.

Daily cleanup is really optional, and you really don't need to do 
anything much more than wipe out old msgs records (YMMV), so at 12:01, 
we run a routine that cleans up
1) DHA logs (content = '?')
2) viruses older than client setting virus days
3) old banned attachments
4) old bad headers
5) old spam

(with 2 - 5 having different dates!)

Our old daily cleanup did this and sometimes took a couple hours to do 
them all (content S, V, etc, one at a time)

delete from msgs , quarantine, msgrcpt using msgs
                                natural join quarantine natural join 
msgrcpt
                                where time_num<=$spam_quarantine2 and 
content = 'S'
                                and partition_tag <= '$spam_partition_tag'";

new daily only takes seconds on same server, same load
delete from msgs    where time_num<=$spam_quarantine2 and content = 'S'
                                and partition_tag <= '$spam_partition_tag'";

what about orphaned maddr, msgrcpt, quarantine? don't care.  they are 
gone in less than 6 days, and our reports did joins on 
msgs,maddr,msgrcpt, quarantine, etc.
if the record wasn't send (ie, no sender, if no msgs record), our report 
didn't even try to match it to msgrcpts.

Now, on to create tables, and two problems:
#1,any primary index used has to be part of the partitioning, so 
partition_tag has to be part of it.
#2, any multi-key index also has to use partition_tag as part of it.

new schema needs to be something like this: (note the id,partition_tag 
for primary key), partition_tag, email for unique, second key.
Note that I have partition_tag,mail_id in that order for msgs, msgrcpt 
and quarantine.  also note that you MUST use at LEAST partition tag in 
most of these queries for the optimizer to find your records, so, doing 
a select * from msgs where mail_id='lkjlklkjlj' might take a long time 
if you don't know what date it was created (and thus partition_tag)

Also note, that even with 14 days of data in each partition (as mine 
above) they are separated by several weeks and time_iso, time_num 
indexes solve that last issue.
(and I clean out msgs records daily)


*for maddr:*

  PRIMARY KEY (id,partition_tag)
  UNIQUE KEY maddr_idx_tag_email (partition_tag, email)

*for msgs:*

PRIMARY KEY (partition_tag,mail_id)

*for msgrcpt:*(I don't remember why Its not a primary key, and/or why it 
allows duplicates)

KEY (partition_tag,mail_id)

*for quarantine:*
 PRIMARY KEY (partition_tag,mail_id,chunk_ind)



COMMENTS? Suggestions? corrections?

-- 
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
 > *| *SECNAP Network Security Corporation

    * Certified SNORT Integrator
    * 2008-9 Hot Company Award Winner, World Executive Alliance
    * Five-Star Partner Program 2009, VARBusiness
    * Best Anti-Spam Product 2008, Network Products Guide
    * King of Spam Filters, SC Magazine 2008

_________________________________________________________________________
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
_________________________________________________________________________
   

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/amavis-user 
 AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3 
 AMaViS-HowTos:http://www.amavis.org/howto/ 

Reply via email to