Thomas,

> Mark Martinec schrieb:
> > I can prepare a small change to amavisd which would use
> > (partition_tag,mail_id) as a unique key instead of just the (mail_id).
> > ...
> > Do you think it would help?
>
> That would be great! (and probably also speed up amavisd's lookups
> without the need to add an additional key on mail_id)

Ok, here it goes. The patch below changes SQL clauses to always join
tables using both the partition_tag and the mail_id fields. This has
no particular effect on existing 2.6.0 databases where a primary key
is mail_id (it is just a redundant condition), but saves a day
when a primary key is a composite (partition_tag,mail_id).

Actually amavisd does not know and need not be aware of what is
a primary key. The logic goes like this: when a mail_id for a new
message being processed is generated, amavisd tries to INSERT
a record with this randomly generated mail_id into table msgs
(using $sql_clause{'ins_msg'} SQL clause). If the operation fails,
another mail_id is generated and attempt repeated, until it eventually
succeeds. Thus it depends entirely on SQL's decision whether a
particular record is allowed or would break some UNIQUE constraint.
So, by only changing a table declaration (PRIMARY KEY or adding
a CONSTRAINT), it changes what keys amavisd will be allowed to insert.

Changing a primary key to (partition_tag,mail_id) brings consqeuences
to quarantining, in particular to releasing from a SQL quarantine,
where it no longer suffices to specify mail_id=xxx in AM.PDP request,
but may be necessary to specify also a partition_tag=xx to distinquish
between SQL-quarantined messages which happen to have the same mail_id.

For compatibility I added code to amavisd (using a new 'sel_msg' entry
in %sql_clause) where in case of releasing from SQL quarantine and a
release request is missing a partition_tag=xx attribute, amavisd tries
to find a missing partition_tag value by querying msgs table (using
only mail_id as a key). If exactly one record matches, then everything
is fine, and releasing may proceed. If multiple records with the same
mail_id exists the release request is aborted, with a message asking
user to supply the disambiguating partition_tag=xx attribute.

Quarantine id for a SQL-quarantined message (as logged in main
log entry) is changed from:
  quarantine: aX3C4f6btXgX
  quarantine: aX3C4f6btXgX[21]
i.e. has a partition_tag in brackets appended to mail_id.
Correspondingly the amavisd-release is also changed to be
able to parse 'aX3C4f6btXgX[21]', splitting it into mail_id
and partition_tag and providing each as a separate attribute
in an AM.PDP request.

For a classical file-based quarantining a template %P may be
used when forming a file name, which will be replaced by
a current partition_tag value. This allows for a file-based
quarantine for example to be split into weekly subdirectories.
For example:
  cd /var/virusmails
  mkdir -p P21/spam P22/spam P23/spam
  chown -R vscan:vscan P21 P22 P23
and having:
  $spam_quarantine_method = 'local:P%P/spam/%m.gz';
  $sql_partition_tag =
    sub { my($msginfo)[EMAIL PROTECTED]; iso8601_week($msginfo->rx_time) };
would store spam into per- partition_tag (e.g. per-week) subdirectories.
There is one detail left to be resolved for releasing a message
from such a file-based schema (probably a change in amavisd-release).
Will think about it later on.

I hope this could contribute some progress regarding SQL partitioning
problems. I haven't tried it yet myself on a partitioned db, I'm just
running a normal non-partitioned database under PostgreSQL currently.


--- amavisd-release.orig        2008-03-17 15:43:33.000000000 +0100
+++ amavisd-release     2008-05-22 18:44:50.000000000 +0200
@@ -146,5 +146,6 @@
 sub release_file($$$@) {
   my($sock,$mail_file,$secret_id,@alt_recips) = @_;
-  my($fn_path,$fn_prefix,$mail_id,$fn_suffix); local($1,$2,$3,$4);
+  my($fn_path,$fn_prefix,$mail_id,$fn_suffix,$part_tag); local($1,$2,$3,$4);
+  $part_tag = $1  if $mail_file =~ s/ \[ ( [^\]]* ) \] \z//xs;
   if ($mail_file =~ m{^ ([^/].*/)? ([A-Z0-9][A-Z0-9._-]*[_-])?
                         ([A-Z0-9][A-Z0-9_+-]{10}[A-Z0-9]) (\.gz)? \z}xsi) {
@@ -165,6 +166,7 @@
     "mail_id=$mail_id",
   );
-  push(@query, "secret_id=$secret_id")  if $secret_id ne '';
-  push(@query, "mail_file=$mail_file")  if $quar_type =~ /^[FZB]\z/;
+  push(@query, "secret_id=$secret_id")    if $secret_id ne '';
+  push(@query, "mail_file=$mail_file")    if $quar_type =~ /^[FZB]\z/;
+  push(@query, "partition_tag=$part_tag") if $part_tag ne '';
   if (@alt_recips) {  # override original recipients if list is nonempty
     push(@query, map {"recipient=$_"} @alt_recips);
--- amavisd.orig        2008-04-23 20:50:05.000000000 +0200
+++ amavisd     2008-05-22 18:44:50.000000000 +0200
@@ -1089,5 +1089,5 @@
       'UPDATE msgs SET content=?, quar_type=?, quar_loc=?, dsn_sent=?,'.
       ' spam_level=?, message_id=?, from_addr=?, subject=?, client_addr=?'.
-      ' WHERE mail_id=?',
+      ' WHERE partition_tag=? AND mail_id=?',
     'ins_rcp' =>
       'INSERT INTO msgrcpt (partition_tag, mail_id, rid,'.
@@ -1096,14 +1096,17 @@
       'INSERT INTO quarantine (partition_tag, mail_id, chunk_ind, mail_text)'.
       ' VALUES (?,?,?,?)',
+    'sel_msg' =>  # obtains partition_tag if missing in a release request 
+      'SELECT partition_tag FROM msgs WHERE mail_id=?',
     'sel_quar' =>
-      'SELECT mail_text FROM quarantine WHERE mail_id=? ORDER BY chunk_ind',
+      'SELECT mail_text FROM quarantine WHERE partition_tag=? AND mail_id=?'.
+      ' ORDER BY chunk_ind',
     'sel_penpals' =>  # no message-id references list
       "SELECT msgs.time_num, msgs.mail_id, subject".
-      " FROM msgs JOIN msgrcpt ON msgs.mail_id=msgrcpt.mail_id".
+      " FROM msgs JOIN msgrcpt USING (partition_tag,mail_id)".
       " WHERE sid=? AND rid=? AND content!='V' AND ds='P'".
       " ORDER BY msgs.time_num DESC LIMIT 1",
     'sel_penpals_msgid' =>  # with a nonempty message-id references list
       "SELECT msgs.time_num, msgs.mail_id, subject, message_id, rid".
-      " FROM msgs JOIN msgrcpt ON msgs.mail_id=msgrcpt.mail_id".
+      " FROM msgs JOIN msgrcpt USING (partition_tag,mail_id)".
       " WHERE sid=? AND content!='V' AND ds='P' AND message_id IN (%m)".
         " AND rid!=sid".
@@ -9326,4 +9329,5 @@
   my($partition_tag) = c('sql_partition_tag');
   $partition_tag = &$partition_tag($msginfo)  if ref $partition_tag eq 'CODE';
+  $partition_tag = 0  if !defined($partition_tag);
   $msginfo->partition_tag($partition_tag);
 
@@ -14442,4 +14446,5 @@
   #   mail_id=xxxxxxxxxxxx
   #   secret_id=xxxxxxxxxxxx              (authorizes a release/report)
+  #   partition_tag=xx                    (only required if mail_id not unique)
   #   quar_type=x                         F/Z/B/Q/M  (defaults to Q or F)
   #                                       file/zipfile/bsmtp/sql/mailbox
@@ -14514,4 +14519,5 @@
   elsif ($attr_ref->{'request'} =~ /^(?:release|requeue|report)\z/i) {
     exists $attr_ref->{'mail_id'} or die "Missing 'mail_id' field";
+    $msginfo->partition_tag($attr_ref->{'partition_tag'});  # may be undef
     my($fn) = $attr_ref->{'mail_id'};
     # amavisd almost-base64: 62 +, 63 -
@@ -14562,7 +14568,27 @@
         or die "SQL quarantine code not enabled";
       my($conn_h) = $obj->{conn_h}; my($sql_cl_r) = cr('sql_clause');
+      my($sel_msg)  = $sql_cl_r->{'sel_msg'};
+      my($sel_quar) = $sql_cl_r->{'sel_quar'};
+      if (!defined($msginfo->partition_tag) &&
+          defined($sel_msg) && $sel_msg ne '') {
+        do_log(5, "preprocess_policy_query: missing partition_tag in request,".
+                  " fetching msgs record for mail_id=%s", $msginfo->mail_id);
+        # find a corresponding partition_tag if missing from a release request
+        $conn_h->begin_work_nontransaction;  #(re)connect if necessary
+        $conn_h->execute($sel_msg, untaint($msginfo->mail_id));
+        my($a_ref); my($cnt) = 0;
+        while ( defined($a_ref=$conn_h->fetchrow_arrayref($sel_msg)) ) {
+          $cnt++;  $msginfo->partition_tag($a_ref->[0]) if $cnt==1;
+          ll(5) && do_log(5, "got msgs record for mail_id=%s: %s",
+                             $msginfo->mail_id, join(', ',@$a_ref));
+        }
+        $conn_h->finish($sel_msg)  if defined $a_ref;  # only if not all read
+        $cnt <= 1 or die "Multiple ($cnt) records with same mail_id exists, ".
+                         "specify a partition_tag in AM.PDP request";
+      }
       $conn_h->begin_work_nontransaction;  # (re)connect if not connected
       $fh = Amavis::IO::SQL->new;
-      $fh->open($conn_h,$sql_cl_r->{'sel_quar'},untaint($msginfo->mail_id))
+      $fh->open($conn_h, $sel_quar, untaint($msginfo->mail_id),
+                'r', untaint($msginfo->partition_tag))
         or die "Can't open sql obj for reading: $!";
     } else {
@@ -17265,4 +17291,5 @@
   $bsmtp_file_final =~ s{%(.)}
     {  $1 eq 'b' ? $msginfo->body_digest
+     : $1 eq 'P' ? $msginfo->partition_tag
      : $1 eq 'm' ? $msginfo->mail_id
      : $1 eq 'n' ? $msginfo->log_id
@@ -17495,4 +17522,5 @@
         $suggested_filename =~ s{%(.)}
           {  $1 eq 'b' ? $msginfo->body_digest
+           : $1 eq 'P' ? $msginfo->partition_tag
            : $1 eq 'm' ? $msginfo->mail_id
            : $1 eq 'n' ? $msginfo->log_id
@@ -18425,6 +18453,6 @@
                0+untaint($spam_level), $m_id, $from, $subj,
                untaint($msginfo->client_addr), #maybe we have a better info now
-               # $os_fp,
-               $mail_id);            # $rfc2822_sender, $rfc2822_from,
+               $msginfo->partition_tag, $mail_id);
+               # $os_fp, $rfc2822_sender, $rfc2822_from,
       # SQL_CHAR, SQL_VARCHAR, SQL_VARBINARY, SQL_BLOB, SQL_INTEGER, SQL_FLOAT,
       # SQL_TIMESTAMP, SQL_TYPE_TIMESTAMP_WITH_TIMEZONE, ...
@@ -18489,14 +18517,17 @@
     or do { $@ = "errno=$!"  if $@ eq '';  chomp $@ };
   die $@  if $@ =~ /^timed out\b/;  # resignal timeout
-  if ($self->{mode} eq 'w') {
+  if ($self->{mode} eq 'w') {  # open for write access
     ll(4) && do_log(4,"Amavis::IO::SQL::open %s drv=%s (%s); key=%s, p_tag=%s",
                     $self->{mode}, $driver, $self->{clause},
                     $self->{dbkey}, $self->{partition_tag});
-  } else {
-    eval { $conn_h->execute($self->{clause}, $self->{dbkey});  1 }
-      or do { $@ = "errno=$!"  if $@ eq '' };
+  } else {  # open for read access
+    eval {
+      $conn_h->execute($self->{clause}, $self->{partition_tag},$self->{dbkey});
+      1;
+    } or do { $@ = "errno=$!"  if $@ eq '' };
     my($ll) = $@ ne '' ? -1 : 4;
-    ll($ll) && do_log($ll,"Amavis::IO::SQL::open %s drv=%s (%s); key=%s: %s",
-                  $self->{mode}, $driver, $self->{clause}, $self->{dbkey}, $@);
+    do_log($ll,"Amavis::IO::SQL::open %s drv=%s (%s); key=%s, p_tag=%s: %s",
+               $self->{mode}, $driver, $self->{clause},
+               $self->{dbkey}, $self->{partition_tag}, $@)  if ll($ll);
     if ($@ ne '') {
       chomp $@;
@@ -18827,5 +18858,11 @@
     next  if $r->recip_done;
     $r->recip_smtp_response($smtp_response); $r->recip_done(2);
-    $r->recip_mbxname($mail_id)  if $smtp_response =~ /^2/;
+    if ($smtp_response =~ /^2/) {
+      my($mbxname) = $mail_id;
+      my($p_tag) = $msginfo->partition_tag;
+      $mbxname .= '[' . $p_tag . ']'
+        if defined($p_tag) && $p_tag ne '' && $p_tag ne '0';
+      $r->recip_mbxname($mbxname);
+    }
   }
   section_time('fwd-sql');




Mark

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
AMaViS-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/

Reply via email to