Re: [Mailman-Users] Mailman performance / sends per hour

Brad Knowles Sat, 26 Jul 2003 03:06:44 -0700

At 7:43 PM -0400 2003/07/25, Jon Carnes wrote:

 Actually Brad, it looks like your knowledge of Sendmail is rather dated.
 Sendmail has been doing this since 2001.
http://www.sendmail.org/~ca/email/doc8.12/RELEASE_NOTES

This is old. Check the RELEASE_NOTES for version 8.12.9 (which has a major security fix, and you are advised not to use any older version of 8.12), or 8.12.10.Beta2 (which I quote here, dated Jul 1 05:08). The only references I can find to the word "sort" anywhere in this file with regard to version 8.12 or later are:

8.12.7/8.12.7   2002/12/29
        Do not lookup MX records when sorting the MSP queue.  The MSP
                only needs to relay all mail to the MTA.  Problem found
                by Gary Mills of the University of Manitoba.
        Avoid problems with QueueSortOrder=random due to problems with
                qsort() on Solaris (and maybe some other operating systems).
                Problem noted by Stephan Schulz of Gruner+Jahr..

8.12.0/8.12.0   2001/09/08
        If the new option FastSplit (defaults to one) has a value greater
                than zero, it suppresses the MX lookups on addresses when they
                are initially sorted which may result in faster envelope
                splitting.  If the mail is submitted directly from the
                command line, then the value also limits the number of
                processes to deliver the envelopes; if more envelopes are
                created they are only queued up and must be taken care of
                by a queue run.
        QueueSortOrder=Random sorts the queue randomly, which is useful if
                several queue runners are started by hand to avoid contention.
        QueueSortOrder=Modification sorts the queue by the modification time
                of the qf file (older entries first).

Note that none of these makes any mention whatsoever of tracking previous average delivery times for a recipient, using them as a predictor of future delivery times, and sorting the current input on that basis.

But please check again to make sure I didn't miss something. You know me, I've only been mucking about with sendmail since ~1991, my name only comes up in the full RELEASE_NOTES four times, I was only the sendmail FAQ maintainer from ~1995 to ~1997, and I could easily have forgotten or missed something.

 Postfix has some very interesting features that make it much better to
 use than Sendmail, but the one that sets it most apart in added
 efficiency is its default queueing structure.

You mean the hashed queues? Yes, that's good, but sendmail can do better with the optional multiple queue structure. With this option, sendmail gives you more control over how many queues are created at what depth, instead of giving you an arbitrary number of sixteen queue directories per hash level. Since most filesystems start flaking out with more than about 1000 directory entries at a single level, you can flatten the sendmail queue structure significantly and still have fewer files per leaf directory node than postfix would allow.
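
To put rough numbers on the directory-depth argument, here is a minimal sketch of the arithmetic. The function names and the message counts are made up for illustration; the only figures taken from the discussion above are the fan-out of sixteen and the ~1000-entry comfort limit.

```python
def files_per_leaf(queued_files, dirs_per_level, depth):
    """Average number of files per leaf directory for a fixed fan-out tree."""
    leaves = dirs_per_level ** depth  # depth 0 means one flat directory
    return queued_files / leaves


def min_depth(queued_files, dirs_per_level, max_entries=1000):
    """Smallest depth that keeps the average leaf at or under max_entries."""
    depth = 0
    while files_per_leaf(queued_files, dirs_per_level, depth) > max_entries:
        depth += 1
    return depth


# Hypothetical queue of 100,000 files:
print(min_depth(100000, 16))   # fixed fan-out of sixteen per level
print(min_depth(100000, 100))  # an administrator-chosen fan-out of one hundred
```

With a fan-out you choose yourself, the same queue fits in a shallower tree, which is the sense in which the structure can be flattened. This is only averages; real queues are not perfectly evenly distributed.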

Moreover, it is the hashed queue structure that postfix uses, together with the way it manages the queue by moving files from one directory structure to another, that causes the fundamental performance limitations which sendmail allows you to exceed. Note that sendmail never moves files around on disk, and therefore does not incur those additional, unnecessary synchronous meta-data updates.

Indeed, with the safe asynchronous writes feature, sendmail can safely avoid causing any synchronous meta-data updates at all in most cases, because most mail messages are small enough to be buffered in memory and delivered on the initial delivery attempt. Only large messages, or messages that fail the initial delivery attempt, end up getting written to disk at all, which means that sendmail can approach pure RAM/network I/O throughput, whereas postfix will always be bound by disk I/O.

 I do agree with you though, that if the MTA (or Mailman) could
 periodically sweep the MTA delivery logs and sort the domains from
 fastest to slowest, there would be an increase in efficiency.

This is the feature *I* was talking about, although I'd be inclined to do it on an individual basis and not a domain basis, since some individuals might have .procmailrc or other processing scripts on the remote end that might be significantly slower to process than other recipients within the same domain.

For situations where this is not an issue at the remote end, the problem would largely solve itself because all those recipients would tend to sort together anyway.
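
A minimal sketch of the per-recipient idea, to make it concrete. Nothing like this exists in sendmail or Mailman as far as this thread has established; the function names, the (recipient, seconds) log format, and the choice to put recipients with no history last are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_delivery_times(log_entries):
    """Map each recipient to the mean of its past delivery times.

    log_entries is an iterable of (recipient, seconds) pairs, a
    hypothetical format standing in for whatever the MTA logs provide.
    """
    samples = defaultdict(list)
    for recipient, seconds in log_entries:
        samples[recipient].append(seconds)
    return {rcpt: mean(times) for rcpt, times in samples.items()}


def sort_fastest_first(recipients, averages):
    """Order recipients by predicted delivery time, fastest first.

    Recipients with no history get an infinite estimate and so sort last.
    """
    return sorted(recipients, key=lambda r: averages.get(r, float("inf")))


history = [
    ("a@example.com", 2.0),
    ("b@example.org", 30.0),
    ("a@example.com", 4.0),
    ("c@example.net", 1.0),
]
averages = average_delivery_times(history)
order = sort_fastest_first(
    ["a@example.com", "b@example.org", "c@example.net", "d@example.com"],
    averages,
)
print(order)
```

Because the key is the individual address, recipients at a uniformly fast domain end up next to each other without any domain-level bookkeeping.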

 For larger lists and Mailman, I have found that nothing beats using a
 RAM disk and accessing the list database files via the mounted RAM disk.
 The speed increase can be 100x faster.

If you're going to be a professional spammer, then I would suggest using the professional spammer tools.

Otherwise, if you're going to run a mailing list for normal people, then I would suggest that you pay attention to sections 5.3.3 and 5.3.4 of RFC 1123 "Internet Host Requirements", which is also part of STD0003:

5.3.3 Reliable Mail Receipt

         When the receiver-SMTP accepts a piece of mail (by sending a
         "250 OK" message in response to DATA), it is accepting
         responsibility for delivering or relaying the message.  It must
         take this responsibility seriously, i.e., it MUST NOT lose the
         message for frivolous reasons, e.g., because the host later
         crashes or because of a predictable resource shortage.

         If there is a delivery failure after acceptance of a message,
         the receiver-SMTP MUST formulate and mail a notification
         message.  This notification MUST be sent using a null ("<>")
         reverse path in the envelope; see Section 3.6 of RFC-821 .  The
         recipient of this notification SHOULD be the address from the
         envelope return path (or the Return-Path: line).  However, if
         this address is null ("<>"),  the receiver-SMTP MUST NOT send a
         notification.  If the address is an explicit source route, it
         SHOULD be stripped down to its final hop.

         DISCUSSION:
              For example, suppose that an error notification must be
              sent for a message that arrived with:
              "MAIL FROM:<@a,@b:user@d>".  The notification message
              should be sent to: "RCPT TO:<user@d>".

              Some delivery failures after the message is accepted by
              SMTP will be unavoidable.  For example, it may be
              impossible for the receiver-SMTP to validate all the
              delivery addresses in RCPT command(s) due to a "soft"
              domain system error or because the target is a mailing
              list (see earlier discussion of RCPT).

         To avoid receiving duplicate messages as the result of
         timeouts, a receiver-SMTP MUST seek to minimize the time
         required to respond to the final "." that ends a message
         transfer.  See RFC-1047 [SMTP:4] for a discussion of this
         problem.

In particular, this means that you can't use a RAM disk for this application. You *could* use a battery-backed solid-state disk, so long as you could guarantee that it is configured in such a way that it will survive power loss, reboots, remounting, filesystem checks, etc. Of course, proper SSD is much, much more expensive than a simple RAM disk.

The alternative is using sendmail with the above-mentioned safe asynchronous writes feature, which allows you to get full use of your RAM, at nearly RAM disk speeds, but to do so safely.
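
As a rough illustration of what section 5.3.3 demands of anything that spools accepted mail, here is a minimal sketch of a durable write on a POSIX system. It is not how sendmail or Mailman implement their queues; the function name and the file layout are hypothetical.

```python
import os


def store_durably(spool_dir, name, data):
    """Write data to spool_dir/name and force it to stable storage.

    Only after this returns would it be safe to answer "250 OK".
    Assumes a POSIX filesystem where a directory can be opened and fsync'ed.
    """
    final_path = os.path.join(spool_dir, name)
    tmp_path = final_path + ".tmp"

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push the file to the device

    os.replace(tmp_path, final_path)  # atomic rename within one filesystem

    # Sync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(spool_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    return final_path
```

On a RAM disk every one of these calls succeeds, but none of them makes the message survive a power loss or reboot, which is exactly the problem.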

--
Brad Knowles, <[EMAIL PROTECTED]>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
