I am moving this discussion to the James Developer List, since none of it is considered sensitive. Messages are ordered from bottom (oldest) to top. Mark Imel, Jason Webb, Craig Mattson, and others interested in mailing list management with James, please take note.
Feedback, corrections, etc. are all actively solicited. Oh, but guys ... please do NOT reply with this message embedded. It is long enough as it is. :-) Just trim quotes, if any are needed at all.

----------------------

> Any realistic performance test has to account for the fact that
> SMTP deliveries take anywhere from seconds to minutes depending
> on network connections; the average for a fresh message seems to
> be about 5 seconds, based on the fact that on daedalus qmail's
> max concurrency is 509 and when delivering to a fresh list you
> see 80-100 deliveries per second. That actually matters more
> than any internal performance in terms of adding and removing
> messages from the queue.

I agree, which is why I wanted to clarify the situation. I am sorry for the initial confusion. FWIW, my normal load test generates about half the normal ASF load in individual outgoing messages, one per connection, although that is on a LAN. My current tests don't simulate the mailing list connection situation, but I'll put together a suitable test as James gets closer to being able to deliver on your needs.

> Also, is the queue database transactional? Are we sure if the system
> goes down that we won't lose mail?

We may not be able to guarantee that the message won't be sent twice if the system happens to go down at the wrong time, but the message is not removed from the queue until after it is confirmed to have been sent.

> delivery can't be sequential to a list; when you're given a list of 100
> recipients, the mail server should be sending to those recipients in
> parallel, not waiting for the first recipient delivery to finish before
> beginning the next. If this is already addressed, great.

We can tell James how many delivery threads to use. Yes, multiple threads would be used, except where a message has multiple recipients at the same domain. The current code separates the recipients by domain, and spools a separate outgoing message per domain.
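The per-domain split described above can be sketched roughly as follows. This is an illustrative sketch, not the actual James spooler code; the class and method names are hypothetical:

```java
import java.util.*;

public class DomainSplitter {
    // Group recipient addresses by domain so that one outgoing
    // message (and thus one SMTP connection) can be spooled per
    // domain, while separate domains are delivered in parallel.
    public static Map<String, List<String>> byDomain(Collection<String> recipients) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String rcpt : recipients) {
            String domain = rcpt.substring(rcpt.indexOf('@') + 1)
                                .toLowerCase(Locale.ROOT);
            groups.computeIfAbsent(domain, d -> new ArrayList<>()).add(rcpt);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = byDomain(List.of(
            "a@aol.com", "b@aol.com", "c@example.org"));
        System.out.println(g.get("aol.com"));     // [a@aol.com, b@aol.com]
        System.out.println(g.get("example.org")); // [c@example.org]
    }
}
```

Each resulting batch would become one spooled outgoing message with multiple RCPT TO commands on a single connection.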
In the case of VERP, each message delivery would be unique because of the MAIL FROM. However, that does not mean that we could not optimize message delivery. Doing VERP with the current code would mean a unique Mail spooled per recipient, each with its own unique sender. I believe that we can optimize that process. The link I usually use for VERP is http://cr.yp.to/proto/verp.txt.

Unless I'm missing something, it could be done similarly to how we take a message in LocalDelivery and then, for each mailbox, add a unique Delivered-To header to a clone of the message being streamed into the mailbox. For VERP, we would use a unique sender (an envelope attribute, not a Sender: header).

> With VERP, the MAIL FROM line is different for every recipient, and
> thus every recipient requires a separate SMTP connection. 100K AOL
> users on a list? 100K connections.

Yes, for each recipient we need a unique MAIL FROM. But both RFC 821 and RFC 2821 permit more than one MAIL FROM per connection (for that matter, my mail client expects it). We wouldn't save anything on the data transfer, but why not reuse the connection, and save the connection establishment delay, when the remote MTA permits it and there are multiple mail messages to transfer? What am I missing?

I've no ego investment in any particular idea; I'm just mentally exploring what more we could do inside the delivery engine to reduce transfer time and cycles. If I've missed something, e.g., if I had overlooked that VERP means that we cannot send one copy to AOL with a list of RCPT TO commands, then I'm wrong. It has been known to happen. Occasionally. ;-)

> Is there some web-based means for admins to cruise through the
> pending delivery queue?

Not a one. Something to do. What operations would you like to see supported? The probable solution will use JMX.

--- Noel

-----Original Message-----
From: Noel J.
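For reference, the VERP encoding from cr.yp.to/proto/verp.txt rewrites the envelope sender per recipient: the recipient address, with its `@` replaced by `=`, is appended to the local part of the list's bounce address. A small sketch (the addresses used are made up for illustration):

```java
public class Verp {
    // Build a per-recipient VERP envelope sender, following
    // http://cr.yp.to/proto/verp.txt: for bounce address
    // "list-return@host" and recipient "user@domain", the
    // MAIL FROM becomes "list-return-user=domain@host".
    public static String verpSender(String bounceAddress, String recipient) {
        int at = bounceAddress.lastIndexOf('@');
        String local = bounceAddress.substring(0, at);
        String host = bounceAddress.substring(at + 1);
        return local + "-" + recipient.replace('@', '=') + "@" + host;
    }

    public static void main(String[] args) {
        System.out.println(verpSender("james-dev-return@example.org",
                                      "alice@aol.com"));
        // prints "james-dev-return-alice=aol.com@example.org"
    }
}
```

A bounce to that address identifies the failing recipient without parsing the bounce body, which is why each delivery needs its own MAIL FROM.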
Bergman [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 09, 2003 2:22
To: Brian Behlendorf
Cc: James-PMC Mailing List
Subject: RE: James delivery volume

Brian,

Don't get too carried away by the performance. :-) A key point that you might not have caught was that I was using an SMTP sink, which means that instead of opening a connection per recipient, there was one connection per message. There would be a performance degradation opening a connection per recipient. But from these few measurements, it does appear that the key issue is going to be optimizing those outgoing connections. Everything else appears to be well within the performance that James is capable of providing.

For really high volume, Craig Mattson provided some thoughts based upon supporting lists larger than all but a few ISPs (not with James :-)): http://nagoya.apache.org/wiki/apachewiki.cgi?JamesV3/HighVolume but I don't think that we need to do all of those things (e.g., clustering) before we can handle the ASF load.

What I tried to describe in my previous message is a change to the list delivery. Instead of taking a message, attaching the recipient list, and spooling it for the standard delivery engine, I was thinking that the list delivery engine could be (at least conceptually) a subclass of the standard delivery engine. Not having to spool the message with the recipient list attached would provide some savings (James does create only one queue entry, but it has the expanded recipient list attached). But the major improvement, or so goes my hypothesis, would come from the delivery engine having a more tightly coupled relationship with the delivery list. The list delivery engine could pre-sort addresses by MX record based upon recipient domains, for example, and just run through this pre-sorted delivery data for each message, at least for normal list delivery (individual retries might be kicked out to the standard delivery process).
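The MX pre-sort idea above could look roughly like this. The resolver here is a pluggable stand-in (a real one would query DNS for MX records, e.g. via JNDI's DNS provider); the class name and MX data are hypothetical:

```java
import java.util.*;
import java.util.function.Function;

public class MxPresorter {
    // Pre-sort list recipients by the MX host of their domain, so the
    // list delivery engine can walk one batch per target MTA for every
    // message, instead of resolving and connecting per recipient.
    public static SortedMap<String, List<String>> byMx(
            Collection<String> recipients, Function<String, String> resolveMx) {
        SortedMap<String, List<String>> batches = new TreeMap<>();
        for (String rcpt : recipients) {
            String domain = rcpt.substring(rcpt.indexOf('@') + 1);
            String mx = resolveMx.apply(domain);
            batches.computeIfAbsent(mx, m -> new ArrayList<>()).add(rcpt);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical MX data: two domains sharing one MX host.
        Map<String, String> mxTable = Map.of(
            "aol.com", "mailin-01.mx.aol.com",
            "aim.com", "mailin-01.mx.aol.com",
            "example.org", "mx.example.org");
        SortedMap<String, List<String>> b = byMx(
            List.of("a@aol.com", "b@aim.com", "c@example.org"), mxTable::get);
        System.out.println(b.get("mailin-01.mx.aol.com").size()); // 2
    }
}
```

Because the batches are keyed by MX host rather than domain, distinct domains hosted by the same MTA collapse into one connection, which is exactly the saving hypothesized above.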
That is what I meant by applying the message to the list, instead of the list to the message.

The test system runs RedHat 6.2 (latest Linux 2.2 kernel), soon to be RedHat 8.0 (latest updates). Both the mailet pipeline spool and the outgoing gateway spool were in the file system. The mailing list roster was in MySQL, though.

A quick look in the apmail hierarchy was heartening. Although I just looked at a few lists, my guess is that the biggest list is tomcat-user, which has just under 2300 subscribers. The user list for httpd is 60% that size, and other lists were running in the hundreds of users. I didn't play with any of the tools, since I don't know them well enough to be sure of not screwing anything up. Don't laugh, but I just cat'd (cat * >> ~noel/tmp/<list>) the subscriber list files, and then counted the number of @ signs. :-)

The new Mailing List Manager that Mark wrote is extensible. Basically, it provides a structure for adding new commands by adding new command classes. Right now it just provides confirmed subscribe, confirmed unsubscribe, and info. There are features missing that you've already expressed as wanting: http://nagoya.apache.org/wiki/apachewiki.cgi?HostApacheOnJames so we've got some "marching orders." :-) Mark's new MLM just provides a framework for implementing them.

I have no idea what the volume of James installations might be, nor a good way to count it. This page, http://nagoya.apache.org/wiki/apachewiki.cgi?JamesUsers, has comments from some of our users, and there is a similar page for volunteers. There are businesses using James as their primary mail server. That's about all I can tell you. I could post another request or add something to the front page asking people to tell us how they use James.

One other data point, actually. "Alice K" started a survey last November.
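The command-class extensibility described above can be sketched as a simple dispatcher. This is illustrative only; the interface and class names are invented here and will not match Mark's actual MLM classes:

```java
import java.util.*;

// Illustrative sketch of an extensible list-command framework: each
// command (subscribe, unsubscribe, info, ...) is its own class, and
// new commands are added by registering new implementations.
interface ListCommand {
    String name();
    String execute(String sender);
}

class InfoCommand implements ListCommand {
    public String name() { return "info"; }
    public String execute(String sender) {
        return "Sending list info to " + sender;
    }
}

public class CommandDispatcher {
    private final Map<String, ListCommand> commands = new HashMap<>();

    public void register(ListCommand cmd) {
        commands.put(cmd.name(), cmd);
    }

    public String dispatch(String commandName, String sender) {
        ListCommand cmd = commands.get(commandName);
        return (cmd == null) ? "Unknown command: " + commandName
                             : cmd.execute(sender);
    }

    public static void main(String[] args) {
        CommandDispatcher d = new CommandDispatcher();
        d.register(new InfoCommand());
        System.out.println(d.dispatch("info", "user@example.org"));
        // prints "Sending list info to user@example.org"
    }
}
```

Under this structure, the missing features from the HostApacheOnJames wish list would each land as one new command class registered with the dispatcher.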
Most of the entries in the survey pre-date v2.1.0, but I still find it informative, e.g., the percentages of people using file vs. SQL storage, or Windows vs. Linux: http://infopoll.net/live/surveys.dll/r?sid=19892&r=29845. People can still contribute (http://infopoll.net/live/surveys/s19892.htm), but their responses will be mixed with those related to old versions.

If nothing that we've discussed in the past few messages is sensitive, I'll post it up to james-dev.

--- Noel

-----Original Message-----
From: Brian Behlendorf [mailto:[EMAIL PROTECTED]]
Sent: Sunday, June 08, 2003 21:27
To: Noel J. Bergman
Cc: James-PMC Mailing List
Subject: Re: James delivery volume

On Sat, 7 Jun 2003, Noel J. Bergman wrote:
> I have adjusted my load testing. Focusing on other usage scenarios, my most
> recent load test had been lots of short simultaneous messages with a mix of
> local and remote mailboxes. My revised load test cuts way back to only 100
> 15K messages per minute, incoming, of which roughly 1/3 are relayed to 1111
> remote users, 1/3 are relayed individually, and 1/3 are local. That works
> out to ~5.3 million messages per day. Mind you, since I don't have 1111
> individual target hosts, I'm using a single SMTP sink, so the performance
> isn't really representative.
>
> On a 400MHz Celeron (current test server), the CPU averages at least 1/3
> idle, with a range from just below 20% to ~50%. That is after the JVM has
> had time to run the JIT compiler (HotSpot Server).

Wow, that's terrific. What OS? Also, is the delivery queue stored on disk or in MySQL?

> I'm pleasantly surprised. We have really not focused at all on optimizing
> mailing list performance, although there was some work done last week to
> resolve complaints from a user who was testing lists of over 10000
> recipients. We have a new mailing list manager coming imminently from Mark
> Imel, which adds a lot of new features, and some of our users have custom
> modifications to James for doing high volume delivery.
I wonder how hard it would be to put together a comparison between the feature sets in the new MLM from Mark and ezmlm. If Mark could send me a doc on features, I'd put some time into seeing what's missing to make it suitable for the ASF.

Sounds stable - how many live James installations do you think are out there? Any way to sample?

> No action item, but I do consider this to be good news.
>
> To more accurately simulate the target environment, it would be helpful to
> have a statistical snapshot of the ASF lists. Just the number of lists,
> although I can estimate from the eyebrowse archives, and the number of
> recipients on each. Is that accessible on daedalus?

The best thing I could do is allow you to sudo to the apmail user, so you can sniff around ~apmail/lists/. Should be easy to careen through the lists using foreach and ezmlm-list. OK, done.

> I hypothesize that one optimization would be to more tightly couple a remote
> delivery engine with the list manager, so that we don't have to queue the
> recipient list with each message. We could optimize the recipient
> information in the list, and then apply the message to the list, rather than
> apply the list to the message.

I don't exactly follow, but if you're suggesting that you optimize by creating only one delivery queue entry for a message to the list, versus a queue entry per recipient on the list, that makes sense to me. That's how all the ones I'm aware of do it.

> P.S. I didn't know if the details about daedalus you'd mentioned would be
> considered sensitive, so I didn't CC the James developer list, but I did
> want the other folks on the James PMC to stay informed.

Basic stats are fine to share widely.

Brian

-----Original Message-----
From: Noel J. Bergman [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 07, 2003 13:07
To: Brian Behlendorf
Cc: James-PMC Mailing List
Subject: James delivery volume

> > what is the approx volume of incoming (unique) messages?
> > When you talk about 1 million per day, that is the output from the
> > list server?

> That's in SMTP delivery attempts, both remote and local, though the vast
> majority are remote. Daedalus has been up for 74 days and made, as of just
> now, 95,950,272 delivery attempts. Incoming, I have no specific estimate,
> but running a "tail -f /var/log/qmail/smtpd/current" can be fun.

I have adjusted my load testing. Focusing on other usage scenarios, my most recent load test had been lots of short simultaneous messages with a mix of local and remote mailboxes. My revised load test cuts way back to only 100 15K messages per minute, incoming, of which roughly 1/3 are relayed to 1111 remote users, 1/3 are relayed individually, and 1/3 are local. That works out to ~5.3 million messages per day. Mind you, since I don't have 1111 individual target hosts, I'm using a single SMTP sink, so the performance isn't really representative.

On a 400MHz Celeron (current test server), the CPU averages at least 1/3 idle, with a range from just below 20% to ~50%. That is after the JVM has had time to run the JIT compiler (HotSpot Server).

I'm pleasantly surprised. We have really not focused at all on optimizing mailing list performance, although there was some work done last week to resolve complaints from a user who was testing lists of over 10000 recipients. We have a new mailing list manager coming imminently from Mark Imel, which adds a lot of new features, and some of our users have custom modifications to James for doing high volume delivery.

No action item, but I do consider this to be good news.

To more accurately simulate the target environment, it would be helpful to have a statistical snapshot of the ASF lists. Just the number of lists, although I can estimate from the eyebrowse archives, and the number of recipients on each. Is that accessible on daedalus?
I hypothesize that one optimization would be to more tightly couple a remote delivery engine with the list manager, so that we don't have to queue the recipient list with each message. We could optimize the recipient information in the list, and then apply the message to the list, rather than apply the list to the message.

--- Noel

P.S. I didn't know if the details about daedalus you'd mentioned would be considered sensitive, so I didn't CC the James developer list, but I did want the other folks on the James PMC to stay informed.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
