On 12.05.2008 at 23:20, Mark Sapiro wrote:
I understand what you are saying, but I wonder what the real world
difference would be. As currently written, chunkify returns at most 4
partially filled chunks. Granted, 4 is significantly bigger than one,
but given that the MTA is VERPing the deliveries, it may ultimately
create an outgoing queue entry for each recipient anyway, so the extra
3 on the inbound side doesn't seem that significant (and it might
increase parallelism in the MTA).

First of all, I just noticed that the official code does indeed create at most four partially filled buckets. That's the problem when you have to jump in for someone else: my SMTPDirect.py contains 26 TLDs. Two thoughts:

1. Even with only four buckets and a real-world distribution of recipient addresses, this can take up to four times the I/O that is needed. The ratio gets better as the number of list subscribers grows, but if there are fewer recipients than SMTP_MAX_RCPTS, it can be as bad as 1:4.

2. Why split recipients the way it's done now at all? You either have to add new buckets (add new TLDs) or have all recipients outside the hard-coded TLDs thrown into the same bucket. I could understand it if you first built a list of the TLDs involved and sorted by those, though I don't know whether that's a good idea if you run a really large list and have to examine all recipients.
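To illustrate the first point, here is a minimal sketch of per-group chunking. The `TLD_GROUPS` table and the function name `grouped_chunks` are assumptions made up for this example, not copied from SMTPDirect.py; the point is only that each group ends in its own partial chunk.

```python
from collections import defaultdict

# Illustrative TLD-to-group table, assumed for this sketch only.
TLD_GROUPS = {"com": 1, "net": 2, "org": 2, "edu": 3}

def grouped_chunks(recips, chunksize):
    """Chunk each TLD group on its own, so every group may end in a partial chunk."""
    groups = defaultdict(list)
    for addr in recips:
        tld = addr.rsplit(".", 1)[-1].lower()
        groups[TLD_GROUPS.get(tld, 0)].append(addr)  # group 0 = all other TLDs
    chunks = []
    for key in sorted(groups):
        members = groups[key]
        for i in range(0, len(members), chunksize):
            chunks.append(members[i:i + chunksize])
    return chunks

recips = ["a@example.com", "b@example.net", "c@example.edu", "d@example.de"]
print(len(grouped_chunks(recips, 500)))  # 4 chunks for only 4 recipients
```

With a chunk size of 500, a flat split would hand these four recipients over in a single chunk.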

I didn't understand what you said about VERPing and outgoing queue entries. Surely any MTA will keep track of recipients on a per-message basis? As for parallelism, I think the best way to ensure fast delivery is to make all target destinations known to the MTA as fast as possible.

Given your 25000 member list, and assuming SMTP_MAX_RCPTS = 500, you
would have at most 54 chunks (and more likely 53 or 52) instead of 50.
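The arithmetic behind these numbers can be checked with a small sketch. The helper `chunk_count` and the group sizes below are invented for illustration; only the 25000 members and the chunk size of 500 come from the discussion.

```python
def chunk_count(group_sizes, chunksize):
    """Number of chunks when every group is chunked on its own."""
    # -(-n // chunksize) is ceiling division for positive integers.
    return sum(-(-n // chunksize) for n in group_sizes)

print(chunk_count([25000], 500))           # one flat group: 50
print(chunk_count([6250] * 4, 500))        # four equal groups: 52
print(chunk_count([1, 1, 1, 24997], 500))  # four groups, very skewed: 53
```

Both grouped results stay within the quoted bound of 54, and the extra chunks come purely from the partial chunk at the end of each group.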

In any case, if I were coding this, I would be inclined to not make it
an option, but just to change chunkify so it still grouped, but
continued to fill the last chunk of a group from the next group so
there would be at most one partial chunk.
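One way to read this suggestion is: keep the recipients ordered by group, but slice the ordered list as a whole. The following is only a sketch under that reading; the `TLD_GROUPS` table and the name `grouped_fill_chunks` are assumptions for the example and not existing Mailman code.

```python
from collections import defaultdict

# Illustrative TLD-to-group table, assumed for this sketch only.
TLD_GROUPS = {"com": 1, "net": 2, "org": 2, "edu": 3}

def grouped_fill_chunks(recips, chunksize):
    """Keep recipients grouped, but let a chunk continue into the next group."""
    groups = defaultdict(list)
    for addr in recips:
        tld = addr.rsplit(".", 1)[-1].lower()
        groups[TLD_GROUPS.get(tld, 0)].append(addr)  # group 0 = all other TLDs
    # Concatenate the groups in order, then slice the combined list.
    ordered = [addr for key in sorted(groups) for addr in groups[key]]
    return [ordered[i:i + chunksize] for i in range(0, len(ordered), chunksize)]

recips = ["a@x.com", "b@y.de", "c@z.com", "d@w.net", "e@v.org"]
print([len(c) for c in grouped_fill_chunks(recips, 2)])  # [2, 2, 1]
```

Only the final chunk can be partial, and addresses from the same group still sit next to each other.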

For the moment, I have changed the code to simply return SMTP_MAX_RCPTS recipients per chunk, or all recipients if there are fewer than that. Hardcoded, not configurable. The way it is done now, I can't see any real advantages, especially for those of us living outside the U.S. Either improve the sorting algorithm (all TLDs, no partial chunks) or make it configurable so sorting can be skipped altogether. At least that's what I feel would be an improvement. Have it default to flat chunking. It saves CPU time and I/O operations, and it gives the MTA's queue manager more time to do its job.
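Flat chunking as described here needs very little code. This is a minimal sketch, not the actual patch; the function name `flat_chunks` is made up, and `chunksize` stands in for SMTP_MAX_RCPTS.

```python
def flat_chunks(recips, chunksize):
    """Split recipients into consecutive chunks of at most chunksize each."""
    recips = list(recips)
    return [recips[i:i + chunksize] for i in range(0, len(recips), chunksize)]

members = ["user%d@example.org" % n for n in range(1200)]
print([len(c) for c in flat_chunks(members, 500)])       # [500, 500, 200]
print(len(flat_chunks(members[:30], 500)))               # 1
```

No sorting or TLD inspection is involved, so the cost is a single pass over the recipient list.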


Cheers
Stefan
--
Stefan Förster     http://www.incertum.net/     Public Key: 0xBBE2A9E9
Written on OSX. Who ate my ~/.signature?
