On Mon, Mar 04, 2002 at 05:16:21PM -0500, Barry A. Warsaw wrote: > Okay, I've looked over all the code. Except for some stylistic > issues, which I'll just correct as I go, my biggest concern is the > database used in AvoidDuplicates.py. Yeah, I looked at that too, but being tired, I didn't get as critical as you did. I figured it somehow worked ok for Ben for his lists (bad Marc, no cookie) Now that I think of it, Ben wrote initially wrote this for qrunner from cron, and didn't think about this issue when he ported it to mailman-cvs-sept-2001 > It looks like you're keeping an in-memory dictionary mapping > addresses to a set of Message-ID:'s. You use this to decide if the > recipient address has already received a message of the given > Message-ID: That was my understanding of the code too. > Let's ignore the duplicate or missing Message-ID: issue for now. The > biggest problem I see is that 1) you lose all the mappings if you > restart your IncomingRunner, and
That's probably not a problem because 1) it would only affect a message being processed at the time you kill and restart IncomingRunner, not very likely, and worst case, you do get a second copy. 2) You don't restart IncomingRunner often if at all 3) When you do restart qrunner, there can be other qwirks, like a message being delivered twice (I've seen this with VERP enabled, I probably killed it while it was delivering a batch to exim, so it didn't complete and did it all over again after the restart) > 2) your process will grow without bounds until you do restart your > IncomingRunner. I think you're right. You'd have to have a lot of traffic before it catches up with you, but it will eventually if you never restart qrunner. > I'm not sure about the best thing to do. Sticking this data structure > in the list, or otherwise making it persistent, could take too much > resources for not much gain. The second issue is more important, > especially given that all our runners are now long running processes, > and I think most of the unbounded memory growth issues are taken care > of. Probably the best thing to do is to evict any entry in the > dictionary that's older than a day or two. That sounds like a reasonable plan. > Then again, this whole data structure seems intended to avoid > duplicates when lists are crossposted. It shouldn't be necessary if > we just want to filter out duplicates to explicitly named recipients. > Maybe we don't need both features, as the former seems to be much less > requested than the latter? That's true. The later is nice for instance when you have threads Cced accross mailman-devel and mailman-users, but having the former by itself would be good already. If this is a time issue wrt fixing the code, the duplicate message-ID code could be left behind a global option that is disabled by default and a comment saying that you should be ready to restart your qrunner weekly or daily if you enable it That said, adding a timestamp to them, and deleting everything that's more than 1H old is a better solution. > I think what I'll do for now is code up and test the original > approach. I'm on irc now so please join me if you want to talk about > it. > > now-if-i-can-just-get-OPN-to-stop-kicking-me-off!-ly y'rs, <-- barry has quit (Read error: 110 (Connection timed out)) :-) I was saying there: about OPN, I don't know the details (and there were also politics), but the sf.net guys moved their channels to slashnet.org. Seems to work better there Marc -- Microsoft is to operating systems & security .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | Finger [EMAIL PROTECTED] for PGP key _______________________________________________ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers