Hey Mark, Take a look at these results that are with the 2 patches you've sent me (and qrunners restarted). I'm adding 10 users to 6 lists:
[... creating list output cut ...] + Subscribing to testlist1 Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: fo...@www.cs.nyu.edu + Subscribing to testlist2 Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Traceback (most recent call last): File "/usr/mailman/bin/add_members", line 258, in <module> main() File "/usr/mailman/bin/add_members", line 238, in main addall(mlist, nmembers, 0, send_welcome_msg, s) File "/usr/mailman/bin/add_members", line 135, in addall mlist.ApprovedAddMember(userdesc, ack, 0) File "/usr/mailman/Mailman/MailList.py", line 948, in ApprovedAddMember assert self.Locked() AssertionError Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: f...@www.cs.nyu.edu Subscribed: fo...@www.cs.nyu.edu + Subscribing to testlist3 Subscribed: f...@www.cs.nyu.edu [... subscribing users to all other lists went fine ...] + Subscribers for testlist1: fo...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu + Removing list testlist1 + Subscribers for testlist2: f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu + Removing list testlist2 [... the rest were all fine ...] There was a locking issue with testlist2 foo4, which is fine since it doesn't report back as successful. But you'll also notice that foo10 wasn't listed as a subscriber even though it appeared as though that subscribe was successful. Here's some errors on the very next run where I'm subscribing 10 people to each list as well: [... cut expected results ...] + Subscribers for testlist4: fo...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu [** no foo8 **] f...@www.cs.nyu.edu f...@www.cs.nyu.edu + Removing list testlist4 + Subscribers for testlist5: fo...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu f...@www.cs.nyu.edu [** no foo8 or foo9 **] + Removing list testlist5 [... cut expected results ...] That's my (1-5%) failure. But when I also comment out the following: >> #if mtime < self.__timestamp: >> # # File is not newer >> # return None, None It seems to work each time (I ran 3 tests in a row, all with expected results). Let me know what you think. Thanks, -- Drew On Thu, September 29, 2011 1:30 pm, Mark Sapiro wrote: > On 9/29/2011 8:58 AM, Andrew Case wrote: >> >> When I did this I saw that maybe 1-5% of the time a user was still >> omitted >> from the list (user was silently removed). I think that because these >> are >> processed by the queue runner on a different host and because the >> timestamp check is being done on an NFS stored file, there is potential >> that the qrunner for this doesn't yet have an updated mtime for that >> file >> (or even a small ntp time drift could cause this). When I commented out >> the caching part of the code in MailList.py this bug never seems to show >> up: >> #if mtime < self.__timestamp: >> # # File is not newer >> # return None, None > > > Actually, that is not the cache. It is just the test for whether the > current list object, cached or whatever, needs to be reloaded from disk. > > I think that your configuration with NFS and possible time jitter > between servers makes the bug more likely. > > >> So I think there may still be a race condition here, but the chances of >> it >> are unlikely that human interaction would trigger this. If however, you >> have a script that is subscribing users (one after another), this could >> still come up. I actually happen to have such a script, but I run it on >> the same host as the qrunners, so I haven't experienced this before. > > > It can happen even where everything is on a single host, but as I said, > I think your configuration makes it more likely. > > >> In my case I think it's probably not worth keeping the performance gain >> that the caching adds for sake of consistency. > > > Attached is a patch to remove list caching from the qrunners. This patch > has the additional advantage of limiting the growth of the qrunners over > time. Old entries were supposed to be freed from the cache, but a self > reference in the default MemberAdaptor prevented this from occurring. > > For reasons of trying not to be disruptive this patch and the bug fix I > sent earlier were never applied to the 2.1 branch. I think this was a > mistake, and I will apply them for Mailman 2.1.15. > > >> Attached is the modified stress test I'm using. > > > Thanks. > > -- > Mark Sapiro <m...@msapiro.net> The highway is for gamblers, > San Francisco Bay Area, California better use your sense - B. Dylan > > Andrew Case Systems Administrator Courant Institute of Mathematical Sciences New York University 251 Mercer St., Room 1023 New York, NY 10012-1110 Phone: 212-998-3147 _______________________________________________ Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org Security Policy: http://wiki.list.org/x/QIA9