Andrew Case wrote: > >Traceback (most recent call last): > File "/usr/mailman/bin/add_members", line 258, in <module> > main() > File "/usr/mailman/bin/add_members", line 238, in main > addall(mlist, nmembers, 0, send_welcome_msg, s) > File "/usr/mailman/bin/add_members", line 135, in addall > mlist.ApprovedAddMember(userdesc, ack, 0) > File "/usr/mailman/Mailman/MailList.py", line 948, in ApprovedAddMember > assert self.Locked() >AssertionError
The above is definitely a problem, but I can't see how it can occur unless there is some race condition at the level of the file system. [...] >There was a locking issue with testlist2 foo4, which is fine since it >doesn't report back as successful. But you'll also notice that foo10 >wasn't listed as a subscriber even though it appeared as though that >subscribe was successful. Yes, I see that and both are cause for concern. I am more concerned about the AssertionError with testlist2 foo4, because I can't see how that can happen without some file system anomaly. >Here's some errors on the very next run where I'm subscribing 10 people to >each list as well: [...] >That's my (1-5%) failure. But when I also comment out the following: > >>> #if mtime < self.__timestamp: >>> # # File is not newer >>> # return None, None > >It seems to work each time (I ran 3 tests in a row, all with expected >results). The code above is a decision about whether we need to reload the list object from the file system based on the file system time stamp vs. our internal time stamp. The original code said "if mtime <= self.__timestamp:". Since these time stamps are in whole seconds, that test (to skip loading) could succeed even if the file time stamp was a fraction of a second newer than the internal time stamp. Thus, the bug. The fix is to make the test "if mtime < self.__timestamp:" meanin we only skip loading if the file time stamp is strictly less than the internal time stamp, but in your case, if the clock on the MTA/qrunner machine is a bit faster than that on the machine running the script, the internal time stamp of the qrunner process could be a second ahead of the file time stamp set by the add_members process even though on an absolute scale it is older. So, I think we're concerned here about clock skew between the machines, and in that case, commenting out the code completely as you have done makes sense. I have filed bug reports at <https://bugs.launchpad.net/mailman/+bug/862675> and <https://bugs.launchpad.net/mailman/+bug/862683> about these issues in preparation to fixing them. Now I don't know whether to stick with the "if mtime < self.__timestamp:" test which will work on a single server or to reload unconditionally as you have done which seems to be necessary in a shared file system situation with possible time skews. As far as the AssertionError is concerned, I don't know what to make of it. It appears to be a file system issue outside of Mailman, so I don't know how to deal with it. I think the code in Mailman's LockFile module is correct. If you are willing to try debugging this further, you could set LIST_LOCK_DEBUGGING = True in mm_cfg.py and restart your qrunners and try to reproduce the exception. This will log copious information to Mailman's 'locks' log which may help to understand what happened. -- Mark Sapiro <m...@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan _______________________________________________ Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://wiki.list.org/x/AgA3 Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org Security Policy: http://wiki.list.org/x/QIA9