On 9/28/2011 3:08 PM, Andrew Case wrote:
> My configuration:
> Mailman: 2.1.14
> OS: Solaris 10
> Python: 2.4.5
> PREFIX = '/usr/mailman'
> Server setup: 1 server for web management, 1 server for MTA/qrunner.
> /usr/mailman is NFS mounted on both servers
>
>
> I've been having the following issue my mailman lists:
>
> A user is either subscribed or unsubscribed according to the logs, but
> then if I look at the member list, the action has not been done (or has
> been undone). For example, here is where I remove a subscriber and then
> look at the list members and they are still in the list:
>
> [mailman@myhost] ~/logs |> grep testlist subscribe | grep acase
> Sep 28 17:15:14 2011 (4401) testlist: new ac...@example.com, admin mass sub
> Sep 28 17:19:36 2011 (5821) testlist: deleted ac...@example.com; member mgt page
> [mailman@myhost] ~/logs |> ../bin/list_members testlist | grep acase
> ac...@example.com
> [mailman@myhost] ~/logs |>
There is a bug in the Mailman 2.1 branch, but the above is not it. The above log shows that ac...@example.com was added by admin mass subscribe at 17:15:14 and then, a bit more than 4 minutes later, was removed by checking the unsub box on the admin Membership List and submitting. If you check your web server logs, you will find POST transactions to the admin page for both of these events.

> The same also happens when subscribing. I will mass subscribe users (or
> when users confirm subscription via email/web), the logs indicated that
> they have been subscribed successfully, but then when I go look them up,
> they are not listed on the members list.
>
> This happens sporadically, but I am generally able to reproduce the error
> if I do it a couple times in a row.

This is possibly a manifestation of the bug, but I'm surprised it is happening that frequently.

> I'm suspicious there may be a locking issue and config.pck is reverting to
> config.pck.last. I found this thread rather helpful in analyzing
> potential problems, but I have yet to figure anything out:
> http://web.archiveorange.com/archive/v/IezAOgEQf7xEYCSEJTbD

The thread you point to above is relevant, but it is not a locking issue. The problem is due to list caching in Mailman/Queue/Runner.py and/or nearly concurrent processes which first load the list unlocked and later lock it. The config.pck timestamp has a resolution of 1 second. If a process holds a list object, and another process updates that list within the same second as the timestamp on the first process's object, the first process won't reload the updated list when it locks it. This can result in things like a subscribe being done and logged and then silently reversed.

List locking is working as it should. The issue is that the first process doesn't reload the updated list when it acquires the lock, because it thinks it already has the latest version.
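To make the timing concrete, here is a small, deterministic simulation of that lost update. It is a sketch under stated assumptions, not Mailman code: `Store` (a stand-in for config.pck with whole-second mtimes), `CachedList`, `lock_and_change()` and `run_race()` are hypothetical names, and the current time is passed in explicitly instead of read from the clock. `CachedList.load()` mirrors only the pre-patch `mtime <= self.__timestamp` test in `MailList.__load()`.

```python
# Deterministic simulation of the same-second lost update (a sketch).
# Hypothetical stand-ins, not Mailman code: Store plays config.pck with
# whole-second mtimes, CachedList plays a process's cached list object,
# and the current time is passed in explicitly.


class Store:
    """Hypothetical config.pck whose mtime has 1-second resolution."""

    def __init__(self):
        self.members = frozenset()
        self.mtime = 0

    def save(self, members, now):
        self.members = frozenset(members)
        self.mtime = int(now)  # the sub-second part is lost


class CachedList:
    """Hypothetical cached list object using the pre-patch test."""

    def __init__(self, store):
        self.store = store
        self.members = set()
        self.timestamp = -1  # nothing loaded yet

    def load(self):
        # Pre-patch rule: treat the file as "not newer" if mtime <= timestamp.
        if self.store.mtime <= self.timestamp:
            return False
        self.members = set(self.store.members)
        self.timestamp = self.store.mtime
        return True

    def lock_and_change(self, now, add=None, remove=None):
        # Re-check the file after "acquiring the lock", then modify and save.
        self.load()
        if add is not None:
            self.members.add(add)
        if remove is not None:
            self.members.discard(remove)
        self.store.save(self.members, now)


def run_race():
    store = Store()
    store.save(set(), now=100.0)  # list last saved during second 100

    a = CachedList(store)
    a.load()  # process A loads the list unlocked, still in second 100

    b = CachedList(store)
    b.lock_and_change(now=100.5, add="alice")  # B subscribes alice, same second

    # A now locks; mtime (100) equals its timestamp (100), so it keeps its
    # stale copy and overwrites B's change.
    a.lock_and_change(now=100.8, add="bob")
    return set(store.members)


if __name__ == "__main__":
    print(sorted(run_race()))  # ['bob']
```

In this model, alice's subscription is saved (and in the real system would be logged) and then silently overwritten by process A's stale copy, which matches the symptom in the report.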
I thought I had fixed this on the 2.1 branch, but it seems I only fixed it for the now-defunct 2.2 branch. A relevant thread starts at
<http://mail.python.org/pipermail/mailman-users/2008-August/062862.html>
and continues at
<http://mail.python.org/pipermail/mailman-developers/2008-August/020329.html>.

The patch in the attached cache.patch file should fix it.

> In addition if I just run the following commands over and over, then the
> bug never seems to come up. This is part of why I am worrying about
> locking:
> bin/add_members ...
> bin/remove_members ...

Running those commands one after another won't trigger it. bin/add_members alone can trigger it, but only if a nearly concurrent process is updating the same list.

> Is there a good way to test locking between servers? I've run the
> tests/test_lockfile.py, but it reports it is OK.
>
> Any and all help would be GREATLY appreciated. We've been trying to
> triage this bug for weeks and it is terribly disruptive for our users.

The post at
<http://mail.python.org/pipermail/mailman-users/2008-August/062862.html>
contains a "stress test" that will probably reproduce the problem.

I suspect your Mailman server must be very busy for you to see this bug that frequently. Either way, it looks like I need to install the fix for Mailman 2.1.15.

It is also curious that the only reports of this that I can recall both come from Solaris users. There may be complications in your case due to NFS, but locking shouldn't be the issue.

Run the stress test and see if it fails. If it does, try the patch. Let us know what happens.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
=== modified file 'Mailman/MailList.py'
--- Mailman/MailList.py 2008-08-21 21:35:20 +0000
+++ Mailman/MailList.py 2008-08-22 22:16:29 +0000
@@ -599,8 +599,11 @@
             # file doesn't exist, we'll get an EnvironmentError with errno set
             # to ENOENT (EnvironmentError is the base class of IOError and
             # OSError).
+            # We test strictly less than here because the resolution is whole
+            # seconds and we have seen cases of the file being updated by
+            # another process in the same second.
             mtime = os.path.getmtime(dbfile)
-            if mtime <= self.__timestamp:
+            if mtime < self.__timestamp:
                 # File is not newer
                 return None, None
             fp = open(dbfile)
@@ -618,8 +621,9 @@
                 return None, e
         finally:
             fp.close()
-        # Update timestamp
-        self.__timestamp = mtime
+        # Update the timestamp. We use current time here rather than mtime
+        # so the test above might succeed the next time.
+        self.__timestamp = int(time.time())
         return dict, None
 
     def Load(self, check_version=True):
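The two hunks of the patch work together. The sketch below isolates just that decision logic, assuming whole-second mtimes; the function names are hypothetical and model only the comparison and the timestamp update in `MailList.__load()`, not the real method. It shows why the strict `<` alone would not be enough: if the cached timestamp stayed equal to `mtime`, the strict test could never skip an unchanged file, which is what the patch comment "so the test above might succeed the next time" refers to.

```python
# Hypothetical model of the reload decision in MailList.__load(), before and
# after cache.patch.  All times are in seconds; mtimes are whole seconds.


def is_not_newer_old(mtime, cached):
    # Pre-patch test: a file rewritten in the same second as `cached` is skipped.
    return mtime <= cached


def is_not_newer_new(mtime, cached):
    # Patched test: skip only when the file is strictly older than `cached`.
    return mtime < cached


def new_timestamp_after_load(now):
    # Patched update: record the truncated load time, not the file's mtime,
    # so an unchanged file can be skipped once a later second has begun.
    return int(now)


if __name__ == "__main__":
    # Case 1: another process rewrites the file (at 100.5) during the same
    # second in which this process loaded it (at 100.2).
    cached_old = 100                                # old code stored mtime
    cached_new = new_timestamp_after_load(100.2)    # int(100.2) == 100
    mtime_after_write = int(100.5)                  # 100
    print(is_not_newer_old(mtime_after_write, cached_old))  # True: update missed
    print(is_not_newer_new(mtime_after_write, cached_new))  # False: reloads

    # Case 2: file untouched since second 100, loaded during second 101.
    cached_new = new_timestamp_after_load(101.3)    # 101
    print(is_not_newer_new(100, cached_new))        # True: no needless reload

    # Case 3: with `cached = mtime`, the strict test would never skip.
    print(is_not_newer_new(100, 100))               # False: reload every time
```

In this model the cost of the fix is at most an occasional extra reload of a file that was loaded during the same second it was written, in exchange for never keeping a stale copy across a same-second update.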
_______________________________________________
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org
Security Policy: http://wiki.list.org/x/QIA9