Mark, I realized I hadn't restarted my QRunners after this patch. It looks like it's working perfectly now! Even with a sleep of 0. Thanks so much!
-- Drew

On Thu, September 29, 2011 2:52 am, Andrew Case wrote:
> Thanks Mark, see inline comments.
>
>>> [mailman@myhost] ~/logs |> grep testlist subscribe | grep acase
>>> Sep 28 17:15:14 2011 (4401) testlist: new ac...@example.com, admin mass sub
>>> Sep 28 17:19:36 2011 (5821) testlist: deleted ac...@example.com; member mgt page
>>> [mailman@myhost] ~/logs |> ../bin/list_members testlist | grep acase
>>> ac...@example.com
>>> [mailman@myhost] ~/logs |>
>>
>>
>> There is a bug in the Mailman 2.1 branch, but the above is not it. The
>> above log shows that ac...@example.com was added by admin mass subscribe
>> at 17:15:14 and then a bit more than 4 minutes later, was removed by
>> checking the unsub box on the admin Membership List and submitting.
>
> I was trying to show that even after the user was removed, they're still
> listed as a member.
>
>> If you check your web server logs, you will find POST transactions to
>> the admin page for both these events.
>
> Agreed.
>
>>> The same also happens when subscribing. I will mass subscribe users
>>> (or users will confirm subscription via email/web), the logs indicate
>>> that they have been subscribed successfully, but then when I go look
>>> them up, they are not listed on the members list.
>>>
>>> This happens sporadically, but I am generally able to reproduce the
>>> error if I do it a couple times in a row.
>>
>>
>> This is possibly a manifestation of the bug, but I'm surprised it is
>> happening that frequently.
>
> Easiest way for me to replicate the problem is:
> 1) check the unsubscribe box for user A then hit submit
> 2) after reload check the unsubscribe box for user B then hit submit
> 3) reload the "membership list" page and user B is back on the list
>
> This happens even after I wait a couple seconds in between each step.
>
>>> I'm suspicious there may be a locking issue and config.pck is reverting
>>> to config.pck.last.
>>> I found this thread rather helpful in analyzing potential problems,
>>> but I have yet to figure anything out:
>>> http://web.archiveorange.com/archive/v/IezAOgEQf7xEYCSEJTbD
>>
>>
>> The thread you point to above is relevant, but it is not a locking
>> issue. The problem is due to list caching in Mailman/Queue/Runner.py
>> and/or nearly concurrent processes which first load the list unlocked
>> and later lock it. The issue is that the resolution of the config.pck
>> timestamp is 1 second, and if a process has a list object and that list
>> object is updated by another process within the same second as the
>> timestamp on the first process's object, the first process won't load
>> the updated list when it locks it. This can result in things like a
>> subscribe being done and logged and then silently reversed.
>
> The result sounds the same, but would this happen even if I'm loading the
> page with more than a second in between each step outlined above?
>
>> List locking is working as it should. The issue is that the first
>> process doesn't reload the updated list when it acquires the lock
>> because it thinks it already has the latest version.
>>
>> I thought I had fixed this on the 2.1 branch, but it seems I only fixed
>> it for the now defunct 2.2 branch.
>>
>> A relevant thread starts at
>> <http://mail.python.org/pipermail/mailman-users/2008-August/062862.html>
>> and continues at
>> <http://mail.python.org/pipermail/mailman-developers/2008-August/020329.html>
>>
>> The patch in the attached cache.patch file should fix it.
>
> I applied the patch but it doesn't seem to have made a difference.
>
>
>>> In addition if I just run the following commands over and over, then
>>> the bug never seems to come up. This is part of why I am worrying about
>>> locking:
>>> bin/add_members ...
>>> bin/remove_members ...
>>
>>
>> That won't do it. bin/add_members alone will do it, but only if there is
>> a nearly concurrent process updating the same list.
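
The timestamp problem Mark describes above can be reproduced in a small, self-contained model. The sketch below is not Mailman code: PickleFile, ListObject, the reload_on_equal flag, run(), and the clock values are all hypothetical names invented for illustration. It assumes a cached copy is reloaded only when the file's whole-second timestamp is newer than the copy's own timestamp. Reloading when the two timestamps are equal is shown as one plausible mitigation; it may or may not match what cache.patch actually does.

```python
class PickleFile:
    """Stand-in for config.pck: list data plus an mtime with 1-second resolution."""

    def __init__(self):
        self.members = set()
        self.mtime = 0

    def write(self, members, clock):
        self.members = set(members)
        self.mtime = int(clock)  # fractional seconds are lost


class ListObject:
    """Hypothetical per-process cached copy of a list (not Mailman's real class)."""

    def __init__(self, pickle_file, reload_on_equal):
        self.file = pickle_file
        self.members = set()
        self.timestamp = -1
        self.reload_on_equal = reload_on_equal

    def load(self):
        # Decide whether the cached copy can be trusted.
        if self.reload_on_equal:
            fresh = self.file.mtime < self.timestamp
        else:
            # Assumed buggy behaviour: an equal timestamp counts as "unchanged".
            fresh = self.file.mtime <= self.timestamp
        if fresh:
            return False
        self.members = set(self.file.members)
        self.timestamp = self.file.mtime
        return True

    def save(self, clock):
        self.file.write(self.members, clock)
        self.timestamp = self.file.mtime


def run(reload_on_equal):
    f = PickleFile()
    f.write({"a@example.com", "b@example.com"}, clock=100.1)

    # A long-lived process loads and caches the list during second 100.
    runner = ListObject(f, reload_on_equal)
    runner.load()

    # A second process removes b@ and saves within the same whole second.
    web = ListObject(f, reload_on_equal)
    web.load()
    web.members.discard("b@example.com")
    web.save(clock=100.7)

    # Seconds later the first process "locks", tries to reload, then updates.
    runner.load()
    runner.members.add("c@example.com")
    runner.save(clock=105.0)
    return f.members
```

In this model, run(reload_on_equal=False) ends with a@, b@ and c@ all on the list, so the removal of b@example.com is silently reversed even though the second update happens several seconds later. run(reload_on_equal=True) ends with only a@ and c@. This is consistent with the point that waiting more than a second between steps does not help by itself: what matters is when the stale copy was loaded, not when it is next used.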
>>
>>
>>> Is there a good way to test locking between servers? I've run the
>>> tests/test_lockfile.py, but it reports it is OK.
>>>
>>> Any and all help would be GREATLY appreciated. We've been trying to
>>> triage this bug for weeks and it is terribly disruptive for our users.
>>
>>
>> The post at
>> <http://mail.python.org/pipermail/mailman-users/2008-August/062862.html>
>> contains a "stress test" that will probably reproduce the problem.
>
> Correct. Only one subscriber was subscribed to each test list. Keep in
> mind that in the stress test given if you use a sleep counter of 5 with 6
> lists, that means you're waiting _30 seconds_ before the next add_member
> command is run for that list (I assume the timing issue is per-list, not
> per run of add_members). Even if you set the timer down to 1 that's a 6
> second sleep. This shouldn't affect a cache that we're comparing for the
> given second. Anyway, my script ran fine with the 5 second sleep (30
> seconds per list add), but showed discrepancies with a 3 second sleep.
>
>> I suspect your Mailman server must be very busy for you to see this bug
>> that frequently. However, it looks like I need to install the fix for
>> Mailman 2.1.15.
>
> We run about 600 different mailing lists for our department and this has
> been a continuous headache. I appreciate all the hard work you guys do.
>
>> It is also curious that the only reports of this that I can recall both
>> come from Solaris users. There may be complications in your case due to
>> NFS, but locking shouldn't be the issue. Run the stress test and see if
>> it fails. If it does, try the patch.
>
> Patch didn't seem to help. Is there an easy way to omit the caching in
> this?
>
> Thanks,
> --
> Drew
>
>>
>> Let us know what happens.
>>
>> --
>> Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
>> San Francisco Bay Area, California    better use your sense - B. Dylan
>>
>>
>
>

Andrew Case
Systems Administrator
Courant Institute of Mathematical Sciences
New York University
251 Mercer St., Room 1023
New York, NY 10012-1110
Phone: 212-998-3147
_______________________________________________
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers