On Sun, Apr 15, 2012 at 10:18:07PM +0200, Paul Boddie wrote:
>On Sunday 15 April 2012 21:44:34 Steve McIntyre wrote:
>>
>> I've posted a bug at
>>
>> http://moinmo.in/MoinMoinBugs/SubscribedPagesPerformanceProblem
>>
>> Quick summary:
>>
>> On a site with a large number of registered users
>> (e.g. wiki.debian.org), saving a page takes a very long time. With a
>> large number of users, the design of the page subscription system
>> doesn't scale well. Saving a page works well, but moin then scans all
>> the user data files looking for the subscribed_pages data. With
>> thousands of users registered, this can take a very long time; we're
>> seeing > 90 seconds on a wiki with more than 10,000 users.
>
>I can see that the offending code is in MoinMoin/Page.py, specifically the
>getSubscribers method of the Page class. This looks like a classic case of
>needing to "invert" the way the data is stored so that it can be queried more
>efficiently - it's a bit like comparing the standard text search
>functionality with Xapian-based searching, where the former relies on
>scanning pages sequentially (pages yield terms), whereas the latter employs
>such "inverted" storage of queryable information (terms yield pages).
Yup, exactly.

>> This area needs fixing in some way - maybe add a cache in front of the
>> user lookup here, or store the subscribed_pages information
>> differently. I might be able to help with coding this, but I'd want to
>> see what other people think first in terms of a design.
>>
>> What do people think?
>
>I'd be inclined to index the subscription information so that there's a more
>efficiently queryable structure (pages yielding subscribers) that can be used
>in preference to the existing approach. Having subscriptions amend the index
>when created would eliminate any need for periodic reindexing, and I think
>you could implement this by having an event handler that can handle the
>SubscribedToPageEvent type of event.

OK, I'll have a play at that now and see if I can get it working.

-- 
Steve McIntyre, Cambridge, UK.                                st...@einval.com
Getting a SCSI chain working is perfectly simple if you remember that there
must be exactly three terminations: one on one end of the cable, one on the
far end, and the goat, terminated over the SCSI chain with a silver-handled
knife whilst burning *black* candles. --- Anthony DeBoer
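
For anyone following along, here is a rough sketch of the "inverted" idea discussed above, written as standalone Python rather than against the MoinMoin code. Everything in it is an assumption made for illustration: the SubscriptionIndex class, its method names, and the plain dict standing in for the per-user data files are all hypothetical, and it treats subscriptions as exact page names only. The subscribe() method is where a handler for the subscription event could hook in, so the index is amended as subscriptions are created instead of being rebuilt periodically.

```python
from collections import defaultdict


def scan_subscribers(users, pagename):
    """Forward lookup (users yield pages): visit every user on each query."""
    return sorted(uid for uid, pages in users.items() if pagename in pages)


class SubscriptionIndex:
    """Inverted lookup (pages yield subscribers). Hypothetical, not moin API."""

    def __init__(self):
        # page name -> set of ids of users subscribed to that page
        self._by_page = defaultdict(set)

    @classmethod
    def build(cls, users):
        """One-off build from the existing per-user subscription lists."""
        index = cls()
        for uid, pages in users.items():
            for pagename in pages:
                index.subscribe(uid, pagename)
        return index

    def subscribe(self, uid, pagename):
        """Amend the index when a subscription is created."""
        self._by_page[pagename].add(uid)

    def unsubscribe(self, uid, pagename):
        """Remove a subscription, dropping the page entry once it is empty."""
        subscribers = self._by_page.get(pagename)
        if subscribers is None:
            return
        subscribers.discard(uid)
        if not subscribers:
            del self._by_page[pagename]

    def subscribers(self, pagename):
        """Return the subscribers of one page without touching other users."""
        return sorted(self._by_page.get(pagename, ()))


# Stand-in for the user data files: user id -> list of subscribed pages.
users = {
    "alice": ["FrontPage", "DebianInstaller"],
    "bob": ["FrontPage"],
    "carol": [],
}

index = SubscriptionIndex.build(users)
index.subscribe("carol", "FrontPage")  # e.g. called from an event handler
print(index.subscribers("FrontPage"))
```

With this layout the work done when a page is saved depends on how many people subscribe to that page, not on how many users are registered. A real version would also need to persist the index and keep it in step with the user files, which this sketch does not attempt.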