On Sunday 15 April 2012 21:44:34 Steve McIntyre wrote:
>
> I've posted a bug at
>
> http://moinmo.in/MoinMoinBugs/SubscribedPagesPerformanceProblem
>
> Quick summary:
>
> On a site with a large number of registered users
> (e.g. wiki.debian.org), saving a page takes a very long time. With a
> large number of users, the design of the page subscription system
> doesn't scale well. Saving a page works well, but moin then scans all
> the user data files looking for the subscribed_pages data. With
> thousands of users registered, this can take a very long time; we're
> seeing > 90 seconds on a wiki with more than 10,000 users.

I can see that the offending code is in MoinMoin/Page.py, specifically the
getSubscribers method of the Page class. This looks like a classic case of
needing to "invert" the way the data is stored so that it can be queried
more efficiently. It's a bit like comparing the standard text search
functionality with Xapian-based searching: the former scans pages
sequentially (pages yield terms), whereas the latter uses "inverted"
storage of the queryable information (terms yield pages).

> This area needs fixing in some way - maybe add a cache in front of the
> user lookup here, or store the subscribed_pages information
> differently. I might be able to help with coding this, but I'd want to
> see what other people think first in terms of a design.
>
> What do people think?

I'd be inclined to index the subscription information so that there's a
more efficiently queryable structure (pages yielding subscribers) that can
be used in preference to the existing approach. Having subscriptions amend
the index when they are created would eliminate any need for periodic
reindexing, and I think you could implement this with an event handler
for the SubscribedToPageEvent type of event.

There are probably other areas of Moin that could benefit from Xapian-based
indexing, but this certainly looks like a good application of it.

Paul
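
P.S. To make the "inverted" idea a bit more concrete, here is a minimal sketch in plain Python. It is not Moin code and does not use Xapian: SubscriptionIndex, subscribe, unsubscribe, subscribers_for and scan_all_users are names made up for illustration, the index lives in memory only, and it assumes subscriptions are exact page names. It only contrasts the two lookup directions (users yielding pages versus pages yielding users).

```python
from collections import defaultdict


class SubscriptionIndex:
    """Hypothetical inverted index: page name -> set of subscriber ids."""

    def __init__(self):
        self._by_page = defaultdict(set)

    def subscribe(self, user_id, page_name):
        # Would be called whenever a subscription is created, so the
        # index stays current without any periodic reindexing.
        self._by_page[page_name].add(user_id)

    def unsubscribe(self, user_id, page_name):
        subscribers = self._by_page.get(page_name)
        if subscribers is None:
            return
        subscribers.discard(user_id)
        if not subscribers:
            # Drop empty entries so the index does not grow without bound.
            del self._by_page[page_name]

    def subscribers_for(self, page_name):
        # One dictionary lookup, independent of the number of users.
        return sorted(self._by_page.get(page_name, ()))


def scan_all_users(users, page_name):
    """The slow direction, for comparison: visit every user's page list.

    `users` is assumed to map user id -> list of subscribed page names.
    """
    return sorted(uid for uid, pages in users.items() if page_name in pages)
```

In a real implementation, subscribe would presumably be driven by whatever handles the subscription event, and the index would need to be persisted (in Xapian or otherwise) rather than held in a dict.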