On Sunday 15 April 2012 21:44:34 Steve McIntyre wrote:
>
> I've posted a bug at
>
>   http://moinmo.in/MoinMoinBugs/SubscribedPagesPerformanceProblem
>
> Quick summary:
>
> On a site with a large number of registered users
> (e.g. wiki.debian.org), saving a page takes a very long time. With a
> large number of users, the design of the page subscription system
> doesn't scale well. Saving a page works well, but moin then scans all
> the user data files looking for the subscribed_pages data. With
> thousands of users registered, this can take a very long time; we're
> seeing > 90 seconds on a wiki with more than 10,000 users.

I can see that the offending code is in MoinMoin/Page.py, specifically the 
getSubscribers method of the Page class. This looks like a classic case of 
needing to "invert" the way the data is stored so that it can be queried more 
efficiently - it's a bit like comparing the standard text search 
functionality with Xapian-based searching, where the former relies on 
scanning pages sequentially (pages yield terms), whereas the latter employs 
such "inverted" storage of queryable information (terms yield pages).
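
To make the "inversion" concrete, here is a minimal sketch in plain Python. It 
is not Moin code: the users dictionary, the function names and the assumption 
that subscriptions are exact page names are all mine, purely for illustration. 
If subscriptions can also be patterns, the inverted structure would need 
extending.

```python
from collections import defaultdict

# Hypothetical stand-in for the per-user data files:
# user name -> list of subscribed page names (exact names assumed).
users = {
    "alice": ["FrontPage", "DebianInstaller"],
    "bob": ["FrontPage"],
    "carol": [],
}

def subscribers_by_scan(users, pagename):
    """Forward lookup: visit every user record, so cost grows with user count."""
    return sorted(name for name, pages in users.items() if pagename in pages)

def build_inverted_index(users):
    """Invert the mapping once: page name -> set of subscribed user names."""
    index = defaultdict(set)
    for name, pages in users.items():
        for page in pages:
            index[page].add(name)
    return index

def subscribers_by_index(index, pagename):
    """Inverted lookup: a single dictionary access, independent of user count."""
    return sorted(index.get(pagename, ()))

index = build_inverted_index(users)
```

Both lookups give the same answer for a page; the difference is that the scan 
touches every user on every save, whereas the index is consulted directly.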

> This area needs fixing in some way - maybe add a cache in front of the
> user lookup here, or store the subscribed_pages information
> differently. I might be able to help with coding this, but I'd want to
> see what other people think first in terms of a design.
>
> What do people think?

I'd be inclined to index the subscription information so that there's a more 
efficiently queryable structure (pages yielding subscribers) that can be used 
in preference to the existing approach. Having subscriptions amend the index 
when created would eliminate any need for periodic reindexing, and I think 
you could implement this by having an event handler that can handle the 
SubscribedToPageEvent type of event.
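
As a rough sketch of the incremental idea, assuming nothing about Moin's real 
event API: the SubscriptionEvent class, its username and pagename fields, and 
the SubscriptionIndex class below are hypothetical stand-ins, and the index is 
an in-memory dictionary rather than anything Xapian-based.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SubscriptionEvent:
    """Hypothetical stand-in for a subscription event; not Moin's own class."""
    username: str
    pagename: str

class SubscriptionIndex:
    """Page name -> subscribers, amended as each subscription event arrives."""

    def __init__(self):
        self._subscribers = defaultdict(set)

    def handle(self, event):
        # Update only the affected page's entry; no full reindex is needed.
        self._subscribers[event.pagename].add(event.username)

    def subscribers(self, pagename):
        # .get() avoids creating empty entries for pages nobody subscribes to.
        return sorted(self._subscribers.get(pagename, ()))

index = SubscriptionIndex()
for event in [
    SubscriptionEvent("alice", "FrontPage"),
    SubscriptionEvent("bob", "FrontPage"),
    SubscriptionEvent("alice", "DebianInstaller"),
]:
    index.handle(event)
```

Unsubscribing would need a matching removal step, and a real version would 
have to persist the index; both are left out of this sketch.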

There are probably other areas of Moin that could benefit from Xapian-based 
indexing, but this certainly looks like a good application of it.

Paul

_______________________________________________
Moin-user mailing list
Moin-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/moin-user
