--- On Wed, 10/29/08, Brion Vibber <[EMAIL PROTECTED]> wrote:

> From: Brion Vibber <[EMAIL PROTECTED]>
> Subject: Re: [Mediawiki-api] List of all authors via API
> To: "MediaWiki API announcements & discussion" 
> <mediawiki-api@lists.wikimedia.org>
> Date: Wednesday, October 29, 2008, 10:24 AM
> 
> Magnus Manske wrote:
> > On Fri, Oct 24, 2008 at 5:59 PM, Brion Vibber <[EMAIL PROTECTED]> wrote:
> >>
> >> Johannes Beigel wrote:
> >>> Is there a way (or a plan to implement one) to retrieve the list of
> >>> unique contributors for a given article (from a given revision down to
> >>> the first one)? Ideally this would accept parameters for the mentioned
> >>> filtering. I guess inside of MediaWiki code this can be handled very
> >>> efficiently (using appropriate database queries) and would eliminate
> >>> the need to transfer lots of redundant data over the socket.
> >> Given that this could require filtering through hundreds of thousands of
> >> unique revisions for a single request, I don't think we currently have a
> >> good plan for that. :)
> > 
> > I just ran a DISTINCT mysql query for all non-IP editors of
> > [[en:George W. Bush]] on the toolserver, and that took 3 seconds.
> > There are 41790 revisions.
> 
> Indeed, it's not as bad as I was afraid. I'm still a little leery that
> the EXPLAIN lists "Using temporary" though. :P
> 
> > Considering that this would be a worst case article, and that it ran
> > on the overtaxed toolserver, it does seem possible. Maybe if we'd have
> > one MySQL slave / Apache dedicated for this task?
> 
> Probably fine to pull from the same slaves already dedicated for
> contributions queries (relevant indexes are already pulled into memory).
> 
> Figuring out how to get something other than a raw list of thousands of
> editors for a "nice" author list remains a harder task. :)

Wouldn't that be a snap using a GROUP BY clause? Sorry, I don't know the
database structure, but generically:

  SELECT contributor, COUNT(*) FROM revisions GROUP BY contributor;

That would return a list of all contributors and the number of contributions
each has made; it could be tweaked with a HAVING clause to return only those
contributors who've made over X contributions. Of course, I've only worked on
small databases, so I have no idea what the overhead on this would be...
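
To make the "over X contributions" tweak concrete, here is a minimal sketch
using Python's built-in sqlite3 module. The revisions table and its page_id
and contributor columns are made up for illustration and are not MediaWiki's
actual schema; the helper name contributors_over is hypothetical too. It says
nothing about how the query would perform on a production-sized table.

```python
import sqlite3


def contributors_over(conn, page_id, min_edits):
    """Return (contributor, edit_count) rows for one page, busiest first.

    Only contributors with MORE than min_edits revisions are returned.
    Table and column names are illustrative assumptions, not MediaWiki's.
    """
    return conn.execute(
        """
        SELECT contributor, COUNT(*) AS edits
        FROM revisions
        WHERE page_id = ?
        GROUP BY contributor
        HAVING COUNT(*) > ?
        ORDER BY edits DESC, contributor ASC
        """,
        (page_id, min_edits),
    ).fetchall()


# Tiny in-memory data set standing in for a revision history.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revisions (page_id INTEGER, contributor TEXT)")
conn.executemany(
    "INSERT INTO revisions VALUES (?, ?)",
    [
        (1, "Alice"), (1, "Bob"), (1, "Alice"), (1, "Carol"),
        (1, "Alice"), (1, "Bob"), (2, "Dave"),
    ],
)

# Contributors to page 1 with more than one edit.
for name, edits in contributors_over(conn, 1, 1):
    print(name, edits)
```

The HAVING clause filters after grouping, so the threshold applies to each
contributor's total rather than to individual rows; the WHERE clause limits
the count to a single article.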


      
