--- On Wed, 10/29/08, Brion Vibber <[EMAIL PROTECTED]> wrote:
> From: Brion Vibber <[EMAIL PROTECTED]>
> Subject: Re: [Mediawiki-api] List of all authors via API
> To: "MediaWiki API announcements & discussion" <[email protected]>
> Date: Wednesday, October 29, 2008, 10:24 AM
>
> Magnus Manske wrote:
> > On Fri, Oct 24, 2008 at 5:59 PM, Brion Vibber <[EMAIL PROTECTED]> wrote:
> >>
> >> Johannes Beigel wrote:
> >>> Is there a way (or a plan to implement one) to retrieve the list of
> >>> unique contributors for a given article (from a given revision down to
> >>> the first one)? Ideally this would accept parameters for the mentioned
> >>> filtering. I guess inside of MediaWiki code this can be handled very
> >>> efficiently (using appropriate database queries) and would eliminate
> >>> the need to transfer lots of redundant data over the socket.
> >> Given that this could require filtering through hundreds of thousands of
> >> unique revisions for a single request, I don't think we currently have a
> >> good plan for that. :)
> >
> > I just ran a DISTINCT mysql query for all non-IP editors of
> > [[en:George W. Bush]] on the toolserver, and that took 3 seconds.
> > There are 41790 revisions.
>
> Indeed, it's not as bad as I was afraid. I'm still a little leery that
> the EXPLAIN lists "Using temporary" though. :P
>
> > Considering that this would be a worst case article, and that it ran
> > on the overtaxed toolserver, it does seem possible. Maybe if we'd have
> > one MySQL slave / Apache dedicated for this task?
>
> Probably fine to pull from the same slaves already dedicated for
> contributions queries (relevant indexes are already pulled into memory).
>
> Figuring out how to get something other than a raw list of thousands of
> editors for a "nice" author list remains a harder task. :)

Wouldn't that be a snap with a GROUP BY clause? Sorry, I don't know the
database structure, but generically:

SELECT contributor, COUNT(*) FROM revisions GROUP BY contributor

would return a list of all contributors and the number of contributions each
has made; it could be tweaked with a HAVING clause to return only those
contributors who've made more than X contributions. Of course, I've only
worked on small databases, so I have no idea what the overhead on this would
be...
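
A minimal sketch of the idea, using Python's built-in sqlite3 and a made-up
`revision` table loosely modeled on MediaWiki's. The column names (rev_page,
rev_user, rev_user_text), the convention that rev_user is 0 for IP edits, and
the contributors() helper are all assumptions for illustration, not the real
schema or API. It counts edits per registered user from the first revision up
to a given one, and HAVING applies the minimum-edits cutoff:

```python
import sqlite3

# Hypothetical, trimmed-down table loosely modeled on MediaWiki's `revision`
# table. Column names and the rev_user = 0 convention for IP edits are
# assumptions; the real schema has more columns and a separate `page` table.
SCHEMA = """
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_page      INTEGER NOT NULL,
    rev_user      INTEGER NOT NULL,  -- 0 for anonymous (IP) edits
    rev_user_text TEXT    NOT NULL   -- user name, or the IP for anonymous edits
);
-- Lets rows for one page be found and grouped by user name; whether the
-- engine avoids a temporary table depends on its query plan.
CREATE INDEX page_user ON revision (rev_page, rev_user_text);
"""


def contributors(conn, page_id, up_to_rev, min_edits=1):
    """Hypothetical helper: (user name, edit count) pairs for registered
    editors of one page, counting revisions from the first up to and including
    up_to_rev (assumes rev_id grows over time), keeping only editors with at
    least min_edits edits, most active first."""
    query = """
        SELECT rev_user_text, COUNT(*) AS edits
        FROM revision
        WHERE rev_page = ?
          AND rev_id <= ?
          AND rev_user != 0          -- skip IP (anonymous) edits
        GROUP BY rev_user_text
        HAVING COUNT(*) >= ?         -- minimum edit count
        ORDER BY edits DESC, rev_user_text
    """
    return conn.execute(query, (page_id, up_to_rev, min_edits)).fetchall()


# Tiny made-up sample: page 42 has six revisions, one of them anonymous.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?)",
    [
        (1, 42, 7, "Alice"),
        (2, 42, 0, "192.0.2.1"),
        (3, 42, 9, "Bob"),
        (4, 42, 7, "Alice"),
        (5, 42, 9, "Bob"),
        (6, 42, 7, "Alice"),
        (7, 99, 9, "Bob"),  # a different page, ignored below
    ],
)
print(contributors(conn, 42, up_to_rev=5))               # [('Alice', 2), ('Bob', 2)]
print(contributors(conn, 42, up_to_rev=6, min_edits=3))  # [('Alice', 3)]
```

MySQL accepts the same GROUP BY ... HAVING form; what a toy SQLite example
can't show is how the server executes it on 40,000+ rows, which is what the
"Using temporary" worry above is about.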
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api