--- On Wed, 10/29/08, Brion Vibber <[EMAIL PROTECTED]> wrote:

> From: Brion Vibber <[EMAIL PROTECTED]>
> Subject: Re: [Mediawiki-api] List of all authors via API
> To: "MediaWiki API announcements & discussion" <mediawiki-api@lists.wikimedia.org>
> Date: Wednesday, October 29, 2008, 10:24 AM
>
> Magnus Manske wrote:
> > On Fri, Oct 24, 2008 at 5:59 PM, Brion Vibber <[EMAIL PROTECTED]> wrote:
> >> Johannes Beigel wrote:
> >>> Is there a way (or a plan to implement one) to retrieve the list of
> >>> unique contributors for a given article (from a given revision down to
> >>> the first one)? Ideally this would accept parameters for the mentioned
> >>> filtering. I guess inside of MediaWiki code this can be handled very
> >>> efficiently (using appropriate database queries) and would eliminate
> >>> the need to transfer lots of redundant data over the socket.
> >> Given that this could require filtering through hundreds of thousands of
> >> unique revisions for a single request, I don't think we currently have a
> >> good plan for that. :)
> >
> > I just ran a DISTINCT mysql query for all non-IP editors of
> > [[en:George W. Bush]] on the toolserver, and that took 3 seconds.
> > There are 41790 revisions.
>
> Indeed, it's not as bad as I was afraid. I'm still a little leery that
> the EXPLAIN lists "Using temporary" though. :P
>
> > Considering that this would be a worst case article, and that it ran
> > on the overtaxed toolserver, it does seem possible. Maybe if we'd have
> > one MySQL slave / Apache dedicated for this task?
>
> Probably fine to pull from the same slaves already dedicated for
> contributions queries (relevant indexes are already pulled into memory).
>
> Figuring out how to get something other than a raw list of thousands of
> editors for a "nice" author list remains a harder task. :)
Wouldn't that be a snap using a GROUP BY clause? Sorry, I don't know the database structure, but generically (table and column names here are placeholders):

    SELECT contributors, COUNT(*) FROM database GROUP BY contributors

would return a list of all contributors and the number of contributions each has made. It could be tweaked with a HAVING clause, e.g. HAVING COUNT(*) > X, to return only those contributors who've made over X contributions. Of course, I've only worked on small databases, so I have no idea what the overhead on this would be...
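For what it's worth, here is a rough sketch of that GROUP BY idea that can be run against a toy in-memory SQLite table. The column names (rev_page, rev_user, rev_user_text) and the convention that rev_user = 0 marks an IP edit are my assumptions about what a MediaWiki-style revision table looks like, and the contributors() helper and sample rows are made up for illustration. It says nothing about how this would perform on the real servers.

```python
import sqlite3


def contributors(conn, page_id, min_edits=1):
    """Return (user name, edit count) pairs for one page, most active first.

    Assumes a simplified 'revision' table; rev_user = 0 is taken to mean
    an anonymous (IP) edit, which is skipped.
    """
    query = """
        SELECT rev_user_text, COUNT(*) AS edits
        FROM revision
        WHERE rev_page = ? AND rev_user != 0
        GROUP BY rev_user_text
        HAVING COUNT(*) >= ?
        ORDER BY edits DESC, rev_user_text ASC
    """
    return conn.execute(query, (page_id, min_edits)).fetchall()


# Toy data: a hypothetical table, not the real MediaWiki schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_page INTEGER,"
    " rev_user INTEGER,"
    " rev_user_text TEXT)"
)
sample_rows = [
    (1, 1, "Alice"),
    (1, 1, "Alice"),
    (1, 2, "Bob"),
    (1, 0, "192.0.2.1"),  # IP edit, excluded by rev_user != 0
    (1, 1, "Alice"),
    (1, 0, "192.0.2.1"),
    (2, 2, "Bob"),        # different page, excluded by rev_page filter
]
conn.executemany(
    "INSERT INTO revision (rev_page, rev_user, rev_user_text) VALUES (?, ?, ?)",
    sample_rows,
)

all_authors = contributors(conn, page_id=1)
frequent_authors = contributors(conn, page_id=1, min_edits=2)
print(all_authors)
print(frequent_authors)
```

The min_edits threshold is the "over X contributions" tweak, done with HAVING so the filtering happens after the counts are computed. Sorting by count might also be one cheap way to trim a raw list of thousands of editors down to the most active ones.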