A tool I have written, For the Common Good (FtCG) [1], uses the following type of
query to fetch a list of "random" files that users may like to transfer to Commons.
The category name may differ, but the structure is the same:
https://en.wikipedia.org/w/api.php?format=xml&cmnamespace=6&cmtitle=Category%3ACopy%20to%20Wikimedia%20Commons%20(bot-assessed)&action=query&list=categorymembers&cmsort=timestamp&cmprop=title&cmlimit=500
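To make the structure of that request explicit, here is a minimal sketch that
assembles an equivalent URL from its parameters using Python's standard library.
The helper name build_query_url and the sort_by_timestamp switch are hypothetical,
for illustration only; the parameter names and values are the ones in the URL above.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(category: str, sort_by_timestamp: bool = True, limit: int = 500) -> str:
    """Hypothetical helper: build a list=categorymembers request for files in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "format": "xml",
        "cmtitle": "Category:" + category,
        "cmnamespace": 6,  # namespace 6 is the File: namespace
        "cmprop": "title",
        "cmlimit": limit,
    }
    if sort_by_timestamp:
        # The parameter that appears to make the request slow.
        params["cmsort"] = "timestamp"
    # quote_via=quote encodes spaces as %20 (as in the URL above) rather than '+'.
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


print(build_query_url("Copy to Wikimedia Commons (bot-assessed)"))
```

The parameter order and the escaping of ":" and parentheses differ from the
hand-written URL, but the API should treat the two requests the same way.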
In 2011, when I was first writing FtCG, this query ran at an acceptable speed.
Recently, though, it has become extremely slow, to the point where timeouts are now
a regular occurrence. It sometimes takes 4 or 5 tries (and several minutes) before
results are returned. Once it has succeeded, however, subsequent runs are quick. If
you run this exact query now, there's a good chance it will be fast because others
have run it shortly before you.
The cause seems to be the "cmsort=timestamp" portion of the request. If this is
removed, the query returns essentially instantaneously. However, I don't really want
the files in alphabetical order, as that doesn't seem very "random".
Four questions:
1. Why does this query take so long?
2. Can anything be done on the server side to make it faster?
3. Why does it take so much longer now than it did in 2011?
4. Is there a better way to fetch a random cross-section of files in a particular
category?
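For question 4, one client-side workaround (not a server-side fix, and not what
FtCG currently does) is to drop cmsort and pick a random subset locally. A minimal
sketch, assuming the titles have already been fetched; pick_random_files is a
hypothetical helper name.

```python
import random
from typing import List, Optional, Sequence


def pick_random_files(titles: Sequence[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: return up to k distinct titles chosen uniformly at random."""
    rng = random.Random(seed)  # passing a seed makes the selection reproducible
    return rng.sample(list(titles), min(k, len(titles)))


titles = ["File:Example %d.jpg" % i for i in range(1000)]
print(pick_random_files(titles, 5, seed=42))
```

The sample is only as random as the list it draws from: if only the first 500
alphabetically ordered titles are fetched (no continuation), the result is still
biased toward the start of the alphabet.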
TTO
[1] https://en.wikipedia.org/wiki/User:This,_that_and_the_other/For_the_Common_Good
_______________________________________________
Mediawiki-api mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api