Re: [Mediawiki-api] Severe slowness with a particular English Wikipedia API query

Nikolas Everett Sat, 06 Sep 2014 14:30:50 -0700

I don't know the MySQL syntax super well but that query implied that it
will always use the time index and then iterate in time order looking for a
link to the category you want. Unless that category is an appreciable
fraction of the total links I don't imagine that is a good plan.
On Sep 6, 2014 3:58 PM, "Brad Jorsch (Anomie)" <[email protected]>
wrote:


> The database query for that is simple enough:
>
> SELECT /* ApiQueryCategoryMembers::run Anomie */
> cl_from,cl_sortkey,cl_type,page_namespace,page_title,cl_timestamp  FROM
> `page`,`categorylinks`  FORCE INDEX (cl_timestamp) WHERE cl_to =
> 'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from=page_id)  ORDER BY
> cl_timestamp,cl_from LIMIT 501;
>
> And the PHP code doesn't do anything complicated either. Maybe Sean can
> give us more insight if there's some subtle database thing going on here.
>
> Note, though, that you're not getting anything at all random here; you're
> always getting the files that have been in the category longest first.
>
>
>
> On Fri, Sep 5, 2014 at 11:52 PM, This, that and the other <
> [email protected]> wrote:
>
>> A tool I have written, For the Common Good [1], uses the following type
>> of query to fetch a list of "random" files that users may like to transfer
>> to Commons. The category name may differ but the structure is the same:
>>
>> https://en.wikipedia.org/w/api.php?format=xml&cmnamespace=6&cmtitle=
>> Category%3ACopy%20to%20Wikimedia%20Commons%20(bot-
>> assessed)&action=query&list=categorymembers&cmsort=
>> timestamp&cmprop=title&cmlimit=500
>>
>> In 2011 when I was first writing FtCG, this query ran at an acceptable
>> speed. Recently, though, it has become extremely slow, to the point where
>> timeouts are now a regular occurrence. It sometimes takes 4 or 5 tries (and
>> several minutes) before results are returned. From then on, however, it
>> works quickly. If you run this exact query now, there's a good chance it
>> will work quickly because others have been running the query before you.
>>
>> The cause seems to be the "cmsort=timestamp" portion of the request. If
>> this is removed, it works essentially instantaneously. However, I don't
>> really want the files in alphabetical order, as it doesn't seem very
>> "random".
>>
>> Four questions:
>> 1. Why does this query take so long?
>> 2. Can anything be done on the server side to make it faster?
>> 3. Why does it take so much longer now than it did in 2011?
>> 4. Is there a better way to fetch a random cross-section of files in a
>> particular category?
>>
>> TTO
>>
>> [1] https://en.wikipedia.org/wiki/User:This,_that_and_the_other/
>> For_the_Common_Good
>>
>>
>> _______________________________________________
>> Mediawiki-api mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>
>
>
>
> --
> Brad Jorsch (Anomie)
> Software Engineer
> Wikimedia Foundation
>
> _______________________________________________
> Mediawiki-api mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>
>

_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

Re: [Mediawiki-api] Severe slowness with a particular English Wikipedia API query

Reply via email to