You might try routing the query through the full-text search backend. I'm on
mobile or I would whip up an example. I believe the syntax is
incategory:"your_category". You might need to use CirrusSearch
(srbackend=CirrusSearch) if the category is added by a template.

I don't know if it'll be faster, but you can try.
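
(A rough sketch of what that search request might look like, untested and
assuming the incategory: syntax and srbackend parameter described above:)

https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=incategory%3A%22Copy%20to%20Wikimedia%20Commons%20(bot-assessed)%22&srnamespace=6&srlimit=500&srbackend=CirrusSearch&format=xml
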
On Sep 6, 2014 5:30 PM, "Nikolas Everett" <[email protected]> wrote:

> I don't know the MySQL syntax super well, but that query implies that it
> will always use the time index and then iterate in time order looking for a
> link to the category you want. Unless that category is an appreciable
> fraction of the total links, I don't imagine that is a good plan.
> On Sep 6, 2014 3:58 PM, "Brad Jorsch (Anomie)" <[email protected]>
> wrote:
>
>> The database query for that is simple enough:
>>
>> SELECT /* ApiQueryCategoryMembers::run Anomie */
>>   cl_from, cl_sortkey, cl_type, page_namespace, page_title, cl_timestamp
>> FROM `page`, `categorylinks` FORCE INDEX (cl_timestamp)
>> WHERE cl_to = 'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from = page_id)
>> ORDER BY cl_timestamp, cl_from
>> LIMIT 501;
>>
>> And the PHP code doesn't do anything complicated either. Maybe Sean can
>> give us more insight if there's some subtle database thing going on here.
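>>
>> (One way to look for that, sketched here and assuming access to an enwiki
>> database replica, would be to ask MySQL for its plan with EXPLAIN and see
>> whether the forced cl_timestamp index is really a sensible choice here:)
>>
>> EXPLAIN SELECT cl_from, cl_sortkey, cl_type, page_namespace, page_title, cl_timestamp
>> FROM `page`, `categorylinks` FORCE INDEX (cl_timestamp)
>> WHERE cl_to = 'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from = page_id)
>> ORDER BY cl_timestamp, cl_from
>> LIMIT 501;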
>>
>> Note, though, that you're not getting anything at all random here; you're
>> always getting the files that have been in the category longest first.
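>>
>> (If a genuinely random-looking cross-section is the goal, one untested
>> workaround is to keep the default sortkey order but start each request at a
>> randomly chosen prefix via cmstartsortkeyprefix, e.g. a randomly picked "Q":)
>>
>> https://en.wikipedia.org/w/api.php?format=xml&action=query&list=categorymembers&cmtitle=Category%3ACopy%20to%20Wikimedia%20Commons%20(bot-assessed)&cmnamespace=6&cmsort=sortkey&cmstartsortkeyprefix=Q&cmprop=title&cmlimit=500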
>>
>>
>>
>> On Fri, Sep 5, 2014 at 11:52 PM, This, that and the other <
>> [email protected]> wrote:
>>
>>> A tool I have written, For the Common Good [1], uses the following type
>>> of query to fetch a list of "random" files that users may like to transfer
>>> to Commons. The category name may differ but the structure is the same:
>>>
>>> https://en.wikipedia.org/w/api.php?format=xml&cmnamespace=6&cmtitle=Category%3ACopy%20to%20Wikimedia%20Commons%20(bot-assessed)&action=query&list=categorymembers&cmsort=timestamp&cmprop=title&cmlimit=500
>>>
>>> In 2011 when I was first writing FtCG, this query ran at an acceptable
>>> speed. Recently, though, it has become extremely slow, to the point where
>>> timeouts are now a regular occurrence. It sometimes takes 4 or 5 tries (and
>>> several minutes) before results are returned. From then on, however, it
>>> works quickly. If you run this exact query now, there's a good chance it
>>> will work quickly because others have been running the query before you.
>>>
>>> The cause seems to be the "cmsort=timestamp" portion of the request. If
>>> this is removed, it works essentially instantaneously. However, I don't
>>> really want the files in alphabetical order, as it doesn't seem very
>>> "random".
>>>
>>> Four questions:
>>> 1. Why does this query take so long?
>>> 2. Can anything be done on the server side to make it faster?
>>> 3. Why does it take so much longer now than it did in 2011?
>>> 4. Is there a better way to fetch a random cross-section of files in a
>>> particular category?
>>>
>>> TTO
>>>
>>> [1] https://en.wikipedia.org/wiki/User:This,_that_and_the_other/For_the_Common_Good
>>>
>>>
>>
>>
>>
>> --
>> Brad Jorsch (Anomie)
>> Software Engineer
>> Wikimedia Foundation
>>
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
