I don't know the MySQL syntax super well but that query implied that it will always use the time index and then iterate in time order looking for a link to the category you want. Unless that category is an appreciable fraction of the total links I don't imagine that is a good plan. On Sep 6, 2014 3:58 PM, "Brad Jorsch (Anomie)" <[email protected]> wrote:
> The database query for that is simple enough: > > SELECT /* ApiQueryCategoryMembers::run Anomie */ > cl_from,cl_sortkey,cl_type,page_namespace,page_title,cl_timestamp FROM > `page`,`categorylinks` FORCE INDEX (cl_timestamp) WHERE cl_to = > 'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from=page_id) ORDER BY > cl_timestamp,cl_from LIMIT 501; > > And the PHP code doesn't do anything complicated either. Maybe Sean can > give us more insight if there's some subtle database thing going on here. > > Note, though, that you're not getting anything at all random here; you're > always getting the files that have been in the category longest first. > > > > On Fri, Sep 5, 2014 at 11:52 PM, This, that and the other < > [email protected]> wrote: > >> A tool I have written, For the Common Good [1], uses the following type >> of query to fetch a list of "random" files that users may like to transfer >> to Commons. The category name may differ but the structure is the same: >> >> https://en.wikipedia.org/w/api.php?format=xml&cmnamespace=6&cmtitle= >> Category%3ACopy%20to%20Wikimedia%20Commons%20(bot- >> assessed)&action=query&list=categorymembers&cmsort= >> timestamp&cmprop=title&cmlimit=500 >> >> In 2011 when I was first writing FtCG, this query ran at an acceptable >> speed. Recently, though, it has become extremely slow, to the point where >> timeouts are now a regular occurrence. It sometimes takes 4 or 5 tries (and >> several minutes) before results are returned. From then on, however, it >> works quickly. If you run this exact query now, there's a good chance it >> will work quickly because others have been running the query before you. >> >> The cause seems to be the "cmsort=timestamp" portion of the request. If >> this is removed, it works essentially instantaneously. However, I don't >> really want the files in alphabetical order, as it doesn't seem very >> "random". >> >> Four questions: >> 1. Why does this query take so long? >> 2. Can anything be done on the server side to make it faster? >> 3. Why does it take so much longer now than it did in 2011? >> 4. Is there a better way to fetch a random cross-section of files in a >> particular category? >> >> TTO >> >> [1] https://en.wikipedia.org/wiki/User:This,_that_and_the_other/ >> For_the_Common_Good >> >> >> _______________________________________________ >> Mediawiki-api mailing list >> [email protected] >> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api >> > > > > -- > Brad Jorsch (Anomie) > Software Engineer > Wikimedia Foundation > > _______________________________________________ > Mediawiki-api mailing list > [email protected] > https://lists.wikimedia.org/mailman/listinfo/mediawiki-api > >
_______________________________________________ Mediawiki-api mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
