I would recommend starting with the appropriate Special page: [[Special:MostLinkedCategories]] That generates the 500 categories with most members and is refreshed every few days.
If that wasn't enough, I would do: SELECT max(cat_id) FROM category; to get the last cat_id, and then, in small, manageable transactions, run: SELECT page_title, cat_pages, cat_subcats, page_id FROM category JOIN page ON category.cat_title = page.page_title AND page.page_namespace = 14 LEFT JOIN langlinks ON page_id = ll_from and ll_lang = 'fa' WHERE cat_id BETWEEN ? and ? and cat_pages > 500 and ll_from is null; where the first 2 "?" are small ranges to cover, in small increments (e.g. 10000 records each time), between 1 and max(cat_id). Change "cat_pages > 500" to whatever fits you. Double check the where/on s, not 100% sure they are right, but it should be close to what you want. On Wed, Mar 9, 2016 at 4:38 AM, John <[email protected]> wrote: > I don't have it handy but category membership counts are stored in the db > use that instead of a count() > > > On Tuesday, March 8, 2016, Huji Lee <[email protected]> wrote: >> >> I am trying to find a list of those categories on EN WP that are highly >> used (more than 100 subpages or subcats) but don't have a counterpart in FA >> WP. The query I used is shown below and also on >> https://quarry.wmflabs.org/query/7943 and is extremely slow. >> >> Any thoughts on how to make it more efficient? >> >> select page_title, ll_title, count(cl_to) as CNT >> from page >> left join langlinks >> on ll_from = page_id >> and ll_lang = 'fa' >> join categorylinks >> on page_title = cl_to >> where page_namespace = 14 >> and ll_from is null >> group by cl_to, ll_title >> having CNT > 100 >> order by CNT desc; >> >> > > _______________________________________________ > Labs-l mailing list > [email protected] > https://lists.wikimedia.org/mailman/listinfo/labs-l > -- Jaime Crespo <http://wikimedia.org> _______________________________________________ Labs-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/labs-l
