On second look, it would be ideal for me to get the counts of "articles" (not pages) in the category, something that cannot be achieved used the *category* table. I am going back to my old query agian.. the slow one.
On Sat, Mar 12, 2016 at 11:52 AM, Huji Lee <[email protected]> wrote: > Thanks John. I avoided category table because it is not 100% consistent > with categorylinks, but they join you used takes care of that. This is an > excellent solution to my particular problem. > > On Thu, Mar 10, 2016 at 6:48 AM, John <[email protected]> wrote: > >> Take a look at: >> >> select * from category left join page on page_title = cat_title where >> page_namespace = 14 and cat_pages > 500 and 'fa' NOT IN (SELECT ll_lang >> FROM langlinks where ll_from = page_id) >> >> It works very quick and gives what you need. >> >> >> On Thu, Mar 10, 2016 at 4:09 AM, Jaime Crespo <[email protected]> >> wrote: >> >>> On Thu, Mar 10, 2016 at 9:24 AM, Purodha Blissenbach >>> <[email protected]> wrote: >>> > I do not know how replication works, but I did and do assume, it >>> > replicates indexes. If so, the online manual starting at >>> > https://www.mediawiki.org/wiki/Manual:Database_layout >>> > may help you. >>> >>> Correct. There are some cases where the indexing doesn't work (it >>> doesn't make sense to use certain indexes that are partially filtered >>> on the view). In those cases, new views are created, such as >>> "revision_userindex", etc. >>> >>> Maybe we can add a "columns" table to information_schema_p, or the >>> create tables to the "tables" table, where the original table >>> description is explained. File a ticket if that would be useful. >>> >>> If there are several users that perform common slow analytics, file a >>> ticket and we could, if technically possible, do them in advance so >>> they do not have to be done every single time. Another option is to >>> replicate to a column-based engine, but I am not sure if people would >>> know how to take most advantage of those (where aggregations are >>> cheap). >>> >>> > Shouldn't a simple join on indexed fields + a simple aggregation on an >>> indexed field be relatively cheap >>> >>> No, aggregation on mysql, on standard engines, is never cheap. Indexes >>> do not help with that, they still need to count every single row (so >>> you are creating a filter, of a sort, of a full tablescan of one of >>> the largest tables per wiki). Why doing that when the count is already >>> aggregated? >>> >>> I recently delivered a session on query optimization for mysql and >>> mariadb, using wikimedia and osm as examples [warning, external site] >>> <https://github.com/jynus/query-optimization>, maybe that would be >>> helpful? >>> >>> -- >>> Jaime Crespo >>> <http://wikimedia.org> >>> >>> _______________________________________________ >>> Labs-l mailing list >>> [email protected] >>> https://lists.wikimedia.org/mailman/listinfo/labs-l >>> >> >> >> _______________________________________________ >> Labs-l mailing list >> [email protected] >> https://lists.wikimedia.org/mailman/listinfo/labs-l >> >> >
_______________________________________________ Labs-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/labs-l
