I wanted to revive this thread: can someone please help me find a way to boost that query?
On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <[email protected]> wrote: > Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that query > every few minutes until it timed out, and the number rows it was querying > increased drastically. It very quickly reached 2399220000, with the last > value I go before timeout being 3693530000. That is just way too many. > > T139090 could be the cause; it went into effect the day after last > successful query, and it does affect indexes used by the query. > > One thing that can obviously make my query faster is to force it to run > the joins in a particular order. I tried to enforce it through creating > temporary tables in memory, but it failed: https://quarry.wmflabs.org/ > query/12719 > > Another approach would be to create indexes. However, my permissions don't > allow me to see what indexes currently exist (is there a way around that?) > so I cannot decide what new indexes can be added. > > On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo <[email protected]> > wrote: > >> So I cannot give you very specific advise, but based on the EXPLAIN, it >> seems that you may be trying to read too many rows, up to >> 100K * 300K * 300K (10 000 M) and then sort them, which is probably going >> to be very slow. >> >> If they used to work in the past there could be 2 probable explanations: >> a lot of rows have been inserted recently on one or several of the tables >> (e.g. a new template or category with many members) or the indexes have >> changed. Note that redoing the queries is something that we have to do >> constantly in production because rows quantities change. I also can think >> of this change that happened recently on production, but cannot say for >> sure if it is related or could affect you negatively: >> >> https://phabricator.wikimedia.org/T139090 >> >> Maybe that affects your query and it is as easy to fix as reordering your >> columns/changing slightly the filters or its order. If someone has a >> suggestion to make it faster, that requires a change to labs, remember that >> labs replicas are not static, and new indexes can be added if needed by the >> community, and I will gladly apply them myself. >> >> On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <[email protected]> wrote: >> >>> With two connections to the DB via Terminal, it is possible! Here is >>> what I got; it matches what you sent above too: >>> >>> >>> >>> MariaDB [fawiki_p]> show processlist; >>> +---------+--------+-------------------+----------+--------- >>> +------+----------------------------+----------------------- >>> ------------------------------------------------------------ >>> -------------------+----------+ >>> | Id | User | Host | db | Command | Time | >>> State | Info >>> >>> | Progress | >>> +---------+--------+-------------------+----------+--------- >>> +------+----------------------------+----------------------- >>> ------------------------------------------------------------ >>> -------------------+----------+ >>> | 3556437 | s51403 | 10.68.23.58:53391 | fawiki_p | Query | 19 | >>> Queried about 5450000 rows | select page_title, count(ll_lang) from page >>> join category on page_title = cat_title left join catego | 0.000 | >>> | 3556865 | s51403 | 10.68.23.58:54083 | fawiki_p | Query | 0 | >>> init | show processlist >>> | >>> 0.000 | >>> +---------+--------+-------------------+----------+--------- >>> +------+----------------------------+----------------------- >>> ------------------------------------------------------------ >>> -------------------+----------+ >>> 2 rows in set (0.00 sec) >>> >>> MariaDB [fawiki_p]> show explain for 3556437; >>> +------+-------------+---------------+--------+------------- >>> ---------------------------+-----------------------------+-- >>> -------+------------------------+--------+------------------ >>> ----------------------------+ >>> | id | select_type | table | type | >>> possible_keys | key | >>> key_len | ref | rows | Extra >>> | >>> +------+-------------+---------------+--------+------------- >>> ---------------------------+-----------------------------+-- >>> -------+------------------------+--------+------------------ >>> ----------------------------+ >>> | 1 | SIMPLE | page | ref | >>> name_title,page_redirect_namespace_len | page_redirect_namespace_len | >>> 5 | const,const | 364177 | Using where; Using temporary; >>> Using filesort | >>> | 1 | SIMPLE | category | eq_ref | >>> cat_title | cat_title | >>> 257 | fawiki.page.page_title | 1 | Using >>> index | >>> | 1 | SIMPLE | categorylinks | ref | >>> cl_timestamp,cl_sortkey | cl_timestamp | >>> 257 | fawiki.page.page_title | 12 | Using where; Using >>> index | >>> | 1 | SIMPLE | templatelinks | ref | >>> tl_from,tl_namespace | tl_namespace | >>> 4 | const | 390622 | Using where; Using >>> index | >>> | 1 | SIMPLE | langlinks | ref | >>> ll_from | ll_from | >>> 4 | fawiki.page.page_id | 104911 | Using >>> index | >>> +------+-------------+---------------+--------+------------- >>> ---------------------------+-----------------------------+-- >>> -------+------------------------+--------+------------------ >>> ----------------------------+ >>> >>> >>> On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee <[email protected]> wrote: >>> >>>> Oh, I misread your comment. >>>> >>>> I doubt SHOW EXPLAIN would work via Quarry, so I am trying to get it to >>>> work via terminal access to Labs. I wonder though if it works for a query >>>> that fails to complete. >>>> >>>> On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo <[email protected]> >>>> wrote: >>>> >>>>> Huji, note I suggested SHOW EXPLAIN, not EXPLAIN; it is a different >>>>> command, check the link I provided and report if that doesn't work for >>>>> you. >>>>> >>>>> On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee <[email protected]> wrote: >>>>> >>>>>> See https://quarry.wmflabs.org/query/12718 and the error message >>>>>> returned. >>>>>> >>>>>> EXPLAIN only works if you have access to the underlying tables. I >>>>>> only have access to the VIEWs built on top of actual wiki tables (for >>>>>> good >>>>>> reason, same applies to most of us) so I get an error when I try EXPLAIN. >>>>>> >>>>>> On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo <[email protected] >>>>>> > wrote: >>>>>> >>>>>>> Explain (you should be able to run SHOW EXPLAIN on your own queries: >>>>>>> http://s.petrunia.net/blog/?p=89): >>>>>>> >>>>>>> EXPLAIN select page_title, count(ll_lang) from page join category on >>>>>>> page_title = cat_title left join categorylinks on page_title = cl_to >>>>>>> left >>>>>>> join templatelinks on tl_from = page_id and tl_namespace = 10 and >>>>>>> tl_title >>>>>>> in ( 'رده_خالی' , 'رده_بهتر', 'رده_ابهامزدایی', 'رده_ردیابیکردن' ) >>>>>>> left >>>>>>> join langlinks on page_id = ll_from where page_namespace = 14 and >>>>>>> page_is_redirect = 0 and cl_to is null and tl_title is null group by >>>>>>> page_title order by 2, 1 limit 5000\G >>>>>>> *************************** 1. row *************************** >>>>>>> id: 1 >>>>>>> select_type: SIMPLE >>>>>>> table: page >>>>>>> type: ref >>>>>>> possible_keys: name_title,page_redirect_namespace_len >>>>>>> key: page_redirect_namespace_len >>>>>>> key_len: 5 >>>>>>> ref: const,const >>>>>>> rows: 364273 >>>>>>> Extra: Using where; Using temporary; Using filesort >>>>>>> *************************** 2. row *************************** >>>>>>> id: 1 >>>>>>> select_type: SIMPLE >>>>>>> table: category >>>>>>> type: eq_ref >>>>>>> possible_keys: cat_title >>>>>>> key: cat_title >>>>>>> key_len: 257 >>>>>>> ref: fawiki.page.page_title >>>>>>> rows: 1 >>>>>>> Extra: Using index >>>>>>> *************************** 3. row *************************** >>>>>>> id: 1 >>>>>>> select_type: SIMPLE >>>>>>> table: categorylinks >>>>>>> type: ref >>>>>>> possible_keys: cl_timestamp,cl_sortkey >>>>>>> key: cl_timestamp >>>>>>> key_len: 257 >>>>>>> ref: fawiki.page.page_title >>>>>>> rows: 12 >>>>>>> Extra: Using where; Using index >>>>>>> *************************** 4. row *************************** >>>>>>> id: 1 >>>>>>> select_type: SIMPLE >>>>>>> table: templatelinks >>>>>>> type: ref >>>>>>> possible_keys: tl_from,tl_namespace >>>>>>> key: tl_namespace >>>>>>> key_len: 4 >>>>>>> ref: const >>>>>>> rows: 390610 >>>>>>> Extra: Using where; Using index >>>>>>> *************************** 5. row *************************** >>>>>>> id: 1 >>>>>>> select_type: SIMPLE >>>>>>> table: langlinks >>>>>>> type: ref >>>>>>> possible_keys: ll_from >>>>>>> key: ll_from >>>>>>> key_len: 4 >>>>>>> ref: fawiki.page.page_id >>>>>>> rows: 104910 >>>>>>> Extra: Using index >>>>>>> >>>>>>> On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I have a query [1] which I run weekly to identify empty categories; >>>>>>>> those that don't have interwiki links and stay empty for a while are >>>>>>>> then >>>>>>>> deleted by a sysop. >>>>>>>> >>>>>>>> The query ran just fine every week until Sep 2nd. [2] Since then, >>>>>>>> the query times out and the page doesn't get updated. >>>>>>>> >>>>>>>> The query is no that complex (a SELECT with five JOINs) and used to >>>>>>>> finish in about 3-5 minutes. I don't have EXPLAIN access on Labs so I >>>>>>>> cannot tell what is slowing it down. Can someone kindly take a look and >>>>>>>> advise why this suddenly stopped working? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Huji >>>>>>>> >>>>>>>> [1] https://quarry.wmflabs.org/query/3760 >>>>>>>> [2] https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A >>>>>>>> 9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7% >>>>>>>> D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B >>>>>>>> 1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84% >>>>>>>> DB%8C&action=history&uselang=en >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Labs-l mailing list >>>>>>>> [email protected] >>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Jaime Crespo >>>>>>> <http://wikimedia.org> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Labs-l mailing list >>>>>>> [email protected] >>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Labs-l mailing list >>>>>> [email protected] >>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Jaime Crespo >>>>> <http://wikimedia.org> >>>>> >>>>> _______________________________________________ >>>>> Labs-l mailing list >>>>> [email protected] >>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l >>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> Labs-l mailing list >>> [email protected] >>> https://lists.wikimedia.org/mailman/listinfo/labs-l >>> >>> >> >> >> -- >> Jaime Crespo >> <http://wikimedia.org> >> >> _______________________________________________ >> Labs-l mailing list >> [email protected] >> https://lists.wikimedia.org/mailman/listinfo/labs-l >> >> >
_______________________________________________ Labs-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/labs-l
