I wanted to revive this thread: can someone please help me find a way
to boost that query?
On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <[email protected]
<mailto:[email protected]>> wrote:
Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that
query every few minutes until it timed out, and the number rows it
was querying increased drastically. It very quickly reached
2399220000 <tel:2399220000>, with the last value I go before
timeout being 3693530000. That is just way too many.
T139090 could be the cause; it went into effect the day after last
successful query, and it does affect indexes used by the query.
One thing that can obviously make my query faster is to force it
to run the joins in a particular order. I tried to enforce it
through creating temporary tables in memory, but it failed:
https://quarry.wmflabs.org/query/12719
<https://quarry.wmflabs.org/query/12719>
Another approach would be to create indexes. However, my
permissions don't allow me to see what indexes currently exist (is
there a way around that?) so I cannot decide what new indexes can
be added.
On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo
<[email protected] <mailto:[email protected]>> wrote:
So I cannot give you very specific advise, but based on the
EXPLAIN, it seems that you may be trying to read too many
rows, up to
100K * 300K * 300K (10 000 M) and then sort them, which is
probably going to be very slow.
If they used to work in the past there could be 2 probable
explanations: a lot of rows have been inserted recently on one
or several of the tables (e.g. a new template or category with
many members) or the indexes have changed. Note that redoing
the queries is something that we have to do constantly in
production because rows quantities change. I also can think of
this change that happened recently on production, but cannot
say for sure if it is related or could affect you negatively:
https://phabricator.wikimedia.org/T139090
<https://phabricator.wikimedia.org/T139090>
Maybe that affects your query and it is as easy to fix as
reordering your columns/changing slightly the filters or its
order. If someone has a suggestion to make it faster, that
requires a change to labs, remember that labs replicas are not
static, and new indexes can be added if needed by the
community, and I will gladly apply them myself.
On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <[email protected]
<mailto:[email protected]>> wrote:
With two connections to the DB via Terminal, it is
possible! Here is what I got; it matches what you sent
above too:
MariaDB [fawiki_p]> show processlist;
+---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db |
Command | Time | State | Info |
Progress |
+---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
| 3556437 | s51403 | 10.68.23.58:53391
<http://10.68.23.58:53391> | fawiki_p | Query | 19 |
Queried about 5450000 rows | select page_title,
count(ll_lang) from page join category on page_title =
cat_title left join catego | 0.000 |
| 3556865 | s51403 | 10.68.23.58:54083
<http://10.68.23.58:54083> | fawiki_p | Query | 0 |
init | show processlist | 0.000 |
+---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.00 sec)
MariaDB [fawiki_p]> show explain for 3556437;
+------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type |
possible_keys | key | key_len |
ref | rows | Extra |
+------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | page | ref |
name_title,page_redirect_namespace_len |
page_redirect_namespace_len | 5 | const,const |
364177 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | category | eq_ref | cat_title
| cat_title | 257 |
fawiki.page.page_title | 1 | Using index |
| 1 | SIMPLE | categorylinks | ref |
cl_timestamp,cl_sortkey | cl_timestamp |
257 | fawiki.page.page_title | 12 | Using where;
Using index |
| 1 | SIMPLE | templatelinks | ref |
tl_from,tl_namespace | tl_namespace |
4 | const | 390622 | Using where; Using
index |
| 1 | SIMPLE | langlinks | ref | ll_from |
ll_from | 4 |
fawiki.page.page_id | 104911 | Using index |
+------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee
<[email protected] <mailto:[email protected]>> wrote:
Oh, I misread your comment.
I doubt SHOW EXPLAIN would work via Quarry, so I am
trying to get it to work via terminal access to Labs.
I wonder though if it works for a query that fails to
complete.
On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo
<[email protected] <mailto:[email protected]>>
wrote:
Huji, note I suggested SHOW EXPLAIN, not EXPLAIN;
it is a different command, check the link I
provided and report if that doesn't work for you.
On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee
<[email protected] <mailto:[email protected]>>
wrote:
See https://quarry.wmflabs.org/query/12718
<https://quarry.wmflabs.org/query/12718> and
the error message returned.
EXPLAIN only works if you have access to the
underlying tables. I only have access to the
VIEWs built on top of actual wiki tables (for
good reason, same applies to most of us) so I
get an error when I try EXPLAIN.
On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo
<[email protected]
<mailto:[email protected]>> wrote:
Explain (you should be able to run SHOW
EXPLAIN on your own queries:
http://s.petrunia.net/blog/?p=89
<http://s.petrunia.net/blog/?p=89>):
EXPLAIN select page_title, count(ll_lang)
from page join category on page_title =
cat_title left join categorylinks on
page_title = cl_to left join templatelinks
on tl_from = page_id and tl_namespace = 10
and tl_title in ( 'رده_خالی' ,
'رده_بهتر', 'رده_ابهامزدایی',
'رده_ردیابیکردن' ) left join langlinks on
page_id = ll_from where page_namespace =
14 and page_is_redirect = 0 and cl_to is
null and tl_title is null group by
page_title order by 2, 1 limit 5000\G
*************************** 1. row
***************************
id: 1
select_type: SIMPLE
table: page
type: ref
possible_keys:
name_title,page_redirect_namespace_len
key: page_redirect_namespace_len
key_len: 5
ref: const,const
rows: 364273
Extra: Using where; Using
temporary; Using filesort
*************************** 2. row
***************************
id: 1
select_type: SIMPLE
table: category
type: eq_ref
possible_keys: cat_title
key: cat_title
key_len: 257
ref: fawiki.page.page_title
rows: 1
Extra: Using index
*************************** 3. row
***************************
id: 1
select_type: SIMPLE
table: categorylinks
type: ref
possible_keys: cl_timestamp,cl_sortkey
key: cl_timestamp
key_len: 257
ref: fawiki.page.page_title
rows: 12
Extra: Using where; Using index
*************************** 4. row
***************************
id: 1
select_type: SIMPLE
table: templatelinks
type: ref
possible_keys: tl_from,tl_namespace
key: tl_namespace
key_len: 4
ref: const
rows: 390610
Extra: Using where; Using index
*************************** 5. row
***************************
id: 1
select_type: SIMPLE
table: langlinks
type: ref
possible_keys: ll_from
key: ll_from
key_len: 4
ref: fawiki.page.page_id
rows: 104910
Extra: Using index
On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee
<[email protected]
<mailto:[email protected]>> wrote:
Hi all,
I have a query [1] which I run weekly
to identify empty categories; those
that don't have interwiki links and
stay empty for a while are then
deleted by a sysop.
The query ran just fine every week
until Sep 2nd. [2] Since then, the
query times out and the page doesn't
get updated.
The query is no that complex (a SELECT
with five JOINs) and used to finish in
about 3-5 minutes. I don't have
EXPLAIN access on Labs so I cannot
tell what is slowing it down. Can
someone kindly take a look and advise
why this suddenly stopped working?
Thanks,
Huji
[1]
https://quarry.wmflabs.org/query/3760
<https://quarry.wmflabs.org/query/3760>
[2]
https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en
<https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en>
_______________________________________________
Labs-l mailing list
[email protected]
<mailto:[email protected]>
https://lists.wikimedia.org/mailman/listinfo/labs-l
<https://lists.wikimedia.org/mailman/listinfo/labs-l>
--
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Labs-l mailing list
[email protected]
<mailto:[email protected]>
https://lists.wikimedia.org/mailman/listinfo/labs-l
<https://lists.wikimedia.org/mailman/listinfo/labs-l>
_______________________________________________
Labs-l mailing list
[email protected]
<mailto:[email protected]>
https://lists.wikimedia.org/mailman/listinfo/labs-l
<https://lists.wikimedia.org/mailman/listinfo/labs-l>
--
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Labs-l mailing list
[email protected]
<mailto:[email protected]>
https://lists.wikimedia.org/mailman/listinfo/labs-l
<https://lists.wikimedia.org/mailman/listinfo/labs-l>
_______________________________________________
Labs-l mailing list
[email protected] <mailto:[email protected]>
https://lists.wikimedia.org/mailman/listinfo/labs-l
<https://lists.wikimedia.org/mailman/listinfo/labs-l>
--
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Labs-l mailing list
[email protected] <mailto:[email protected]>
https://lists.wikimedia.org/mailman/listinfo/labs-l
<https://lists.wikimedia.org/mailman/listinfo/labs-l>
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l