The query is using the tl_namespace index (slow) instead of the tl_from index (fast). Since you can't force an index in a view, I tried removing the 'and tl_namespace = 10' qualifier and it ran in 40 seconds. It used the tl_from index. Since templatelinks usually point to the 10 namespace anyway, there shouldn't be to many, if any, false positives by leaving the qualifier out.

On 09/28/2016 03:03 PM, Huji Lee wrote:
I wanted to revive this thread: can someone please help me find a way to boost that query?

On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <[email protected] <mailto:[email protected]>> wrote:

    Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that
    query every few minutes until it timed out, and the number rows it
    was querying increased drastically. It very quickly reached
    2399220000 <tel:2399220000>, with the last value I go before
    timeout being 3693530000. That is just way too many.

    T139090 could be the cause; it went into effect the day after last
    successful query, and it does affect indexes used by the query.

    One thing that can obviously make my query faster is to force it
    to run the joins in a particular order. I tried to enforce it
    through creating temporary tables in memory, but it failed:
    https://quarry.wmflabs.org/query/12719
    <https://quarry.wmflabs.org/query/12719>

    Another approach would be to create indexes. However, my
    permissions don't allow me to see what indexes currently exist (is
    there a way around that?) so I cannot decide what new indexes can
    be added.

    On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo
    <[email protected] <mailto:[email protected]>> wrote:

        So I cannot give you very specific advise, but based on the
        EXPLAIN, it seems that you may be trying to read too many
        rows, up to
        100K * 300K * 300K (10 000 M) and then sort them, which is
        probably going to be very slow.

        If they used to work in the past there could be 2 probable
        explanations: a lot of rows have been inserted recently on one
        or several of the tables (e.g. a new template or category with
        many members) or the indexes have changed. Note that redoing
        the queries is something that we have to do constantly in
        production because rows quantities change. I also can think of
        this change that happened recently on production, but cannot
        say for sure if it is related or could affect you negatively:

        https://phabricator.wikimedia.org/T139090
        <https://phabricator.wikimedia.org/T139090>

        Maybe that affects your query and it is as easy to fix as
        reordering your columns/changing slightly the filters or its
        order. If someone has a suggestion to make it faster, that
        requires a change to labs, remember that labs replicas are not
        static, and new indexes can be added if needed by the
        community, and I will gladly apply them myself.

        On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <[email protected]
        <mailto:[email protected]>> wrote:

            With two connections to the DB via Terminal, it is
            possible! Here is what I got; it matches what you sent
            above too:



            MariaDB [fawiki_p]> show processlist;
            
+---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
            | Id      | User   | Host              | db       |
            Command | Time | State                      | Info |
            Progress |
            
+---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
            | 3556437 | s51403 | 10.68.23.58:53391
            <http://10.68.23.58:53391> | fawiki_p | Query   |   19 |
            Queried about 5450000 rows | select page_title,
            count(ll_lang) from page join category on page_title =
            cat_title left join catego | 0.000 |
            | 3556865 | s51403 | 10.68.23.58:54083
            <http://10.68.23.58:54083> | fawiki_p | Query   |    0 |
            init                       | show processlist |    0.000 |
            
+---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
            2 rows in set (0.00 sec)

            MariaDB [fawiki_p]> show explain for 3556437;
            
+------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
            | id   | select_type | table         | type   |
            possible_keys | key                         | key_len |
            ref | rows   | Extra |
            
+------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
            |    1 | SIMPLE      | page          | ref    |
            name_title,page_redirect_namespace_len |
            page_redirect_namespace_len | 5       | const,const |
            364177 | Using where; Using temporary; Using filesort |
            |    1 | SIMPLE      | category      | eq_ref | cat_title
            | cat_title                   | 257     |
            fawiki.page.page_title |      1 | Using index |
            |    1 | SIMPLE      | categorylinks | ref    |
            cl_timestamp,cl_sortkey | cl_timestamp                |
            257     | fawiki.page.page_title |     12 | Using where;
            Using index                     |
            |    1 | SIMPLE      | templatelinks | ref    |
            tl_from,tl_namespace | tl_namespace                |
            4       | const | 390622 | Using where; Using
            index                     |
            |    1 | SIMPLE      | langlinks     | ref    | ll_from |
            ll_from                     | 4       |
            fawiki.page.page_id | 104911 | Using index |
            
+------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+


            On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee
            <[email protected] <mailto:[email protected]>> wrote:

                Oh, I misread your comment.

                I doubt SHOW EXPLAIN would work via Quarry, so I am
                trying to get it to work via terminal access to Labs.
                I wonder though if it works for a query that fails to
                complete.

                On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo
                <[email protected] <mailto:[email protected]>>
                wrote:

                    Huji, note I suggested SHOW EXPLAIN, not EXPLAIN;
                    it is a different command, check the link I
                    provided and report if that doesn't work for you.

                    On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee
                    <[email protected] <mailto:[email protected]>>
                    wrote:

                        See https://quarry.wmflabs.org/query/12718
                        <https://quarry.wmflabs.org/query/12718> and
                        the error message returned.

                        EXPLAIN only works if you have access to the
                        underlying tables. I only have access to the
                        VIEWs built on top of actual wiki tables (for
                        good reason, same applies to most of us) so I
                        get an error when I try EXPLAIN.

                        On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo
                        <[email protected]
                        <mailto:[email protected]>> wrote:

                            Explain (you should be able to run SHOW
                            EXPLAIN on your own queries:
                            http://s.petrunia.net/blog/?p=89
                            <http://s.petrunia.net/blog/?p=89>):

                            EXPLAIN select page_title, count(ll_lang)
                            from page join category on page_title =
                            cat_title left join categorylinks on
                            page_title = cl_to left join templatelinks
                            on tl_from = page_id and tl_namespace = 10
                            and tl_title in (   'رده_خالی' ,
                            'رده_بهتر', 'رده_ابهام‌زدایی',
                            'رده_ردیابی‌کردن' ) left join langlinks on
                            page_id = ll_from where page_namespace =
                            14 and page_is_redirect = 0 and cl_to is
                            null and tl_title is null group by
                            page_title order by 2, 1 limit 5000\G
                            *************************** 1. row
                            ***************************
                                       id: 1
                              select_type: SIMPLE
                                    table: page
                                     type: ref
                            possible_keys:
                            name_title,page_redirect_namespace_len
                                      key: page_redirect_namespace_len
                                  key_len: 5
                                      ref: const,const
                                     rows: 364273
                                    Extra: Using where; Using
                            temporary; Using filesort
                            *************************** 2. row
                            ***************************
                                       id: 1
                              select_type: SIMPLE
                                    table: category
                                     type: eq_ref
                            possible_keys: cat_title
                                      key: cat_title
                                  key_len: 257
                                      ref: fawiki.page.page_title
                                     rows: 1
                                    Extra: Using index
                            *************************** 3. row
                            ***************************
                                       id: 1
                              select_type: SIMPLE
                                    table: categorylinks
                                     type: ref
                            possible_keys: cl_timestamp,cl_sortkey
                                      key: cl_timestamp
                                  key_len: 257
                                      ref: fawiki.page.page_title
                                     rows: 12
                                    Extra: Using where; Using index
                            *************************** 4. row
                            ***************************
                                       id: 1
                              select_type: SIMPLE
                                    table: templatelinks
                                     type: ref
                            possible_keys: tl_from,tl_namespace
                                      key: tl_namespace
                                  key_len: 4
                                      ref: const
                                     rows: 390610
                                    Extra: Using where; Using index
                            *************************** 5. row
                            ***************************
                                       id: 1
                              select_type: SIMPLE
                                    table: langlinks
                                     type: ref
                            possible_keys: ll_from
                                      key: ll_from
                                  key_len: 4
                                      ref: fawiki.page.page_id
                                     rows: 104910
                                    Extra: Using index

                            On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee
                            <[email protected]
                            <mailto:[email protected]>> wrote:

                                Hi all,

                                I have a query [1] which I run weekly
                                to identify empty categories; those
                                that don't have interwiki links and
                                stay empty for a while are then
                                deleted by a sysop.

                                The query ran just fine every week
                                until Sep 2nd. [2] Since then, the
                                query times out and the page doesn't
                                get updated.

                                The query is no that complex (a SELECT
                                with five JOINs) and used to finish in
                                about 3-5 minutes. I don't have
                                EXPLAIN access on Labs so I cannot
                                tell what is slowing it down. Can
                                someone kindly take a look and advise
                                why this suddenly stopped working?

                                Thanks,

                                Huji

                                [1]
                                https://quarry.wmflabs.org/query/3760
                                <https://quarry.wmflabs.org/query/3760>
                                [2]
                                
https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en
                                
<https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en>

                                _______________________________________________
                                Labs-l mailing list
                                [email protected]
                                <mailto:[email protected]>
                                
https://lists.wikimedia.org/mailman/listinfo/labs-l
                                
<https://lists.wikimedia.org/mailman/listinfo/labs-l>




-- Jaime Crespo
                            <http://wikimedia.org>

                            _______________________________________________
                            Labs-l mailing list
                            [email protected]
                            <mailto:[email protected]>
                            https://lists.wikimedia.org/mailman/listinfo/labs-l
                            
<https://lists.wikimedia.org/mailman/listinfo/labs-l>



                        _______________________________________________
                        Labs-l mailing list
                        [email protected]
                        <mailto:[email protected]>
                        https://lists.wikimedia.org/mailman/listinfo/labs-l
                        <https://lists.wikimedia.org/mailman/listinfo/labs-l>




-- Jaime Crespo
                    <http://wikimedia.org>

                    _______________________________________________
                    Labs-l mailing list
                    [email protected]
                    <mailto:[email protected]>
                    https://lists.wikimedia.org/mailman/listinfo/labs-l
                    <https://lists.wikimedia.org/mailman/listinfo/labs-l>




            _______________________________________________
            Labs-l mailing list
            [email protected] <mailto:[email protected]>
            https://lists.wikimedia.org/mailman/listinfo/labs-l
            <https://lists.wikimedia.org/mailman/listinfo/labs-l>




-- Jaime Crespo
        <http://wikimedia.org>

        _______________________________________________
        Labs-l mailing list
        [email protected] <mailto:[email protected]>
        https://lists.wikimedia.org/mailman/listinfo/labs-l
        <https://lists.wikimedia.org/mailman/listinfo/labs-l>





_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l


_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l

Reply via email to