It looks to me like the query attempts to count all logged-in non-bot page
creations in certain namespaces, grouped by user, and each user also has
their counts for the various different individual namespaces shown.

On Wed, 5 Jun 2019 at 09:53, Jaime Crespo <[email protected]> wrote:

> > Finally, any advice on how to make it more efficient is highly
> appreciated.
>
> I would be more concerned about the derived subquery, as that can
> create many executions of the inside query for every row of the
> outside one.
>
> If you could update the query with a comprehensive comment of what you
> are trying to achieve, that would help optimizing it, because some
> times transformation requires a logically different query but one that
> obtains equal or similar results.
>
> On Wed, Jun 5, 2019 at 2:21 AM Bryan Davis <[email protected]> wrote:
> >
> > On Tue, Jun 4, 2019 at 4:54 PM Huji Lee <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > I have a query that used to do a GROUP BY on rev_user and now I have
> migrated it to do so using rev_actor instead. The latest version looks like
> this [0] and its performance is not good [1] at all. In fact, I left it
> running for fawiki for several hours and it did not generate any results
> (it used to generate results in about 30-40 minutes). Also, the execution
> plan that is shown in the second link involves some tables like
> filearchive, image and oldimage that are not part of the query.
> > >
> > > Is this a side effect of the revision_actor_temp table?
> >
> > The EXPLAIN output including filearchive, image, oldimage, and other
> > apparently unrelated fields is caused by the view for the actor table.
> > This is basically the same problem as I described in
> > <https://lists.wikimedia.org/pipermail/cloud/2019-June/000683.html>
> > for the view of the comment table. The flag for suppression lives in
> > the related entity table, so the actor view runs correlated
> > sub-queries for each entity table that could indicate that a specific
> > row from the actor table has been suppressed.
> >
> > I created <https://phabricator.wikimedia.org/T224850> to look for a
> > solution for this same class of issue in the comment table. If we find
> > a fix there it seems likely that we could do the same thing for the
> > actor table, whatever that may end up being.
> >
> > > Also, what is that "key0" about, which is not associated with any
> tables?
> >
> > This is a key generated by the database on a temporary table that is
> > being used to answer your query. It looks like there is a little
> > display bug in the sql-optimizer tool. The table name that should be
> > shown is "<derived2>" which you can see in the HTML source of the
> > page. See <
> https://mariadb.com/kb/en/library/derived-table-with-key-optimization/>
> > for some details on this type of internal optimization.
> >
> > > Finally, any advice on how to make it more efficient is highly
> appreciated.
> >
> > The "Using filesort" notes in the plan are caused by the GROUP BYs. If
> > you can figure out how to get rid of the GROUP BY clauses things
> > should be faster, but to do that I think you might need to collect
> > information using multiple queries and some scripting language to
> > collect and correlate the data outside of the database.
> >
> > I would generally expect any query that tries to compute statistics
> > for all editors who have ever been active on a wiki will be slow. You
> > might try finding a good way to break the query into smaller batches.
> >
> > > [0]: https://pastebin.com/jWTzsiJY
> > > [1]:
> https://tools.wmflabs.org/sql-optimizer?use=fawiki_p&sql=SELECT%0D%0A++actor_name%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+0+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+article%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+10+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+tpl%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+12+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+helppage%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+14+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+cat%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+100+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+portal%2C%0D%0A++COUNT%28rev_first%29+tot%0D%0AFROM+revision+r%0D%0AJOIN+%28%0D%0A++select%0D%0A++++MIN%28rev_id%29+rev_first%2C%0D%0A++++rev_page%0D%0A++FROM+revision%0D%0A++GROUP+BY+rev_page%0D%0A%29+f%0D%0A++ON+r.rev_id+%3D+f.rev_first%0D%0AJOIN+page%0D%0A++ON+page_id+%3D+r.rev_page%0D%0AJOIN+actor%0D%0A++ON+rev_actor+%3D+actor_id%0D%0ALEFT+JOIN+user_groups%0D%0A++ON+actor_user+%3D+ug_user%0D%0A++AND+ug_group+%3D+%22bot%22%0D%0AWHERE%0D%0A++actor_user+%3C%3E+0%0D%0A++AND+ug_group+IS+NULL%0D%0A++AND+page_namespace+IN+%28%0D%0A++++0%2C%0D%0A++++10%2C%0D%0A++++12%2C%0D%0A++++14%2C%0D%0A++++100%0D%0A++%29%0D%0AGROUP+BY+rev_actor%0D%0AORDER+BY+tot+desc%0D%0ALIMIT+300
> >
> >
> > Bryan
> > --
> > Bryan Davis              Wikimedia Foundation    <[email protected]>
> > [[m:User:BDavis_(WMF)]] Manager, Technical Engagement    Boise, ID USA
> > irc: bd808                                        v:415.839.6885 x6855
> >
> > _______________________________________________
> > Wikimedia Cloud Services mailing list
> > [email protected] (formerly [email protected])
> > https://lists.wikimedia.org/mailman/listinfo/cloud
>
>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>
> _______________________________________________
> Wikimedia Cloud Services mailing list
> [email protected] (formerly [email protected])
> https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud

Reply via email to