Hey,
One important optimization you can use and it's often missed out (and it's
going to be needed more as we normalize more tables) is join decomposition.
It basically means you don't join and query but do two (or several) queries
separately in your code. This might seem counter intuitive but it's pretty
useful for several reasons (like MySQL will cache the normalizing table and
answer faster or you yourself can cache some parts). You can read more
about those in "High Performance MySQL" book.

HTH

On Mon, Jun 3, 2019 at 3:43 AM Jesse Plamondon-Willard <
[email protected]> wrote:

> Hello,
>
> Some Toolforge SQL queries have much worse performance when updated for
> the new comment table.
>
> For example, see one previous SQL query
> <https://tools.wmflabs.org/sql-optimizer?use=enwiki_p&sql=SELECT%0D%0A++++++++++++++++++++user_id%2C%0D%0A++++++++++++++++++++user_registration%2C%0D%0A++++++++++++++++++++DATE_FORMAT%28user_registration%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+registration%2C%0D%0A++++++++++++++++++++user_editcount%2C%0D%0A++++++++++++++++++++GROUP_CONCAT%28ug_group+SEPARATOR+%22%2C+%22%29+AS+user_groups%2C%0D%0A++++++++++++++++++++ipb_by_text%2C%0D%0A++++++++++++++++++++%27%27+AS+ipb_reason%2C+--+column+deleted%0D%0A++++++++++++++++++++DATE_FORMAT%28ipb_timestamp%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+ipb_timestamp%2C%0D%0A++++++++++++++++++++ipb_deleted%2C%0D%0A++++++++++++++++++++COALESCE%28DATE_FORMAT%28ipb_expiry%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29%2C+ipb_expiry%29+AS+ipb_expiry%0D%0A++++++++++++++++FROM%0D%0A++++++++++++++++++++user%0D%0A++++++++++++++++++++LEFT+JOIN+user_groups+ON+user_id+%3D+ug_user%0D%0A++++++++++++++++++++LEFT+JOIN+ipblocks_ipindex+ON+user_id+%3D+ipb_user%0D%0A++++++++++++++++WHERE+user_name+%3D+%27Pathoschild%27%0D%0A++++++++++++++++LIMIT+1>
> and its updated version
> <https://tools.wmflabs.org/sql-optimizer?use=enwiki_p&sql=SELECT%0D%0A++++++++++++++++++++user_id%2C%0D%0A++++++++++++++++++++user_registration%2C%0D%0A++++++++++++++++++++DATE_FORMAT%28user_registration%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+registration%2C%0D%0A++++++++++++++++++++user_editcount%2C%0D%0A++++++++++++++++++++GROUP_CONCAT%28ug_group+SEPARATOR+%22%2C+%22%29+AS+user_groups%2C%0D%0A++++++++++++++++++++ipb_by_text%2C%0D%0A++++++++++++++++++++comment_text+AS+ipb_reason%2C%0D%0A++++++++++++++++++++DATE_FORMAT%28ipb_timestamp%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29+AS+ipb_timestamp%2C%0D%0A++++++++++++++++++++ipb_deleted%2C%0D%0A++++++++++++++++++++COALESCE%28DATE_FORMAT%28ipb_expiry%2C+%22%25Y-%25m-%25d+%25H%3A%25i%22%29%2C+ipb_expiry%29+AS+ipb_expiry%0D%0A++++++++++++++++FROM%0D%0A++++++++++++++++++++user%0D%0A++++++++++++++++++++LEFT+JOIN+user_groups+ON+user_id+%3D+ug_user%0D%0A++++++++++++++++++++LEFT+JOIN+ipblocks_ipindex+ON+user_id+%3D+ipb_user%0D%0A++++++++++++++++++++LEFT+JOIN+comment+ON+ipb_reason_id+%3D+comment_id%0D%0A++++++++++++++++WHERE+user_name+%3D+%27Pathoschild%27%0D%0A++++++++++++++++LIMIT+1>
> which just adds a join on the comment table. The change adds ten dependent
> subqueries and increases Stalktoy load times from ≈15 seconds to ≈245
> seconds, and the same change for another query was enough to have my
> tools temporarily rate-limited <https://phabricator.wikimedia.org/T217853>
> .
>
> I haven't found a way to update those queries efficiently. Is there an
> optimisation I'm missing? Why does the comment view need subqueries on ten
> other tables, and is there an alternate version without the subqueries for
> cases where we just need to join by ID?
>
> --
> Jesse Plamondon-Willard (Pathoschild)
> _______________________________________________
> Wikimedia Cloud Services mailing list
> [email protected] (formerly [email protected])
> https://lists.wikimedia.org/mailman/listinfo/cloud



-- 
Amir (he/him)
_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud

Reply via email to