On Mon, May 12, 2014 at 8:48 PM, Gilles Dubuc <[email protected]> wrote:

> Which ones are your tsvs stuff?
>>
>
> The ones I'm trying to improve are querying MediaViewer_8245578 (just grep
> for that on the page you've linked to, there will be some things I ran
> manually today, but the big queries are the problematic ones). The existing
> queries are full table scans and seem to take around 15 minutes. We run 27
> of these every day one after the other. The index I want to add will
> actually help me break those queries down into several smaller queries.
> There's no point trying to break them down right now, because I expect that
> without indexes the simpler queries will take just as long because they'll
> be full table scans as well.
>

Ok, think I see the right ones.

If you want to avoid breaking the queries up, push the conditions down onto
each base table of the union:

SELECT
...
FROM (
    SELECT timestamp, wiki, event_action FROM MediaViewer_7670440
    WHERE wiki = 'cawiki'
        AND timestamp < TIMESTAMP(CURDATE())
        AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
    UNION ALL
    SELECT timestamp, wiki, event_action FROM MediaViewer_8245578
    WHERE wiki = 'cawiki'
        AND timestamp < TIMESTAMP(CURDATE())
        AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
) AS MediaViewerUnioned
GROUP BY datestring ORDER BY datestring ASC;

That will allow a range access on the base tables' timestamp indexes which
should reduce the row count a bit. Then if we add an index on (wiki,
timestamp), which I'll kick off now, it drops the execution time to ~1min.

UNION ALL avoids the sorting overhead of UNION DISTINCT but unfortunately
still materializes the entire set into a temporary table. Pushing the
conditions down makes the query more verbose, but perhaps that's still
easier than maintaining multiple queries and emulating union.

Sean
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to