Thanks for adding that index, that's exactly what I needed. I'll update the
queries to what you suggested now. I actually looked into doing exactly
that change earlier today, but couldn't get it to hit the timestamp index.
I wasn't specifying a value for "wiki" (it's one of the queries we run, the
"global" counts), and it wasn't hitting the index, probably because of NULL
values. Now I can force it hit that new index, so everything should be back
to sane execution time for those tables.


On Mon, May 12, 2014 at 2:33 PM, Sean Pringle <[email protected]>wrote:

>
> On Mon, May 12, 2014 at 8:48 PM, Gilles Dubuc <[email protected]>wrote:
>
>> Which ones are your tsvs stuff?
>>>
>>
>> The ones I'm trying to improve are querying MediaViewer_8245578 (just
>> grep for that on the page you've linked to, there will be some things I ran
>> manually today, but the big queries are the problematic ones). The existing
>> queries are full table scans and seem to take around 15 minutes. We run 27
>> of these every day one after the other. The index I want to add will
>> actually help me break those queries down into several smaller queries.
>> There's no point trying to break them down right now, because I expect that
>> without indexes the simpler queries will take just as long because they'll
>> be full table scans as well.
>>
>
> Ok, think I see the right ones.
>
> If you want to avoid breaking the queries up, push the conditions down
> onto each base table of the union:
>
> SELECT
> ...
> FROM (
>     SELECT timestamp, wiki, event_action FROM MediaViewer_7670440
>     WHERE wiki = 'cawiki'
>         AND timestamp < TIMESTAMP(CURDATE())
>         AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
>     UNION ALL
>     SELECT timestamp, wiki, event_action FROM MediaViewer_8245578
>     WHERE wiki = 'cawiki'
>         AND timestamp < TIMESTAMP(CURDATE())
>         AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
> ) AS MediaViewerUnioned
> GROUP BY datestring ORDER BY datestring ASC;
>
> That will allow a range access on the base tables' timestamp indexes which
> should reduce the row count a bit. Then if we add an index on (wiki,
> timestamp), which I'll kick off now, it drops the execution time to ~1min.
>
> UNION ALL avoids the sorting overhead of UNION DISTINCT but unfortunately
> still materializes the entire set into a temporary table. Pushing the
> conditions down makes the query more verbose, but perhaps that's still
> easier than maintaining multiple queries and emulating union.
>
> Sean
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to