It would be awesome if you could add a similar (wiki, timestamp) index for
exactly the same reasons to all the MultimediaViewerNetworkPerformance*
tables on the same database. Those tables haven't been problematic yet
because they're a lot smaller, but we might as well make all Multimedia
EventLogging queries faster in one swoop.


On Mon, May 12, 2014 at 3:36 PM, Gilles Dubuc <[email protected]> wrote:

> Thanks for adding that index, that's exactly what I needed. I'll update
> the queries to what you suggested now. I actually looked into doing exactly
> that change earlier today, but couldn't get it to hit the timestamp index.
> I wasn't specifying a value for "wiki" (it's one of the queries we run, the
> "global" counts), and it wasn't hitting the index, probably because of NULL
> values. Now I can force it hit that new index, so everything should be back
> to sane execution time for those tables.
>
>
> On Mon, May 12, 2014 at 2:33 PM, Sean Pringle <[email protected]>wrote:
>
>>
>> On Mon, May 12, 2014 at 8:48 PM, Gilles Dubuc <[email protected]>wrote:
>>
>>> Which ones are your tsvs stuff?
>>>>
>>>
>>> The ones I'm trying to improve are querying MediaViewer_8245578 (just
>>> grep for that on the page you've linked to, there will be some things I ran
>>> manually today, but the big queries are the problematic ones). The existing
>>> queries are full table scans and seem to take around 15 minutes. We run 27
>>> of these every day one after the other. The index I want to add will
>>> actually help me break those queries down into several smaller queries.
>>> There's no point trying to break them down right now, because I expect that
>>> without indexes the simpler queries will take just as long because they'll
>>> be full table scans as well.
>>>
>>
>> Ok, think I see the right ones.
>>
>> If you want to avoid breaking the queries up, push the conditions down
>> onto each base table of the union:
>>
>> SELECT
>> ...
>> FROM (
>>     SELECT timestamp, wiki, event_action FROM MediaViewer_7670440
>>     WHERE wiki = 'cawiki'
>>         AND timestamp < TIMESTAMP(CURDATE())
>>         AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
>>     UNION ALL
>>     SELECT timestamp, wiki, event_action FROM MediaViewer_8245578
>>     WHERE wiki = 'cawiki'
>>         AND timestamp < TIMESTAMP(CURDATE())
>>         AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY))
>> ) AS MediaViewerUnioned
>> GROUP BY datestring ORDER BY datestring ASC;
>>
>> That will allow a range access on the base tables' timestamp indexes
>> which should reduce the row count a bit. Then if we add an index on (wiki,
>> timestamp), which I'll kick off now, it drops the execution time to ~1min.
>>
>> UNION ALL avoids the sorting overhead of UNION DISTINCT but unfortunately
>> still materializes the entire set into a temporary table. Pushing the
>> conditions down makes the query more verbose, but perhaps that's still
>> easier than maintaining multiple queries and emulating union.
>>
>> Sean
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to