It would be awesome if you could add a similar (wiki, timestamp) index for exactly the same reasons to all the MultimediaViewerNetworkPerformance* tables on the same database. Those tables haven't been problematic yet because they're a lot smaller, but we might as well make all Multimedia EventLogging queries faster in one swoop.
On Mon, May 12, 2014 at 3:36 PM, Gilles Dubuc <[email protected]> wrote: > Thanks for adding that index, that's exactly what I needed. I'll update > the queries to what you suggested now. I actually looked into doing exactly > that change earlier today, but couldn't get it to hit the timestamp index. > I wasn't specifying a value for "wiki" (it's one of the queries we run, the > "global" counts), and it wasn't hitting the index, probably because of NULL > values. Now I can force it hit that new index, so everything should be back > to sane execution time for those tables. > > > On Mon, May 12, 2014 at 2:33 PM, Sean Pringle <[email protected]>wrote: > >> >> On Mon, May 12, 2014 at 8:48 PM, Gilles Dubuc <[email protected]>wrote: >> >>> Which ones are your tsvs stuff? >>>> >>> >>> The ones I'm trying to improve are querying MediaViewer_8245578 (just >>> grep for that on the page you've linked to, there will be some things I ran >>> manually today, but the big queries are the problematic ones). The existing >>> queries are full table scans and seem to take around 15 minutes. We run 27 >>> of these every day one after the other. The index I want to add will >>> actually help me break those queries down into several smaller queries. >>> There's no point trying to break them down right now, because I expect that >>> without indexes the simpler queries will take just as long because they'll >>> be full table scans as well. >>> >> >> Ok, think I see the right ones. >> >> If you want to avoid breaking the queries up, push the conditions down >> onto each base table of the union: >> >> SELECT >> ... >> FROM ( >> SELECT timestamp, wiki, event_action FROM MediaViewer_7670440 >> WHERE wiki = 'cawiki' >> AND timestamp < TIMESTAMP(CURDATE()) >> AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY)) >> UNION ALL >> SELECT timestamp, wiki, event_action FROM MediaViewer_8245578 >> WHERE wiki = 'cawiki' >> AND timestamp < TIMESTAMP(CURDATE()) >> AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY)) >> ) AS MediaViewerUnioned >> GROUP BY datestring ORDER BY datestring ASC; >> >> That will allow a range access on the base tables' timestamp indexes >> which should reduce the row count a bit. Then if we add an index on (wiki, >> timestamp), which I'll kick off now, it drops the execution time to ~1min. >> >> UNION ALL avoids the sorting overhead of UNION DISTINCT but unfortunately >> still materializes the entire set into a temporary table. Pushing the >> conditions down makes the query more verbose, but perhaps that's still >> easier than maintaining multiple queries and emulating union. >> >> Sean >> >> _______________________________________________ >> Analytics mailing list >> [email protected] >> https://lists.wikimedia.org/mailman/listinfo/analytics >> >> >
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
