Thanks for adding that index, that's exactly what I needed. I'll update the queries to what you suggested now. I actually looked into doing exactly that change earlier today, but couldn't get it to hit the timestamp index. I wasn't specifying a value for "wiki" (it's one of the queries we run, the "global" counts), and it wasn't hitting the index, probably because of NULL values. Now I can force it hit that new index, so everything should be back to sane execution time for those tables.
On Mon, May 12, 2014 at 2:33 PM, Sean Pringle <[email protected]>wrote: > > On Mon, May 12, 2014 at 8:48 PM, Gilles Dubuc <[email protected]>wrote: > >> Which ones are your tsvs stuff? >>> >> >> The ones I'm trying to improve are querying MediaViewer_8245578 (just >> grep for that on the page you've linked to, there will be some things I ran >> manually today, but the big queries are the problematic ones). The existing >> queries are full table scans and seem to take around 15 minutes. We run 27 >> of these every day one after the other. The index I want to add will >> actually help me break those queries down into several smaller queries. >> There's no point trying to break them down right now, because I expect that >> without indexes the simpler queries will take just as long because they'll >> be full table scans as well. >> > > Ok, think I see the right ones. > > If you want to avoid breaking the queries up, push the conditions down > onto each base table of the union: > > SELECT > ... > FROM ( > SELECT timestamp, wiki, event_action FROM MediaViewer_7670440 > WHERE wiki = 'cawiki' > AND timestamp < TIMESTAMP(CURDATE()) > AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY)) > UNION ALL > SELECT timestamp, wiki, event_action FROM MediaViewer_8245578 > WHERE wiki = 'cawiki' > AND timestamp < TIMESTAMP(CURDATE()) > AND timestamp >= TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 30 DAY)) > ) AS MediaViewerUnioned > GROUP BY datestring ORDER BY datestring ASC; > > That will allow a range access on the base tables' timestamp indexes which > should reduce the row count a bit. Then if we add an index on (wiki, > timestamp), which I'll kick off now, it drops the execution time to ~1min. > > UNION ALL avoids the sorting overhead of UNION DISTINCT but unfortunately > still materializes the entire set into a temporary table. Pushing the > conditions down makes the query more verbose, but perhaps that's still > easier than maintaining multiple queries and emulating union. > > Sean > > _______________________________________________ > Analytics mailing list > [email protected] > https://lists.wikimedia.org/mailman/listinfo/analytics > >
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
