On Tue, May 13, 2014 at 2:16 AM, Gilles Dubuc <[email protected]> wrote:
> I noticed the IS NOT NULL version appear. Forcing the index in that case >> may not help; I suspect not-null will just cause an index scan and double >> the overhead compared to the table scan (because the secondary index isn't >> clustered). >> > > So what's the best thing to do for now? Not adding "IS NOT NULL" and > letting it do a full table scan? That global query is indeed the last one > that remains noticeably slow (it's running over 120+ million rows). > There are zero NULL values for "wiki" in those tables, so forcing the secondary index won't help and will be slightly slower than a table scan on the clustered primary key (extra level of indirection for each row touched). I suggest removing the FORCE INDEX clauses entirely from both forms of the query. Just keep the conditions pushed down to the unioned tables for now. When filtering by a wiki name, the ix_*_wiki_timestamp indexes are chosen regardless. When not filtering by wiki, the optimizer should be free to choose the is_*_timestamp index if possible (ie, if the tables ever hold a lot of data older than the 30 day range).
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
