On Sun, Sep 7, 2014 at 8:11 AM, Sean Pringle <[email protected]> wrote:
> > On Sun, Sep 7, 2014 at 5:58 AM, Brad Jorsch (Anomie) < > [email protected]> wrote: > >> The database query for that is simple enough: >> >> SELECT /* ApiQueryCategoryMembers::run Anomie */ >> cl_from,cl_sortkey,cl_type,page_namespace,page_title,cl_timestamp FROM >> `page`,`categorylinks` FORCE INDEX (cl_timestamp) WHERE cl_to = >> 'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from=page_id) ORDER BY >> cl_timestamp,cl_from LIMIT 501; >> >> And the PHP code doesn't do anything complicated either. Maybe Sean can >> give us more insight if there's some subtle database thing going on here. >> > > As Nik noted, the query plan walking cl_timestamp is not ideal. Plus, even > with the forced index the query requires a filesort since cl_timestamp > index is on (cl_to,cl_timestamp) and not (cl_timestamp,cl_from). > We're including a constant cl_to in the query here, so the index on (cl_to,cl_timestamp) is exactly what we want. As for ORDER BY cl_timestamp, cl_from, that's https://gerrit.wikimedia.org/r/#/c/103589/ taking advantage of InnoDB's clustered indexes where it silently appends the primary key (or in this case the first/only UNIQUE key, cl_from) to all other indexes. When I EXPLAIN this query against enwiki, there's no filesort on master, db1055, db1051, and db1066. stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => categorylinks [type] => ref [possible_keys] => cl_timestamp [key] => cl_timestamp [key_len] => 257 [ref] => const [rows] => 635858 [Extra] => Using index condition; Using where ) stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => page [type] => eq_ref [possible_keys] => PRIMARY [key] => PRIMARY [key_len] => 4 [ref] => enwiki.categorylinks.cl_from [rows] => 1 [Extra] => ) There is on db1061, db1062, db1065, db1072, and db1073. stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => categorylinks [type] => ref [possible_keys] => cl_timestamp [key] => cl_timestamp [key_len] => 257 [ref] => const [rows] => 706656 [Extra] => Using index condition; Using where; Using filesort ) stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => page [type] => eq_ref [possible_keys] => PRIMARY [key] => PRIMARY [key_len] => 4 [ref] => enwiki.categorylinks.cl_from [rows] => 1 [Extra] => ) My wild guess would be that the latter set of databases were somehow created differently so that InnoDB is clustering using some index other than cl_from. > > Removing the FORCE INDEX would allow cl_sortkey index to be used, with > better selectivity. > On master, db1055, db1051, and db1066, removing the FORCE INDEX still reports from EXPLAIN that it chose the cl_timestamp index. It does cause EXPLAIN to stop saying "Using index condition" though. stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => categorylinks [type] => ref [possible_keys] => cl_from,cl_timestamp,cl_sortkey [key] => cl_timestamp [key_len] => 257 [ref] => const [rows] => 525652 [Extra] => Using where ) stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => page [type] => eq_ref [possible_keys] => PRIMARY [key] => PRIMARY [key_len] => 4 [ref] => enwiki.categorylinks.cl_from [rows] => 1 [Extra] => ) On db1061, db1062, db1065, db1072, and db1073, it does choose cl_sortkey but it still filesorts. stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => categorylinks [type] => ref [possible_keys] => cl_from,cl_timestamp,cl_sortkey [key] => cl_sortkey [key_len] => 257 [ref] => const [rows] => 525652 [Extra] => Using index condition; Using where; Using filesort ) stdClass Object ( [id] => 1 [select_type] => SIMPLE [table] => page [type] => eq_ref [possible_keys] => PRIMARY [key] => PRIMARY [key_len] => 4 [ref] => enwiki.categorylinks.cl_from [rows] => 1 [Extra] => ) If I also remove the cl_from from the ORDER BY, then all databases return the same query using the cl_timestamp index. But the API can't do that without reopening https://bugzilla.wikimedia.org/show_bug.cgi?id=24782. -- Brad Jorsch (Anomie) Software Engineer Wikimedia Foundation
_______________________________________________ Mediawiki-api mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
