Hi Roan,
thanks for the answers!
> > You then introduce new parameters: startid, endid, start, end (for
> > start/end of revid, or start/end of last touched), and amend the query:
> > if (isset ($params['start'])) {
> > $this->addWhere('page_touched>=' . $params['start']);
> > }
> >
> > Finally you need something like:
> > $this->addOption('ORDER BY', 'page_touched');
> > and
> > $this->setContinueEnumParameter('start',
> > $this->keyToTitle($row->page_latest));
> Since there's no index on page_latest, sorting and paging on it the
> way you do is inefficient. Especially the ORDER BY page_latest part
> causes a filesort of the entire page table, which has over 10 million
> entries on English Wikipedia.
I guess it would be the same if one sorted on revision id (rather than
page_latest)?
Is there a proposal one could put forward to make this more efficient, for
example by also adding an index on the revision id?
> that people can have a look?
> This'll probably work (albeit breaking a few things such as apfrom, as
> you mentioned), but due to the inefficient queries involved, it won't
> make it into the MediaWiki core.
>
Again, is there something that could be done to make it more efficient?
Or perhaps one could put some less efficient code in, but with a switch to
disable it on large wikis?
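
To make the switch idea a bit more concrete, here is a rough sketch of what
I have in mind. It is standalone PHP, not actual API module code: the
function name buildTouchedConds() and the $allowExpensive flag are names I
made up for illustration. On a real wiki the flag would presumably come from
a site setting (perhaps something along the lines of $wgMiserMode, if that is
the appropriate mechanism), and the returned conditions would be passed to
addWhere().

```php
<?php
// Standalone sketch. buildTouchedConds() and $allowExpensive are
// hypothetical names, not part of MediaWiki.

/**
 * Build WHERE conditions for filtering pages by page_touched.
 *
 * @param array $params         Request parameters; 'start' and 'end' are optional timestamps.
 * @param bool  $allowExpensive Site-level switch; would be false on large wikis.
 * @return array List of SQL condition strings (empty if no filter applies).
 */
function buildTouchedConds( array $params, bool $allowExpensive ): array {
    $conds = [];
    if ( !$allowExpensive ) {
        // Switch is off: ignore the expensive parameters entirely.
        // A real module would probably report an error instead.
        return $conds;
    }
    foreach ( [ 'start' => '>=', 'end' => '<=' ] as $key => $op ) {
        if ( isset( $params[$key] ) ) {
            // Keep digits only so the value is safe to embed in SQL
            // (a 14-digit MediaWiki-style timestamp is assumed).
            $ts = preg_replace( '/\D/', '', (string)$params[$key] );
            if ( $ts !== '' ) {
                $conds[] = "page_touched $op '$ts'";
            }
        }
    }
    return $conds;
}
```

With the switch on, buildTouchedConds( [ 'start' => '20090101000000' ], true )
yields a single condition on page_touched; with the switch off, the same call
returns an empty array, so the query runs exactly as it does today.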
Just to give a little background on why I think this is important: MediaWiki
is an important platform for Open Educational Resources. In developing
countries bandwidth is expensive, so mirroring is important. And of course,
once you mirror, you want to stay up to date, so being able to get updates
since a given revision or date is important.
I would really like to hear your ideas on what the best way forward is!
Thanks again,
Bjoern
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api