2009/6/3 sl contrib <[email protected]>:
> Hi Roan,
>
>>
>> > Would it somehow be possible to build an intermediate solution? E.g. would
>> > it be feasible to build a dedicated
>> > action=query&prop=allchanges&start=...&end=...
>> > that just solved that problem?
>> For revisions, possibly. It wouldn't include log events, though.
>
> I've had a go at modifying the code for allpages.
> Basically if this is made conditional:
> $this->addWhereFld('page_namespace', $params['namespace']);
>
> then all pages can be searched (irrespective of namespace). Has this got a
> massive impact on efficiency?
Yes, for queries with certain oft-used parameters, this'll harm
efficiency a lot.
> The maximum number of entries returned is
> limited anyway, and it shouldn't really matter which namespace they come
> from. (Of course some things like apfrom no longer work as expected, but for
> my use case, it would be OK for them to be disabled.)
Not only do they no longer work as expected, they also cause inefficiency.
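To illustrate why dropping the namespace condition hurts, here is a rough sketch using SQLite from Python rather than MySQL, so the planner output is only an analogy. It assumes a cut-down page table with a composite index on (page_namespace, page_title), which is what the page table's name_title index covers; the table contents and the 'M' start title are made up for the example.

```python
import sqlite3

# In-memory stand-in for the page table; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title TEXT NOT NULL
    );
    CREATE UNIQUE INDEX name_title ON page (page_namespace, page_title);
""")


def plan(sql):
    """Return SQLite's query plan for sql as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# With the namespace condition, the index both filters and orders the rows.
with_ns = plan(
    "SELECT page_title FROM page "
    "WHERE page_namespace = 0 AND page_title >= 'M' "
    "ORDER BY page_title LIMIT 10"
)

# Without it, the title range cannot use the index prefix, so the rows
# have to be collected and sorted before the LIMIT is applied.
without_ns = plan(
    "SELECT page_title FROM page "
    "WHERE page_title >= 'M' "
    "ORDER BY page_title LIMIT 10"
)

print("with namespace:   ", with_ns)
print("without namespace:", without_ns)
```

The second plan should show an extra sorting step that the first one does not need. MySQL reports this differently (as a filesort in EXPLAIN), but the underlying problem is the same.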
> You then introduce new parameters: startid, endid, start, end (for start/end
> of revid, or start/end of last touched), and amend the query:
> if (isset ($params['start'])) {
> $this->addWhere('page_touched>=' . $params['start']);
> }
>
> Finally you need something like:
> $this->addOption('ORDER BY', 'page_touched');
> and
> $this->setContinueEnumParameter('start',
> $this->keyToTitle($row->page_latest));
Since there's no index on page_latest, sorting and paging on it the
way you do is inefficient. Especially the ORDER BY page_latest part
causes a filesort of the entire page table, which has over 10 million
entries on English Wikipedia.
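As a small illustration of the sorting problem, the sketch below again uses SQLite from Python instead of MySQL, so treat it as an analogy only. The index name hypothetical_page_latest is invented for the example and does not exist in the MediaWiki schema; the revision ID 1000 is likewise arbitrary.

```python
import sqlite3

# Cut-down stand-in for the page table with no index on page_latest.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    " page_id INTEGER PRIMARY KEY,"
    " page_latest INTEGER NOT NULL)"
)


def plan(sql):
    """Return SQLite's query plan for sql as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


query = (
    "SELECT page_id FROM page "
    "WHERE page_latest >= 1000 "
    "ORDER BY page_latest LIMIT 10"
)

# No index on page_latest: the whole table is scanned and then sorted.
before = plan(query)

# Hypothetical index, added only to show how the plan changes.
conn.execute("CREATE INDEX hypothetical_page_latest ON page (page_latest)")
after = plan(query)

print("without index:", before)
print("with index:   ", after)
```

Without the index, the plan includes a separate sort over every matching row; with it, rows come out of the index already ordered and the LIMIT can stop early. On a table with millions of rows that difference is what makes the unindexed version unacceptable.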
> With those changes (and a few conditionals) 'allpages' can produce a list of
> pages that were touched between two dates, or a set of pages that have new
> revisions between two revision numbers. Not sure yet whether last touched
> will work as well as the revision timestamp, but at least from the revision
> number you could easily update an offline set of wiki pages.
> Do you think this looks good so far? Should I post the code somewhere so
> that people can have a look?
This'll probably work (albeit breaking a few things such as apfrom, as
you mentioned), but due to the inefficient queries involved, it won't
make it into the MediaWiki core.
Roan Kattouw (Catrope)