On 10/02/2017 12:30 PM, Roy Smith wrote:
I’m not seeing how to access the wikitext for a specific revision via
the API.  I can get the HTML with /page/html/{title}/{revision}, but I
don’t see how to get the wikitext.  Do I really need to get the HTML and
then feed that through /transform/html/to/wikitext?  That seems
suboptimal.  Not to mention rate limited :-(

What I want to do is get the wikitext for every revision of a page.

If you just want to download some revisions of a single page (for development purposes), https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Iron&rvprop=timestamp|user|comment|content&rvlimit=max&formatversion=2 should be enough.

You'll have to use rvcontinue to get more than 50 revisions per request, and you should probably use a library like pywikibot rather than rolling your own client.
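For reference, a minimal sketch of that continuation loop using only the standard library (pywikibot would handle all of this for you). The rvslots=main parameter is my addition for newer MediaWiki versions, where the wikitext lives under a named slot; the URL above omits it:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def revision_params(title, rvcontinue=None):
    """Build the query parameters for one batch of revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment|content",
        "rvslots": "main",   # newer MediaWiki expects the slot to be named
        "rvlimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    if rvcontinue is not None:
        params["rvcontinue"] = rvcontinue
    return params

def all_revisions(title):
    """Yield every revision dict for a page, following rvcontinue."""
    rvcontinue = None
    while True:
        url = API + "?" + urllib.parse.urlencode(revision_params(title, rvcontinue))
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for page in data["query"]["pages"]:
            # with rvslots, the wikitext is at rev["slots"]["main"]["content"]
            yield from page.get("revisions", [])
        if "continue" not in data:
            break
        rvcontinue = data["continue"]["rvcontinue"]
```

Each batch's response carries a "continue" object whose rvcontinue value you feed back into the next request until it disappears.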

Later, if you want to do it for more articles, go to https://dumps.wikimedia.org/backup-index-bydb.html and choose a wiki (e.g. enwiki).

You may need to click "Last dumped on" a couple of times until you find an "All pages with complete edit history" section with download links.

You can then download either a single archive (with all revisions of a subset of pages), or all of them.

Matt Flaschen

_______________________________________________
Cloud mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/cloud
