On 10/02/2017 12:30 PM, Roy Smith wrote:
I’m not seeing how to access the wikitext for a specific revision via
the API. I can get the HTML with /page/html/{title}/{revision}, but I
don’t see how to get the wikitext. Do I really need to get the HTML and
then feed that through /transform/html/to/wikitext? That seems
suboptimal. Not to mention rate limited :-(
What I want to do is get the wikitext for every revision of a page.
If you just want to download some revisions of a single page (for
development purposes),
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Iron&rvprop=timestamp|user|comment|content&rvlimit=max&formatversion=2
should be enough.
You'll have to use rvcontinue to fetch more than 50 revisions per request,
and you should probably use a library like pywikibot that handles the
continuation for you.
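If you'd rather not pull in pywikibot, the continuation loop is simple to
write yourself. A minimal sketch (the function and parameter names here are
illustrative; the API parameters are the ones from the query URL above, and
the HTTP call is abstracted behind a callable so you can plug in requests or
anything else):

```python
# Sketch of paging through all revisions with rvcontinue.
API = "https://en.wikipedia.org/w/api.php"

def all_revisions(title, http_get):
    """Yield every revision of `title`, following rvcontinue.

    `http_get` is any callable taking a params dict and returning the
    decoded JSON response -- e.g. a thin wrapper around requests.get
    against the API URL above.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment|content",
        "rvlimit": "max",
        "formatversion": "2",
        "format": "json",
    }
    while True:
        data = http_get(dict(params))
        for page in data["query"]["pages"]:
            yield from page.get("revisions", [])
        cont = data.get("continue")
        if not cont:
            break
        # The "continue" object carries rvcontinue (and a "continue"
        # key); merging it into the params requests the next batch.
        params.update(cont)
```

With requests you'd call it roughly as
`all_revisions("Iron", lambda p: requests.get(API, params=p).json())`,
sleeping between batches to stay polite.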
Later, if you want to do it for more articles, go to
https://dumps.wikimedia.org/backup-index-bydb.html and choose a wiki
(e.g. enwiki).
You may need to click "Last dumped on" a couple of times until you find a
dump that includes "All pages with complete edit history" download links.
You can then download either a single archive (with all revisions of a
subset of pages), or all of them.
Matt Flaschen
_______________________________________________
Cloud mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/cloud