Ahh yes. Sorry for not responding sooner. The best way to get deleted article text is to obtain the appropriate permission for a Wikimedia user account and then use that account to query the web API, e.g. https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedrevisions
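For what it's worth, a request against that module might be put together roughly as follows. This is only a sketch: it builds the query URL and does not log in or fetch anything, since reading deleted text requires an authenticated session with the appropriate right. The helper name `build_deleted_revisions_url`, the chosen `drvprop` fields, and the page title are assumptions for illustration.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_deleted_revisions_url(title, limit=10):
    """Return a query URL asking for the deleted revisions of one page.

    Hypothetical helper: it only assembles the URL. Sending the request
    needs a logged-in session that is allowed to view deleted text.
    """
    params = {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": title,
        # Fields to return per deleted revision; "content" is the wikitext.
        "drvprop": "ids|timestamp|user|comment|content",
        "drvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # "Example hoax article" is a made-up title, used only for illustration.
    print(build_deleted_revisions_url("Example hoax article"))
```

The resulting URL would then be requested through whatever authenticated client you already use for the API.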
The best way to get this permission is to contact Community Advocacy ([email protected] and [email protected]) to request that they supply you with the "wmf-research" right/group.

On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia <[email protected]> wrote:

> Aaron, any chance you know the answer to this question? I have a vague
> memory that we talked about deleted pages and their text some time back.
> This data should live somewhere, right? given that deleted pages can be
> restored.
>
> Thanks,
> Leila
>
> On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <[email protected]> wrote:
>
>> switching to the public list with Bob's permission.
>>
>> On Wed, Jun 24, 2015 at 1:58 PM, Robert West <[email protected]>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I'd like to find all enwiki articles that were ever marked with the
>>> {{hoax}} template. Pages with that template mostly end up being deleted, so
>>> they're not available in the public revision dumps
>>> <https://dumps.wikimedia.org/enwiki/20150602/>.
>>>
>>> Hence my question:
>>> Is there a way of getting access to the full enwiki revision dump
>>> including all deleted pages?
>>> I don't know yet which deleted articles I'm interested in, but will only
>>> know that after having done a pass over the full revision history.
>>>
>>> I know that viewing deleted content is problematic
>>> <https://en.wikipedia.org/wiki/Wikipedia:Viewing_deleted_content> (hence
>>> I'm sending this request to this internal research list), but I signed an
>>> NDA and have access to data on HDFS via stat1002, so there might be a way
>>> for me to access that data?
>>>
>>> I'm also aware of a list of archived hoaxes
>>> <https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia>,
>>> but many shorter-lived hoaxes that got deleted fast are not included there.
>>>
>>> Thanks -- any pointers welcome!
>>> Bob
>>>
>>>
>>> --
>>> Up for a little language game? -- http://www.unfun.me
>>>
>>> _______________________________________________
>>> Research-Internal mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/research-internal
>>>
>>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
