Ah, yes.  Sorry for not responding sooner.  The best way to get deleted
article text is to obtain the appropriate permission for a Wikimedia user
account and then use that account to query the web API.  E.g.
https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedrevisions

The best way to get this permission is to contact Community Advocacy (
[email protected] and [email protected]) to request that they
supply you with the "wmf-research" right/group.
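
For the API route described above, here is a rough Python sketch of what a
query against the deletedrevisions module might look like. It only builds
the request and unpacks a response; it does not log in, and the request
will only return text when sent from a session whose account has the right
to view deleted content. The helper names, the choice of drvprop fields,
the use of drvslots/formatversion, and the response shape assumed in
extract_texts are my assumptions for illustration, so check them against
the help page linked above.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def deleted_revisions_params(title, limit=10):
    """Build query parameters for prop=deletedrevisions on a single title.

    The field selection below is an assumption for illustration; adjust
    drvprop to whatever the help page lists for your wiki version.
    """
    return {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": title,
        "drvprop": "ids|timestamp|user|comment|content",
        "drvslots": "main",       # assumed: newer MediaWiki versions want a slot
        "drvlimit": str(limit),
        "format": "json",
        "formatversion": "2",     # assumed: pages come back as a JSON list
    }


def deleted_revisions_url(title, limit=10):
    """Return the full GET URL for the query (to be sent while logged in)."""
    return API_URL + "?" + urlencode(deleted_revisions_params(title, limit))


def extract_texts(response):
    """Collect (timestamp, text) pairs from an already-decoded JSON response.

    Assumes the formatversion=2 layout with content under slots -> main.
    Missing keys yield None rather than raising.
    """
    results = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("deletedrevisions", []):
            slot = rev.get("slots", {}).get("main", {})
            results.append((rev.get("timestamp"), slot.get("content")))
    return results
```

You would pass the URL (or the params dict) to whatever authenticated HTTP
session you use, decode the JSON body, and hand it to extract_texts. Since
Bob does not know the titles in advance, this per-title lookup would need
to be driven by a list of candidate titles from some other source.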

On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia <[email protected]> wrote:

> Aaron, any chance you know the answer to this question? I have a vague
> memory that we talked about deleted pages and their text some time back.
> This data should live somewhere, right, given that deleted pages can be
> restored?
>
> Thanks,
> Leila
>
> On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <[email protected]> wrote:
>
>> Switching to the public list with Bob's permission.
>>
>> On Wed, Jun 24, 2015 at 1:58 PM, Robert West <[email protected]>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I'd like to find all enwiki articles that were ever marked with the
>>> {{hoax}} template. Pages with that template mostly end up being deleted, so
>>> they're not available in the public revision dumps
>>> <https://dumps.wikimedia.org/enwiki/20150602/>.
>>>
>>> Hence my question:
>>> Is there a way of getting access to the full enwiki revision dump
>>> including all deleted pages?
>>> I don't know yet which deleted articles I'm interested in, but will only
>>> know that after having done a pass over the full revision history.
>>>
>>> I know that viewing deleted content is problematic
>>> <https://en.wikipedia.org/wiki/Wikipedia:Viewing_deleted_content> (hence
>>> I'm sending this request to this internal research list), but I signed an
>>> NDA and have access to data on HDFS via stat1002, so there might be a way
>>> for me to access that data?
>>>
>>> I'm also aware of a list of archived hoaxes
>>> <https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia>,
>>> but many shorter-lived hoaxes that got deleted fast are not included there.
>>>
>>> Thanks -- any pointers welcome!
>>> Bob
>>>
>>>
>>> --
>>> Up for a little language game? -- http://www.unfun.me
>>>
>>> _______________________________________________
>>> Research-Internal mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/research-internal
>>>
>>>
>>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
