There is no way to search the content of deleted pages.  You can start with
the `archive` table.  You may be able to identify edits that add {{hoax}}
templates by running a regex match on `archive.ar_comment`.
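
A rough sketch of that regex idea, in Python.  It assumes the (ar_title,
ar_comment) pairs have already been selected from `archive`; the pattern, the
function name `hoax_tagged_titles`, and the sample rows are illustrative
assumptions, not anything taken from real data.

```python
import re

# Assumed pattern: any edit summary that mentions "hoax" as a whole word,
# with or without template braces.  It also matches summaries that
# *remove* the tag, so hits still need a manual or follow-up check.
HOAX_COMMENT = re.compile(r"\bhoax\b", re.IGNORECASE)


def _as_text(value):
    """Decode bytes to str; database drivers often return binary columns."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def hoax_tagged_titles(rows):
    """Return the set of titles with at least one matching edit summary.

    `rows` is an iterable of (ar_title, ar_comment) pairs.
    """
    titles = set()
    for title, comment in rows:
        if comment is None:
            continue  # no edit summary recorded for this revision
        if HOAX_COMMENT.search(_as_text(comment)):
            titles.add(_as_text(title))
    return titles


# Made-up sample rows, for illustration only.
sample_rows = [
    ("Example_article", "Added {{hoax}} tag to article"),
    ("Another_article", b"tagging as possible Hoax"),
    ("Third_article", "fixed typo"),
    ("Fourth_article", None),
]

if __name__ == "__main__":
    print(sorted(hoax_tagged_titles(sample_rows)))
```

This only sees edit summaries, so it misses templates added without a
descriptive comment; it narrows down candidate pages rather than answering
the full-text question.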

-Aaron

On Thu, Jun 25, 2015 at 5:16 PM, Robert West <[email protected]>
wrote:

> Thanks, Aaron!
>
> On Thu, Jun 25, 2015 at 3:06 PM, Aaron Halfaker <[email protected]>
> wrote:
> > Ahh yes.  Sorry for not responding sooner.  The best way to get deleted
> > article text is by getting the appropriate permission with a Wikimedia
> > user account and then using that account to hit the web API.  E.g.
> >
> > https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedrevisions
>
> Looking at this page, it seems I need to supply the title, pageid, or
> revid of the deleted page (or page with deleted revisions) I'm
> interested in.
> However, I don't know yet what pages are relevant to me -- I only know
> this after having done a pass over the text of *all* deleted
> revisions.
> More concretely, my query is basically "all deleted revisions that
> contain the {{hoax}} template", but I don't know yet which deleted
> pages have such revisions.
>
> Is there any way of doing this?
>
> Thanks!
> Bob
>
> > The best way to get this permission is to contact Community Advocacy
> > ([email protected] and [email protected]) to request that
> > they supply you with the "wmf-research" right/group.
> >
> > On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia <[email protected]> wrote:
> >>
> >> Aaron, any chance you know the answer to this question? I have a vague
> >> memory that we talked about deleted pages and their text some time back.
> >> This data should live somewhere, right, given that deleted pages can be
> >> restored?
> >>
> >> Thanks,
> >> Leila
> >>
> >> On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <[email protected]> wrote:
> >>>
> >>> switching to the public list with Bob's permission.
> >>>
> >>> On Wed, Jun 24, 2015 at 1:58 PM, Robert West <[email protected]>
> >>> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I'd like to find all enwiki articles that were ever marked with the
> >>>> {{hoax}} template. Pages with that template mostly end up being
> >>>> deleted, so they're not available in the public revision dumps.
> >>>>
> >>>> Hence my question:
> >>>> Is there a way of getting access to the full enwiki revision dump
> >>>> including all deleted pages?
> >>>> I don't know yet which deleted articles I'm interested in, but will
> >>>> only know that after having done a pass over the full revision history.
> >>>>
> >>>> I know that viewing deleted content is problematic (hence I'm sending
> >>>> this request to this internal research list), but I signed an NDA and
> >>>> have access to data on HDFS via stat1002, so there might be a way for
> >>>> me to access that data?
> >>>>
> >>>> I'm also aware of a list of archived hoaxes, but many shorter-lived
> >>>> hoaxes that got deleted fast are not included there.
> >>>>
> >>>> Thanks -- any pointers welcome!
> >>>> Bob
> >>>>
> >>>>
> >>>> --
> >>>> Up for a little language game? -- http://www.unfun.me
> >>>>
> >>>> _______________________________________________
> >>>> Research-Internal mailing list
> >>>> [email protected]
> >>>> https://lists.wikimedia.org/mailman/listinfo/research-internal
> >>>>
> >>>
> >>
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
