There's no way to search the content of deleted pages. You can start with the `archive` table, though. You might be able to identify edits that added {{hoax}} templates by running a regex match on `archive.ar_comment` (the edit summary stored with each deleted revision).
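A minimal sketch of that regex pass, assuming the relevant `archive` rows have already been fetched into Python. The `ArchiveRow` type, the choice of columns, and the `HOAX_SUMMARY_RE` pattern are hypothetical, and schema details vary by MediaWiki version. The output is only a list of candidate titles. Summaries such as "not a hoax" will also match, so the candidates still need checking.

```python
import re
from typing import Iterable, List, NamedTuple


class ArchiveRow(NamedTuple):
    """Hypothetical subset of columns from one `archive` (deleted revision) row."""
    ar_namespace: int
    ar_title: str
    ar_comment: str  # edit summary of the deleted revision


# Hypothetical pattern: the whole word "hoax", case-insensitive, so it matches
# "{{hoax}}" and "Tagging as Hoax" but not "hoaxes". Tune against real summaries.
HOAX_SUMMARY_RE = re.compile(r"\bhoax\b", re.IGNORECASE)


def hoax_candidates(rows: Iterable[ArchiveRow]) -> List[str]:
    """Return distinct titles (in first-seen order) with a hoax-like edit summary."""
    seen = set()
    titles = []
    for row in rows:
        if HOAX_SUMMARY_RE.search(row.ar_comment) and row.ar_title not in seen:
            seen.add(row.ar_title)
            titles.append(row.ar_title)
    return titles


rows = [
    ArchiveRow(0, "Foo", "Tagging page with {{hoax}}"),
    ArchiveRow(0, "Bar", "copyedit"),
    ArchiveRow(0, "Foo", "Hoax"),
    ArchiveRow(0, "Baz", "list of hoaxes"),
]
print(hoax_candidates(rows))  # ['Foo']
```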
-Aaron

On Thu, Jun 25, 2015 at 5:16 PM, Robert West <[email protected]> wrote:
> Thanks, Aaron!
>
> On Thu, Jun 25, 2015 at 3:06 PM, Aaron Halfaker <[email protected]>
> wrote:
> > Ahh yes. Sorry for not responding sooner. The best way to get deleted
> > article text is by getting the appropriate permission with a Wikimedia
> > user account and then using that account to hit the web API. E.g.
> > https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedrevisions
>
> Looking at this page, it seems I need to supply the title, pageid, or
> revid of the deleted page (or page with deleted revisions) I'm
> interested in.
> However, I don't know yet what pages are relevant to me -- I only know
> this after having done a pass over the text of *all* deleted
> revisions.
> More concretely, my query is basically "all deleted revisions that
> contain the {{hoax}} template", but I don't know yet which deleted
> pages have such revisions.
>
> Is there any way of doing this?
>
> Thanks!
> Bob
>
> > The best way to get this permission is to contact Community Advocacy
> > ([email protected] and [email protected]) to request that they
> > supply you with the "wmf-research" right/group.
> >
> > On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia <[email protected]> wrote:
> >>
> >> Aaron, any chance you know the answer to this question? I have a vague
> >> memory that we talked about deleted pages and their text some time back.
> >> This data should live somewhere, right? given that deleted pages can be
> >> restored.
> >>
> >> Thanks,
> >> Leila
> >>
> >> On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <[email protected]> wrote:
> >>>
> >>> switching to the public list with Bob's permission.
> >>>
> >>> On Wed, Jun 24, 2015 at 1:58 PM, Robert West <[email protected]>
> >>> wrote:
> >>>>
> >>>> Hi everyone,
> >>>>
> >>>> I'd like to find all enwiki articles that were ever marked with the
> >>>> {{hoax}} template. Pages with that template mostly end up being
> >>>> deleted, so they're not available in the public revision dumps.
> >>>>
> >>>> Hence my question:
> >>>> Is there a way of getting access to the full enwiki revision dump
> >>>> including all deleted pages?
> >>>> I don't know yet which deleted articles I'm interested in, but will
> >>>> only know that after having done a pass over the full revision history.
> >>>>
> >>>> I know that viewing deleted content is problematic (hence I'm sending
> >>>> this request to this internal research list), but I signed an NDA and
> >>>> have access to data on HDFS via stat1002, so there might be a way for
> >>>> me to access that data?
> >>>>
> >>>> I'm also aware of a list of archived hoaxes, but many shorter-lived
> >>>> hoaxes that got deleted fast are not included there.
> >>>>
> >>>> Thanks -- any pointers welcome!
> >>>> Bob
> >>>>
> >>>>
> >>>> --
> >>>> Up for a little language game? -- http://www.unfun.me
> >>>>
> >>>> _______________________________________________
> >>>> Research-Internal mailing list
> >>>> [email protected]
> >>>> https://lists.wikimedia.org/mailman/listinfo/research-internal
> >>>>
> >>>
> >>
> >
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
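Once you have candidate titles (for example, from a pass over `archive` edit summaries), the web API module linked above can return the deleted text for those specific pages. This is a minimal sketch that only builds the request URL for `action=query&prop=deletedrevisions`. The helper name and the chosen `drvprop` fields are illustrative assumptions. Sending the request still requires logging in with an account that holds the appropriate rights, which this sketch does not handle.

```python
from typing import List
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def deleted_revisions_url(titles: List[str]) -> str:
    """Build a query URL for deleted revisions of the given page titles.

    Hypothetical helper: it only constructs the URL. Fetching it needs an
    authenticated session whose account may view deleted text.
    """
    params = {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": "|".join(titles),  # the API takes multiple titles separated by "|"
        "drvprop": "ids|timestamp|comment|content",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(deleted_revisions_url(["Foo", "Bar"]))
```

Because the module needs titles, page IDs, or revision IDs up front, the summary-based pass has to come first. The API only confirms which candidates actually contained the template.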
