Neta,

There are two ways to get revision text.

1. Query the API. See https://en.wikipedia.org/w/api.php?action=help&modules=query%2Brevisions
   Take special note of the "content" value of the rvprop parameter.
   This strategy is good when you want to process only a few revisions.

2. Process the XML dumps. http://dumps.wikimedia.org/backup-index.html
   If you are working in Python, I have some nice utilities for processing
   the XML dump files. See
   http://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump
   This strategy is good when you want to process the entire history of a wiki.

-Aaron

On Sun, Jan 25, 2015 at 2:24 PM, Neta Livneh <[email protected]> wrote:

> Hi,
>
> I'm trying to reach the text table (for read only purposes), but it seems
> that it is not available to me (it is not in the list when I run SHOW
> TABLES).
>
> Does anybody know why I don't have access and if I can get it? It is
> crucial for my research as I need to analyse the text.
>
> Thanks,
> Neta
>
> On Thu, Jan 15, 2015 at 7:36 PM, Neta Livneh <[email protected]> wrote:
>
>> yeah, I do have access - Thanks!
>> I already used ssh, and also used the quarry tool for smaller quick
>> queries.
>>
>> Cheers,
>> Neta
>>
>> On Thu, Jan 15, 2015 at 7:35 PM, Neta Livneh <[email protected]> wrote:
>>
>>> On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu <[email protected]> wrote:
>>>
>>>> Sorry, old thread, but I wanted to point out that
>>>> http://quarry.wmflabs.org seems like a good tool for this use case.
>>>>
>>>> On Wednesday, December 24, 2014, Leila Zia <[email protected]> wrote:
>>>>
>>>>> Hi Neta,
>>>>>
>>>>> On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh <[email protected]> wrote:
>>>>>
>>>>>> Actually, this is a great opportunity to say that I would love to get
>>>>>> you guys involved or at least hear insights from the analytics team
>>>>>> regarding the project's direction.
>>>>>
>>>>> Feel free to keep me in the loop for the latter.
>>>>>
>>>>> Best,
>>>>> Leila
>>>>>
>>>>>> On Wed, Dec 24, 2014 at 4:39 PM, Aaron Halfaker <[email protected]> wrote:
>>>>>>
>>>>>>> Here are the instructions that Christian gave with some screenshots
>>>>>>> and discussion:
>>>>>>> https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Labs
>>>>>>>
>>>>>>> If you're just looking to run a few queries, you might consider
>>>>>>> http://quarry.wmflabs.org which requires no shell access -- just a
>>>>>>> Wikimedia sites account.
>>>>>>>
>>>>>>> -Aaron
>>>>>>>
>>>>>>> On Wed, Dec 24, 2014 at 7:22 AM, Christian Aistleitner <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Neta,
>>>>>>>>
>>>>>>>> On Wed, Dec 24, 2014 at 11:28:33AM +0200, Neta Livneh wrote:
>>>>>>>> > For my project, we will need to run SQL queries on current Wikipedia
>>>>>>>> > data (mostly the revision history table).
>>>>>>>> >
>>>>>>>> > I already have a Gerrit account. Can I get SSH access for running
>>>>>>>> > such queries?
>>>>>>>>
>>>>>>>> It sounds like the redacted labs databases would nicely fit your use
>>>>>>>> case. The easiest way to get access there is to apply for Tool Labs
>>>>>>>> [1].
>>>>>>>>
>>>>>>>> To get access, please file a request through
>>>>>>>>
>>>>>>>>   https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request
>>>>>>>>
>>>>>>>> (Many parts around the WMF are currently getting migrated to
>>>>>>>> phabricator.wikimedia.org, so if someone knows a phabricator
>>>>>>>> procedure for that please chime in!)
>>>>>>>>
>>>>>>>> Once you've got Tool Labs [1] access you can ssh to
>>>>>>>>
>>>>>>>>   tools-login.wmflabs.org
>>>>>>>>
>>>>>>>> and running
>>>>>>>>
>>>>>>>>   sql enwiki
>>>>>>>>
>>>>>>>> on that host connects you to labsdb's enwiki database and you can
>>>>>>>> run your queries there (similar for other wikis).
>>>>>>>>
>>>>>>>> Have fun,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>> [1] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
>>>>>>>> has more information and links about Tool Labs.
>>>>>>>>
>>>>>>>> --
>>>>>>>> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>>>>>>>> Companies' registry: 360296y in Linz
>>>>>>>> Christian Aistleitner
>>>>>>>> Kefermarkterstrasze 6a/3     Email:    [email protected]
>>>>>>>> 4293 Gutau, Austria          Phone:    +43 7946 / 20 5 81
>>>>>>>>                              Fax:      +43 7946 / 20 5 81
>>>>>>>>                              Homepage: http://quelltextlich.at/
>>>>>>>> ---------------------------------------------------------------
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Analytics mailing list
>>>>>>>> [email protected]
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
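
For the API route described at the top of this thread, here is a minimal Python sketch of asking for revision text with the "content" value of rvprop. It is only an illustration under stated assumptions, not something posted by anyone in the thread. It assumes the current API response layout (rvslots=main with formatversion=2), which is newer than this discussion. The function names and the User-Agent string are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the help link quoted in the thread.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_revision_query(title, limit=5):
    """Return query parameters asking for the text of a page's latest revisions."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # "content" is the rvprop value that makes the API return revision text.
        "rvprop": "ids|timestamp|content",
        # Assumption: slot-based output, as used by current MediaWiki versions.
        "rvslots": "main",
        "rvlimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }


def extract_revisions(response):
    """Collect (revid, timestamp, text) tuples from a decoded API response."""
    results = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            text = rev.get("slots", {}).get("main", {}).get("content")
            results.append((rev.get("revid"), rev.get("timestamp"), text))
    return results


def fetch_revisions(title, limit=5):
    """Send the request and return the extracted revisions (needs network access)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_revision_query(title, limit))
    request = urllib.request.Request(
        url,
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "revision-text-example/0.1"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_revisions(json.load(reply))
```

Calling fetch_revisions("Wikipedia", limit=3) would return up to three tuples for that page. This approach fits the "only a few revisions" case; for the full history of a wiki, the XML dumps remain the better route.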
