Yup. For context: because of the scale of Wikimedia's MediaWiki instances, we actually store revision contents in their own cluster, not in the pertinent field within the MediaWiki database schema; that field instead acts as a pointer to where the content really lives. One consequence is that even the R&D analysts don't have direct access :/. If you're working in Python, I'd thoroughly recommend Aaron's proposed utility; it's probably my favourite way to process the dumps.
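For a rough feel of what processing a dump involves, here is a minimal sketch that uses only Python's standard library rather than Aaron's utilities (whose API is not shown here). The sample XML, its namespace version, and the `iter_revisions` helper are made up for illustration; real dumps carry many more fields and are usually compressed.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for a dump file (assumption: not real data).
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <text>First version.</text>
    </revision>
    <revision>
      <id>11</id>
      <text>Second, longer version.</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Drop the XML namespace so the code does not depend on the export version."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (page_title, revision_id, text) for each revision in the stream."""
    title = None
    # iterparse reads incrementally, so the whole dump never sits in memory.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id, text = None, ""
            # Only direct children, so nested ids (e.g. a contributor's) are ignored.
            for child in elem:
                child_name = local(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # release the revision's contents once consumed
        elif name == "page":
            elem.clear()


for page_title, rev_id, text in iter_revisions(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))):
    print(page_title, rev_id, len(text))
```

For a real dump, the same generator should work on a binary stream opened with `bz2.open(path, "rb")`, where `path` is whichever dump file you downloaded.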
On 25 January 2015 at 19:18, Aaron Halfaker <[email protected]> wrote:
> Neta,
>
> There are two ways to get revision text.
>
> 1. Query the API. See
> https://en.wikipedia.org/w/api.php?action=help&modules=query%2Brevisions
> Take special note of the "content" value of the rvprop parameter. This
> strategy is good when you want to process only a few revisions.
>
> 2. Process the XML dumps. http://dumps.wikimedia.org/backup-index.html If
> you are working in Python, I have some nice utilities for processing the XML
> dump files. See
> http://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump
> This strategy is good when you want to process the entire history of a wiki.
>
> -Aaron
>
> On Sun, Jan 25, 2015 at 2:24 PM, Neta Livneh <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm trying to reach the text table (for read only purposes), but it seems
>> that it is not available to me (it is not in the list when I run SHOW
>> TABLES).
>>
>> Does anybody know why I don't have access and if I can get it? It is
>> crucial for my research as I need to analyse the text.
>>
>> Thanks,
>> Neta
>>
>> On Thu, Jan 15, 2015 at 7:36 PM, Neta Livneh <[email protected]>
>> wrote:
>>>
>>> yeah, I do have access - Thanks!
>>> I already used ssh, and also used the quarry tool for smaller quick
>>> queries.
>>>
>>> Cheers,
>>> Neta
>>>
>>> On Thu, Jan 15, 2015 at 7:35 PM, Neta Livneh <[email protected]>
>>> wrote:
>>>>
>>>> On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu
>>>> <[email protected]> wrote:
>>>>>
>>>>> Sorry, old thread, but I wanted to point out that
>>>>> http://quarry.wmflabs.org seems like a good tool for this use case.
>>>>>
>>>>> On Wednesday, December 24, 2014, Leila Zia <[email protected]> wrote:
>>>>>>
>>>>>> Hi Neta,
>>>>>>
>>>>>> On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Actually, this is a great opportunity to say that I would love to get
>>>>>>> you guys involved or at least hear insights from the analytics team
>>>>>>> regarding the project's direction.
>>>>>>
>>>>>> Feel free to keep me in the loop for the latter.
>>>>>>
>>>>>> Best,
>>>>>> Leila
>>>>>>
>>>>>>> On Wed, Dec 24, 2014 at 4:39 PM, Aaron Halfaker
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Here are the instructions that Christian gave with some screenshots
>>>>>>>> and discussion:
>>>>>>>> https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Labs
>>>>>>>>
>>>>>>>> If you're just looking to run a few queries, you might consider
>>>>>>>> http://quarry.wmflabs.org which requires no shell access -- just a
>>>>>>>> Wikimedia sites account.
>>>>>>>>
>>>>>>>> -Aaron
>>>>>>>>
>>>>>>>> On Wed, Dec 24, 2014 at 7:22 AM, Christian Aistleitner
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Neta,
>>>>>>>>>
>>>>>>>>> On Wed, Dec 24, 2014 at 11:28:33AM +0200, Neta Livneh wrote:
>>>>>>>>> > For my project, we will need to run SQL queries on current
>>>>>>>>> > Wikipedia data (mostly the revision history table).
>>>>>>>>> >
>>>>>>>>> > I already have a Gerrit account. Can I get SSH access for running
>>>>>>>>> > such queries?
>>>>>>>>>
>>>>>>>>> It sounds like the redacted labs databases would nicely fit your
>>>>>>>>> use case. The easiest way to get access there is to apply for
>>>>>>>>> Tool Labs [1].
>>>>>>>>>
>>>>>>>>> To get access, please file a request through
>>>>>>>>>
>>>>>>>>>   https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request
>>>>>>>>>
>>>>>>>>> (Many parts around the WMF are currently getting migrated to
>>>>>>>>> phabricator.wikimedia.org, so if someone knows a phabricator
>>>>>>>>> procedure for that please chime in!)
>>>>>>>>>
>>>>>>>>> Once you've got Tool Labs [1] access you can ssh to
>>>>>>>>>
>>>>>>>>>   tools-login.wmflabs.org
>>>>>>>>>
>>>>>>>>> and running
>>>>>>>>>
>>>>>>>>>   sql enwiki
>>>>>>>>>
>>>>>>>>> on that host connects you to labsdb's enwiki database and you can
>>>>>>>>> run your queries there (similar for other wikis).
>>>>>>>>>
>>>>>>>>> Have fun,
>>>>>>>>> Christian
>>>>>>>>>
>>>>>>>>> [1] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
>>>>>>>>> has more information and links about Tool Labs.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>>>>>>>>> Companies' registry: 360296y in Linz
>>>>>>>>> Christian Aistleitner
>>>>>>>>> Kefermarkterstrasze 6a/3 Email: [email protected]
>>>>>>>>> 4293 Gutau, Austria Phone: +43 7946 / 20 5 81
>>>>>>>>> Fax: +43 7946 / 20 5 81
>>>>>>>>> Homepage: http://quelltextlich.at/
>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Analytics mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
