Neta,

There are two ways to get revision text.

1. Query the API.  See
https://en.wikipedia.org/w/api.php?action=help&modules=query%2Brevisions
Take special note of the "content" value of the rvprop parameter.  This
strategy is good when you want to process only a few revisions.

2. Process the XML dumps.  See
http://dumps.wikimedia.org/backup-index.html
If you are working in Python, I have some nice utilities for processing the
XML dump files.  See
http://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump
This strategy is good when you want to process the entire history of a
wiki.
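
For the API route, here is a rough Python sketch of what such a request
could look like.  It is only an illustration, not an official client.  The
function names and the User-Agent string are made up for this example, and
the rvslots and formatversion parameters assume a reasonably recent
MediaWiki; older installations may return the content in a different shape.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_revision_query(title, limit=5):
    """Build a query URL asking for the latest revisions of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # "content" is the rvprop value that returns the revision text.
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",        # assumption: slot-aware MediaWiki
        "rvlimit": str(limit),
        "format": "json",
        "formatversion": "2",     # assumption: pages come back as a list
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_texts(response):
    """Return (revision id, text) pairs from a decoded JSON response."""
    texts = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            texts.append((rev["revid"], rev["slots"]["main"]["content"]))
    return texts


def fetch_revisions(title, limit=5):
    """Send the request over the network and return the extracted texts."""
    request = urllib.request.Request(
        build_revision_query(title, limit),
        # Hypothetical identifier; replace it with your own contact details.
        headers={"User-Agent": "revision-text-example/0.1"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_texts(json.load(reply))
```

Calling fetch_revisions("Some page title", limit=2) should give you a list
of (revision id, text) pairs for that page's most recent revisions.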

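For the dump route, the idea is to stream through the file and pull the
text out of each revision.  The sketch below uses only Python's standard
library (xml.etree.ElementTree) rather than the utilities linked above, so
treat it as a stand-in that shows the shape of the work.  The element names
(page, title, revision, id, text) follow the MediaWiki export format; the
function names and the tiny sample document are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_texts(xml_file):
    """Yield (page title, revision id, text) for each revision in a dump.

    xml_file is a file-like object opened in binary mode.  Real dumps are
    compressed, so you would pass something like bz2.open(path, "rb").
    """
    title = None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id = text = None
            # Only look at direct children, so the contributor's <id>
            # is not mistaken for the revision's <id>.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # drop the finished revision's contents
        elif name == "page":
            elem.clear()  # drop the finished page's contents


# Invented two-revision sample in the shape of an export file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Foo</title><id>1</id>
<revision><id>10</id>
<contributor><username>A</username><id>7</id></contributor>
<text>first</text></revision>
<revision><id>11</id><text>second</text></revision>
</page></mediawiki>"""

for page_title, rev_id, text in iter_revision_texts(io.BytesIO(SAMPLE)):
    print(page_title, rev_id, len(text))
```

Because it streams, memory use stays modest even when a page has a very
long history.
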
-Aaron

On Sun, Jan 25, 2015 at 2:24 PM, Neta Livneh <[email protected]> wrote:

> Hi,
>
> I'm trying to reach the text table (for read-only purposes), but it seems
> that it is not available to me (it is not listed when I run SHOW
> TABLES).
>
> Does anybody know why I don't have access and whether I can get it? It is
> crucial for my research as I need to analyse the text.
>
> Thanks,
> Neta
>
> On Thu, Jan 15, 2015 at 7:36 PM, Neta Livneh <[email protected]>
> wrote:
>
>> Yeah, I do have access. Thanks!
>> I already used ssh, and also used the quarry tool for smaller quick
>> queries.
>>
>> Cheers,
>> Neta
>>
>>
>> On Thu, Jan 15, 2015 at 7:35 PM, Neta Livneh <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu <[email protected]>
>>> wrote:
>>>
>>>> Sorry, old thread, but I wanted to point out that
>>>> http://quarry.wmflabs.org seems like a good tool for this use case.
>>>>
>>>>
>>>> On Wednesday, December 24, 2014, Leila Zia <[email protected]> wrote:
>>>>
>>>>> Hi Neta,
>>>>>
>>>>> On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Actually, this is a great opportunity to say that I would love to get
>>>>>> you guys involved or at least hear insights from the analytics team
>>>>>> regarding the project's direction.
>>>>>>
>>>>>
>>>>> Feel free to keep me in the loop for the latter.
>>>>>
>>>>> Best,
>>>>> Leila
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 24, 2014 at 4:39 PM, Aaron Halfaker <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Here are the instructions that Christian gave, with some screenshots
>>>>>>> and discussion:
>>>>>>> https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Labs
>>>>>>>
>>>>>>> If you're just looking to run a few queries, you might consider
>>>>>>> http://quarry.wmflabs.org which requires no shell access -- just a
>>>>>>> Wikimedia sites account.
>>>>>>>
>>>>>>> -Aaron
>>>>>>>
>>>>>>> On Wed, Dec 24, 2014 at 7:22 AM, Christian Aistleitner <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Neta,
>>>>>>>>
>>>>>>>> On Wed, Dec 24, 2014 at 11:28:33AM +0200, Neta Livneh wrote:
>>>>>>>> > For my project, we will need to run SQL queries on current Wikipedia
>>>>>>>> > data (mostly the revision history table).
>>>>>>>> >
>>>>>>>> > I already have a Gerrit account. Can I get SSH access for running
>>>>>>>> > such queries?
>>>>>>>>
>>>>>>>> It sounds like the redacted labs databases would nicely fit your use
>>>>>>>> case. The easiest way to get access there is to apply for Tool Labs
>>>>>>>> [1].
>>>>>>>>
>>>>>>>> To get access, please file a request through
>>>>>>>>
>>>>>>>>
>>>>>>>> https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request
>>>>>>>>
>>>>>>>> (Many parts around the WMF are currently getting migrated to
>>>>>>>> phabricator.wikimedia.org, so if someone knows a Phabricator
>>>>>>>> procedure for that, please chime in!)
>>>>>>>>
>>>>>>>>
>>>>>>>> Once you've got Tool Labs [1] access you can ssh to
>>>>>>>>
>>>>>>>>   tools-login.wmflabs.org
>>>>>>>>
>>>>>>>> and running
>>>>>>>>
>>>>>>>>   sql enwiki
>>>>>>>>
>>>>>>>> on that host connects you to labsdb's enwiki database, where you can
>>>>>>>> run your queries (similarly for other wikis).
>>>>>>>>
>>>>>>>> Have fun,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
>>>>>>>> has more information and links about Tool Labs.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>>>>>>>>                            Companies' registry: 360296y in Linz
>>>>>>>> Christian Aistleitner
>>>>>>>> Kefermarkterstrasze 6a/3     Email:  [email protected]
>>>>>>>> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>>>>>>>>                              Fax:            +43 7946 / 20 5 81
>>>>>>>>                              Homepage: http://quelltextlich.at/
>>>>>>>> ---------------------------------------------------------------
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Analytics mailing list
>>>>>>>> [email protected]
>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>>>
>>>>>>>>