Thanks Aaron and Oliver!

Strategy 2 sounds like the right way to go.
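
For anyone following the thread, here is a minimal sketch of what strategy 2 could look like using only the Python standard library. It is not Aaron's mediawiki-utilities (whose API I have not reproduced here). The function names and the assumption that the dump is a bz2-compressed pages-meta-history file are mine, for illustration only.

```python
import bz2
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (page_title, revision_id, text) for each revision in a dump stream."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id = None
            text = ""
            # Only direct children: the contributor's <id> is nested deeper.
            for child in elem:
                child_name = local(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # drop the revision text once it has been consumed
        elif name == "page":
            elem.clear()


def iter_dump_file(path):
    """Hypothetical helper: stream revisions from a .xml.bz2 dump file."""
    with bz2.open(path, "rb") as stream:
        yield from iter_revisions(stream)
```

Because the parser streams the file, the whole history never has to fit in memory. For real work, Aaron's utilities are probably the more convenient option.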

By the way, I wrote a document [1] that describes the features to look
for when trying to estimate whether a page was translated or was
originally written in its language.
Your comments would be highly appreciated.

[1] How_to_detect_translated_articles
<https://www.mediawiki.org/w/index.php?title=Wikipedia_article_translation_metrics/How_to_detect_translated_articles&redirect=no>

Cheers,
Neta

On Mon, Jan 26, 2015 at 5:23 AM, Oliver Keyes <[email protected]> wrote:

> Yup. For context: because of the scale of Wikimedia's MediaWiki
> instances, we actually store revision contents in their own cluster,
> not in the pertinent field within the MediaWiki database schema - that
> field instead acts as a pointer to where the content really lives. One
> of the consequences of this is that even the R&D analysts don't have
> direct access :/. If you're working in Python, I'd thoroughly
> recommend Aaron's proposed utility; it's probably my favourite way to
> process the dumps.
>
> On 25 January 2015 at 19:18, Aaron Halfaker <[email protected]>
> wrote:
> > Neta,
> >
> > There are two ways to get revision text.
> >
> > 1. Query the API.  See
> > https://en.wikipedia.org/w/api.php?action=help&modules=query%2Brevisions
> > Take special note of the "content" value of the rvprop parameter.  This
> > strategy is good when you want to process only a few revisions.
> >
> > 2. Process the XML dumps.  http://dumps.wikimedia.org/backup-index.html
> > If you are working in Python, I have some nice utilities for processing
> > the XML dump files.  See
> > http://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump
> > This strategy is good when you want to process the entire history of a
> > wiki.
> >
> > -Aaron
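
A minimal sketch of strategy 1, for reference. It only builds the request URL for the revisions module with rvprop=content, as Aaron describes. The function name and the default limit are hypothetical choices for illustration, and fetching and decoding the JSON response is left out.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title, limit=5):
    """Build a query URL asking for the latest revisions of one page, with text."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # "content" is the rvprop value that returns the revision text.
        "rvprop": "ids|timestamp|content",
        "rvlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. This is only practical for a small number of revisions.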
> >
> > On Sun, Jan 25, 2015 at 2:24 PM, Neta Livneh <[email protected]>
> > wrote:
> >>
> >> Hi,
> >>
> >> I'm trying to reach the text table (for read-only purposes), but it
> >> seems that it is not available to me (it is not in the list when I run
> >> SHOW TABLES).
> >>
> >> Does anybody know why I don't have access and whether I can get it? It
> >> is crucial for my research as I need to analyse the text.
> >>
> >> Thanks,
> >> Neta
> >>
> >> On Thu, Jan 15, 2015 at 7:36 PM, Neta Livneh <[email protected]>
> >> wrote:
> >>>
> >>> Yeah, I do have access. Thanks!
> >>> I have already used SSH, and also the Quarry tool for smaller, quick
> >>> queries.
> >>>
> >>> Cheers,
> >>> Neta
> >>>
> >>>
> >>> On Thu, Jan 15, 2015 at 7:35 PM, Neta Livneh <[email protected]>
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Sorry, old thread, but I wanted to point out that
> >>>>> http://quarry.wmflabs.org seems like a good tool for this use case.
> >>>>>
> >>>>>
> >>>>> On Wednesday, December 24, 2014, Leila Zia <[email protected]>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi Neta,
> >>>>>>
> >>>>>> On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh <[email protected]>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Actually, this is a great opportunity to say that I would love to
> >>>>>>> get you guys involved or at least hear insights from the analytics
> >>>>>>> team regarding the project's direction.
> >>>>>>
> >>>>>>
> >>>>>> Feel free to keep me in the loop for the latter.
> >>>>>>
> >>>>>> Best,
> >>>>>> Leila
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Dec 24, 2014 at 4:39 PM, Aaron Halfaker
> >>>>>>> <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Here are the instructions that Christian gave with some screenshots
> >>>>>>>> and discussion:
> >>>>>>>>
> >>>>>>>> https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Labs
> >>>>>>>>
> >>>>>>>> If you're just looking to run a few queries, you might consider
> >>>>>>>> http://quarry.wmflabs.org which requires no shell access -- just a
> >>>>>>>> Wikimedia sites account.
> >>>>>>>>
> >>>>>>>> -Aaron
> >>>>>>>>
> >>>>>>>> On Wed, Dec 24, 2014 at 7:22 AM, Christian Aistleitner
> >>>>>>>> <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Neta,
> >>>>>>>>>
> >>>>>>>>> On Wed, Dec 24, 2014 at 11:28:33AM +0200, Neta Livneh wrote:
> >>>>>>>>> > For my project, we will need to run SQL queries on current
> >>>>>>>>> > Wikipedia data (mostly the revision history table).
> >>>>>>>>> >
> >>>>>>>>> > I already have a Gerrit account. Can I get SSH access for
> >>>>>>>>> > running such queries?
> >>>>>>>>>
> >>>>>>>>> It sounds like the redacted labs databases would nicely fit your
> >>>>>>>>> use case. The easiest way to get access there is to apply for
> >>>>>>>>> Tool Labs [1].
> >>>>>>>>>
> >>>>>>>>> To get access, please file a request through
> >>>>>>>>>
> >>>>>>>>> https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request
> >>>>>>>>>
> >>>>>>>>> (Many parts around the WMF are currently getting migrated to
> >>>>>>>>> phabricator.wikimedia.org, so if someone knows a Phabricator
> >>>>>>>>> procedure for that please chime in!)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Once you've got Tool Labs [1] access you can ssh to
> >>>>>>>>>
> >>>>>>>>>   tools-login.wmflabs.org
> >>>>>>>>>
> >>>>>>>>> and running
> >>>>>>>>>
> >>>>>>>>>   sql enwiki
> >>>>>>>>>
> >>>>>>>>> on that host connects you to labsdb's enwiki database and you can
> >>>>>>>>> run your queries there (similar for other wikis).
> >>>>>>>>>
> >>>>>>>>> Have fun,
> >>>>>>>>> Christian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [1] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
> >>>>>>>>> has more information and links about Tool Labs.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> >>>>>>>>>                            Companies' registry: 360296y in Linz
> >>>>>>>>> Christian Aistleitner
> >>>>>>>>> Kefermarkterstrasze 6a/3     Email:  [email protected]
> >>>>>>>>> 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
> >>>>>>>>>                              Fax:            +43 7946 / 20 5 81
> >>>>>>>>>                              Homepage: http://quelltextlich.at/
> >>>>>>>>> ---------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Analytics mailing list
> >>>>>>>>> [email protected]
> >>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
> >
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
