On Fri, Sep 17, 2021 at 3:03 PM Cristina Gava via Analytics <
[email protected]> wrote:

> Hi Jaime,
>
> Thank you so much for the thorough reply :) All the references are super
> useful and I'll go through them now. I'll start with Toolforge, since it
> seems there is consensus on it being the most appropriate tool, and leave
> the dumps for later if needed.
> I'll keep you posted.
>

It will depend a lot on the type of research needed. For example (to play
devil's advocate with a simple case), if you wanted to count the total
number of words ever written on Wikipedia and look at their frequency
(which means reading every edit in the history), dumps would be a much
better option, as the wikireplicas only expose metadata, not the actual
content. On top of that, reading all edits sequentially will be much
faster from a downloaded bundle, while the live MariaDB database is faster
for small portions selected with specific conditions, or for small to
medium ranges.
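
To make the word-count example a bit more concrete, here is a rough sketch
of what the sequential read over a dump could look like. It is only an
illustration, not a tested pipeline: the function name count_words, the
word regex and the file name in the comment are my own placeholders, and
it counts the words in the full text of every revision rather than only
the words added by each edit.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a "word" is any run of word characters; real research would
# probably want a proper tokenizer and some wikitext stripping.
WORD_RE = re.compile(r"\w+")


def count_words(stream):
    """Count word frequencies over every <text> element of an XML export.

    `stream` is a binary file-like object containing MediaWiki XML export
    data, read sequentially from start to end.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Tags carry an XML namespace ("{...}text"), so compare only the
        # local part of the name.
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            counts.update(w.lower() for w in WORD_RE.findall(elem.text))
        # Drop the element's content once seen to keep memory use low.
        elem.clear()
    return counts


# Possible usage on a compressed history dump (hypothetical file name):
#   import bz2
#   with bz2.open("pages-meta-history.xml.bz2", "rb") as f:
#       print(count_words(f).most_common(10))
```

The point is just that the whole history is read once, front to back, with
no per-revision query, which is the access pattern the dumps are good at.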

In general, I think starting with the wikireplicas and moving to the dumps
later, if you see they are not working for you, is a totally reasonable
decision, as it will require less investment in your local setup.

-- 
Jaime Crespo
<http://wikimedia.org>