On Fri, Sep 17, 2021 at 3:03 PM Cristina Gava via Analytics <[email protected]> wrote:
> Hi Jaime,
>
> Thank you so much for the thorough reply :) All the references are super
> useful and I'll go through them now. I'll start with Toolforge, since it
> seems there is consensus on it being the most appropriate tool, and leave
> the dumps for later if needed.
> I'll keep you posted.

It will depend a lot on the type of research needed. For example (to play devil's advocate with a simple case), if you wanted to count the total number of words written in Wikipedia and observe word frequency, which means reading every edit in the history, dumps would be a much better option, as the wikireplicas only give access to metadata, not the actual content. On top of that, reading all edits sequentially will be much faster from a downloaded bundle, while the live MariaDB database is faster for small portions selected with specific conditions, or for small to medium ranges.

I think starting with the wikireplicas and moving to the dumps later if you see they are not working for you is a totally reasonable decision in general, as it will require less investment in your local setup.

-- 
Jaime Crespo
<http://wikimedia.org>
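
To illustrate the sequential-read case described above, here is a minimal sketch (not part of the original message) of streaming through a MediaWiki XML dump and counting words. It assumes the usual export layout in which revision content sits in <text> elements inside <page> elements; the names count_words, local_name and SAMPLE are hypothetical, and the tiny in-memory sample stands in for a real dump file. Note that it counts the words in the full text of every revision, not only the words added by each edit.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

WORD_RE = re.compile(r"\w+")


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}text' -> 'text'."""
    return tag.rsplit("}", 1)[-1]


def count_words(stream):
    """Read a dump sequentially and count words in every revision's text."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "text" and elem.text:
            counts.update(w.lower() for w in WORD_RE.findall(elem.text))
        elif name == "page":
            # Drop the finished page's children so memory use stays modest.
            elem.clear()
    return counts


# Illustrative stand-in for a dump; the namespace version is an assumption
# and is ignored by local_name() anyway.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Hello world</text></revision>
    <revision><text>Hello again, world</text></revision>
  </page>
</mediawiki>"""

sample_counts = count_words(io.BytesIO(SAMPLE))
print(sample_counts.most_common(3))
```

For a real compressed dump, the same function could be fed a stream opened with bz2.open(path, "rb"), where path points at whichever history dump file you downloaded. Because the file is read once from start to end, no database queries are involved.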
