Emilio, I'm very interested in making your XML dump processing work easier. If you file any bugs against the old[1] or new[2] libraries, I'll be quick to turn around on them.
1. https://bitbucket.org/halfak/wikimedia-utilities
2. https://github.com/halfak/mediawiki-utilities

-Aaron

On Mon, May 12, 2014 at 10:30 AM, Morten Wang <[email protected]> wrote:

> Hi Emilio,
>
> You're probably aware of it, but one way to handle your own installs is to
> use virtual environments: https://virtualenv.pypa.io/en/latest/
>
> BTW, the Python utilities you pointed to are now deprecated in favour of a
> newer version, but the newer version is Python 3.x only:
> http://pythonhosted.org/mediawiki-utilities/
>
> I have the older version of his utilities installed in my virtual
> environment. When I processed the English dump about a month ago I used
> tools-dev for testing and then submitted jobs to the job servers when it
> was ready, running over the smaller split files of the dump for
> parallelisation and less memory usage.
>
> From what I've heard the newer library is considerably faster than the 2.x
> version, but I haven't yet had a project where I could test that.
>
> Regards,
> Morten
>
> On 11 May 2014 13:10, Emilio J. Rodríguez-Posada <[email protected]> wrote:
>
>> Hi;
>>
>> I would like to process some Wikipedia dumps. Is tools-dev the right
>> place for this? I don't see Wikimedia Utilities[1] available there.
>>
>> Do I have to install it myself, or is this a task for an admin?
>>
>> Regards
>>
>> [1] https://bitbucket.org/halfak/wikimedia-utilities/wiki/Home
>>
>> _______________________________________________
>> Labs-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
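
For readers following the thread, the approach Morten describes (running over the smaller split files of the dump for parallelisation and less memory usage) might look roughly like the sketch below. It deliberately uses only the Python 3 standard library and does not reproduce the API of either utilities library; the function names, the worker count, and the choice to count pages and revisions are assumptions made purely for illustration.

```python
import bz2
import xml.etree.ElementTree as ET
from multiprocessing import Pool


def local_name(tag):
    """Strip an XML namespace prefix: '{uri}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_pages_and_revisions(path):
    """Stream one split dump file and count its <page> and <revision> elements.

    Hypothetical helper: the file is read incrementally, so a whole split
    file never has to fit in memory at once.
    """
    pages = 0
    revisions = 0
    # Dumps are usually bz2-compressed; fall back to plain XML otherwise.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "revision":
                revisions += 1
            elif name == "page":
                pages += 1
                # Discard the finished page's children to keep memory low.
                elem.clear()
    return path, pages, revisions


def process_split_files(paths, workers=4):
    """Process each split file in its own worker process (workers is an assumed default)."""
    with Pool(processes=workers) as pool:
        return pool.map(count_pages_and_revisions, paths)
```

To try it, call process_split_files() with a list of split dump file paths from under an `if __name__ == "__main__":` guard, inside a virtual environment as suggested above. On a shared host, each split file could just as well be submitted as its own job rather than handed to a local process pool.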
