On Sep 19, Davide Alberani <davide.alber...@gmail.com> wrote:

> So... is anyone out there willing to help and be in charge of
> one or more parsers?

I forgot to mention how I arranged the development of the new parsers:
the old account (automatically used by IMDbPY) was changed to use the
old set of web pages (mostly: the pages about people still need to be
fixed), so it can't be used to develop the new parsers.

I've then created a new fork of IMDbPY on bitbucket, which uses a new
account set to refer to the new web pages; this repository can be
cloned from here:
  http://bitbucket.org/alberanid/imdbpy_parsers2010/

Once you have cloned this repository, you can install this version on
your system (or in a virtualenv) and modify it to fix the parsers.

You can test each page as you wish; there's also a more comprehensive
(well, more or less...) set of tests:
  http://bitbucket.org/alberanid/imdbpy-testsuite

Specifically, look in the http-mobile directory. The steps:
- download from http://erlug.linux.it/~da/erlugtmp/imdbpy_p.tar.gz
  a more-or-less correct set of .p files (dumps of IMDbPY objects
  taken when the parsers were in a good state) and untar it in the
  http-mobile directory.
- fetch the new .html web pages with
    ./test_parser.py -f
- run the tests with
    ./test_parser.py -t 2>&1 | less
- spot a problem (missing information or something like that), change
  the parsers and re-run the tests until the problem is fixed. :-)

In the 'standalone/' directory there is a separate test for each file
(the ones labeled *lxml* are faster than the *bsoup* ones).

Keep in mind that it's normal to see errors about things like changes
in the number of votes, or new cast/company information; what really
matters is that the parser - from one run to the other - doesn't lose
complete sets of information (and that no crap ends up in the strings,
movie titles and so on).
If a key is completely missing, the test_parser.py script will report
it in the lists of keys that are only in the expected or only in the
received information.

If this was not clear enough, feel free to ask me anything!

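P.S. As a rough illustration of the "keys only in the expected or in
the received information" report mentioned above, here is a minimal
sketch. It is not the actual code of test_parser.py: it assumes the
expected and the received information can be treated as plain dicts,
and the diff_keys name and the sample data are made up for the example.

```python
def diff_keys(expected, received):
    """Return two sorted lists: keys only in expected, keys only in received."""
    only_expected = sorted(set(expected) - set(received))
    only_received = sorted(set(received) - set(expected))
    return only_expected, only_received


# Made-up sample data, only for illustration (not taken from a real .p dump).
expected = {'title': 'The Matrix', 'year': 1999,
            'cast': ['Keanu Reeves'], 'votes': 500000}
received = {'title': 'The Matrix', 'year': 1999,
            'votes': 500123, 'runtimes': ['136']}

missing, extra = diff_keys(expected, received)
print('only in expected:', missing)
print('only in received:', extra)
```

A changed value (like 'votes' here) is harmless; a key that shows up
as "only in expected" (like 'cast') is the kind of thing that probably
means a parser has lost a whole set of information.
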
-- 
Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel