On Sep 19, Davide Alberani <davide.alber...@gmail.com> wrote:

> So... is anyone out there willing to help and be in charge of
> one or more parsers?

I forgot to mention how I arranged the development of the new parsers: the
old account (automatically used by IMDbPY) was changed to use the old
set of web pages (mostly: the ones about people still need to be fixed), so
it can't be used to develop the new parsers.

I've then created a new fork of IMDbPY on bitbucket, which uses a new account
set to refer to the new web pages; this repository can be cloned from here:
  http://bitbucket.org/alberanid/imdbpy_parsers2010/

Once you have cloned this repository, you can install this version on your
system (or in a virtualenv) and modify it to fix the parsers.

You can test each page as you wish; there's also a more comprehensive (well,
more or less...) set of tests: http://bitbucket.org/alberanid/imdbpy-testsuite

Specifically, look in the http-mobile directory of that repository.
The steps:
- download from http://erlug.linux.it/~da/erlugtmp/imdbpy_p.tar.gz a
  more-or-less correct set of .p files (dumps of IMDbPY objects taken when
  the parsers were in a good state) and untar it in the http-mobile directory.
- fetch the new .html web pages with ./test_parser.py -f
- run the tests with ./test_parser.py -t 2>&1 | less
- spot a problem (missing information or something like that), change the
  parsers and re-run the tests until the problem is fixed. :-)

In the 'standalone/' directory there is a separate test for each file (the
ones labeled *lxml* are faster than the *bsoup* ones).

Keep in mind that it's normal to see errors about things like changes
in the number of votes, or new cast/companies information; what really
matters is that the parser - from one run to the other - doesn't lose complete
sets of information (and that no crap ends up in the strings, movie titles and
so on).  If a key is completely missing, the test_parser.py script will report
it in the lists of keys that are only in the expected or in the received
information.
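
To make the idea of "keys only in the expected or in the received
information" concrete, here is a minimal Python sketch of that kind of
comparison.  It is not the actual code of test_parser.py: the function name
compare_keys and the sample dictionaries are made up for illustration, and I'm
assuming the parsed data can be treated as plain dictionaries.

```python
def compare_keys(expected, received):
    """Return two sorted lists: keys only in expected, keys only in received."""
    expected_keys = set(expected)
    received_keys = set(received)
    only_in_expected = sorted(expected_keys - received_keys)
    only_in_received = sorted(received_keys - expected_keys)
    return only_in_expected, only_in_received


# Hypothetical data: 'expected' stands for an old dump, 'received' for
# what the parser returns today.
expected = {"title": "Some Movie", "votes": 1000, "cast": ["A", "B"]}
received = {"title": "Some Movie", "votes": 1042, "trivia": ["x"]}

missing, extra = compare_keys(expected, received)
print("only in expected:", missing)
print("only in received:", extra)
```

Here the different number of votes is harmless, while 'cast' showing up as
"only in expected" is the kind of lost set of information worth investigating.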

If this was not clear enough, feel free to ask me anything!


-- 
Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel