[Imdbpy-devel] development on new design

Davide Alberani Sun, 25 Feb 2007 03:50:58 -0800

I've taken the first steps toward supporting the new design of the
IMDb site.


[CVS]
In CVS there's a branch called "newdesign" which includes the first
changes.
You can check it out with:
  cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/imdbpy co -r newdesign imdbpy

(replace anonymous with your account, if you're already an authorized
developer)
This branch uses the new 'IMDbPYweb' account, which is set to use the
new layout, to access the IMDb web site.

I'm not a CVS expert - I hope I've set this up properly.


[test-suite]
A slightly modified test suite, meant to help development, is here:
  http://imdbpy.sourceforge.net/imdbpy-3.0-testsuite.tar.gz

In the 'http' directory there are the main useful (?) tools.
The first one is the test_parser.py test suite.
Run it with the -f option and it will start downloading (a lot of)
pages from imdb.com.
Then run it with the -p option: it will read every downloaded
HTML page and parse it with the appropriate parser, saving the
result in a ".p" file.
Now you can check, with the -t option, that the output of a second
run of the parser on the various files matches the saved ".p" files
(it's assumed that between the first and the second run you've
modified the parser, otherwise there's nothing to test!); if the
output is different, the test fails and the differences are - more
or less - nicely printed.
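
The -p / -t cycle described above is essentially a regression test
against pickled output.  Below is a minimal sketch of that idea; it is
not the actual test_parser.py code, and the function names
(save_reference, compare_with_reference) are hypothetical.  It assumes
the parser output is a plain, picklable Python object such as a dict.

```python
import difflib
import pickle
import pprint


def save_reference(parsed, path):
    """Store the parser output as the reference (the idea behind -p)."""
    with open(path, "wb") as fh:
        pickle.dump(parsed, fh)


def compare_with_reference(parsed, path):
    """Compare new parser output with the stored one (the idea behind -t).

    Returns a list of unified-diff lines; an empty list means no change.
    """
    with open(path, "rb") as fh:
        reference = pickle.load(fh)
    if parsed == reference:
        return []
    # Pretty-print both objects so the diff is readable line by line.
    old = pprint.pformat(reference).splitlines()
    new = pprint.pformat(parsed).splitlines()
    return list(difflib.unified_diff(old, new, fromfile="reference",
                                     tofile="current", lineterm=""))
```

A caller would treat a non-empty result as a failed test and print the
returned lines.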

Another way to use test_parser.py is this: you have a parser that
you know is working OK, and you've downloaded the HTML pages and
created the ".p" files; after some time (weeks or months) you want to
check that the parser is still working, so you move the ".html" files
elsewhere and fetch the whole data again with the -f option _without_
recreating the ".p" files.  Then you can run the -t test again to see
if the output for the fresh ".html" pages is significantly different
from the old ".p" files (some differences are unavoidable; think about
the number of votes for a movie).
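
When comparing fresh pages against old ".p" files, one way to separate
the unavoidable differences from the real breakage is to skip the keys
that are expected to change.  This is only a sketch of that idea, not
something test_parser.py is known to do; the function name and the key
names in VOLATILE_KEYS are assumptions made for illustration.

```python
# Hypothetical set of keys whose values are expected to drift over time.
VOLATILE_KEYS = frozenset(["votes", "rating"])


def significant_changes(old, new, ignore=VOLATILE_KEYS):
    """Return {key: (old_value, new_value)} for keys that differ.

    Keys listed in `ignore` are skipped.  A key missing on one side is
    reported with None on that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        if key in ignore:
            continue
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes
```

An empty result then means the parser output is unchanged apart from
the ignored keys.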


The other tool is the build_tests.py script: it takes every test
made by test_parser.py and creates another script (based on the
skel_test.py skeleton) for each one; these scripts just read the
corresponding ".html" file and output the result of the parser.
E.g.: run build_tests.py and your current directory will be full
of "test_*" scripts.
Fetch the HTML pages with test_parser.py -f and try to run one
of the created "test_*" scripts.
As an example, if you run ./test_airing_parser_m37.py you will see
that no titlesRefs and namesRefs are collected (correct!) and that
the only data parsed is "airing", followed by its value.
Then you _manually_ check whether the parsed data are correct:
you open the "m37.html" file with a browser and see if every piece
of information that the airing_parser is supposed to parse is
correct.
In this specific case the parser already works perfectly; other
parsers may be completely (or partially) broken.
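
To illustrate how one script per test can be generated from a
skeleton, here is a small sketch using string.Template.  It is not the
real build_tests.py: the skeleton text, the build_tests() function and
its arguments are hypothetical, and only the "test_<parser>_<page>.py"
naming follows the example above.

```python
import os
from string import Template

# Hypothetical skeleton; the real skel_test.py is not reproduced here.
SKELETON = Template('''#!/usr/bin/env python
# Generated test for $parser on $page.html
PARSER_NAME = "$parser"
HTML_FILE = "$page.html"
''')


def build_tests(tests, outdir):
    """Write one test_<parser>_<page>.py script per (parser, page) pair.

    Returns the list of paths that were created.
    """
    created = []
    for parser, page in tests:
        path = os.path.join(outdir, "test_%s_%s.py" % (parser, page))
        with open(path, "w") as fh:
            fh.write(SKELETON.substitute(parser=parser, page=page))
        os.chmod(path, 0o755)  # allow running it as ./test_...
        created.append(path)
    return created
```

For instance, build_tests([("airing_parser", "m37")], ".") would write
test_airing_parser_m37.py in the current directory.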

Run these scripts one by one, and see what's not working; if you're
interested in fixing something, let me know on this list, to avoid
two people working on the same bug.
Currently I'm looking at test_guests_parser_m33.py (heavily
broken).


Enjoy,
-- 
Davide Alberani <[EMAIL PROTECTED]> [PGP KeyID: 0x465BFD47]
http://erlug.linux.it/~da/

