On Mar 03, Sébastien Ragons <srag...@gmail.com> wrote:

> The TODO says that a better test suite is really needed.
> 
> I'd like to help, but which modules and classes are the most urgent
> to test?

The problem here is that even I don't know where to start. :-)
The test suite covers many aspects (in the test_* files in the main
directory, though I'm sure something is missing), and
http-mobile/test_parser.py tests most (probably all) of the parsers
used by "http" and "mobile".

Unfortunately it's a real mess, so don't be surprised if you don't
understand what the code means - I have serious trouble with it, too. :-/

The other problem is that it's mostly a manual process; a normal
iteration works this way:
1. you fetch the .html files with -f
2. ASSUMING THAT THE PARSERS ARE ACTUALLY GOOD, you create the
   parsed dumps with the -p argument.
3. after some time, you fetch a new version of the .html files
   (always with -f)
4. you run the tests with -t (there are options to exclude certain
   tests); normally you want to redirect the output to a file or
   show it with: ./test_parser.py -t 2>&1 | less
   The tests compare the output of the parsers applied to the
   newly downloaded .html files to the states stored in the .p files
   (which are known to be "good").
5. now you look at the output to see if the changes are due to
   normal differences in the data (e.g. the number of votes for a
   movie increased, a 'trivia' entry was removed and so on), and try to
   spot the real problems.
   In the output, you can search for the KEYWORD string to highlight
   suspiciously missing keywords - but not every problem causes a
   missing keyword: maybe the keywords are there, but with garbage
   in their values.
6. you fix the broken parsers, if any.
7. you run the tests again; if they are all ok, you run -p again to
   start over from point 3.
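
The comparison done in points 4 and 5 can be sketched roughly as
follows. This is not the code in test_parser.py: it is only a minimal
illustration, assuming that a parser result is a plain dictionary and
that a .p file is a pickled copy of such a dictionary. The names
load_dump and compare_results are hypothetical.

```python
import pickle


def load_dump(path):
    """Load a previously saved "known good" parser result.

    Assumption: the dump is a pickled dict (hypothetical layout).
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def compare_results(good, new):
    """Return (missing, added, changed) between two parser outputs.

    missing: keys in the good dump that are absent now (suspicious).
    added:   keys that appear only in the new output.
    changed: keys present in both, mapped to (old_value, new_value).
    """
    missing = sorted(set(good) - set(new))
    added = sorted(set(new) - set(good))
    changed = {
        key: (good[key], new[key])
        for key in set(good) & set(new)
        if good[key] != new[key]
    }
    return missing, added, changed
```

A missing key is the easy case to spot; the "changed" entries still
have to be read by a human, since a higher vote count and a value full
of garbage both show up there.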

To show the results of a single parser (and test your changes while
you're fixing it), you can populate the 'standalone' directory
using the build_tests.py script; after that, you move to the
'standalone' directory and run the script you want (each comes in two
copies: one for lxml, one for BeautifulSoup).

As you can see, it's pretty complex and boring, but I don't really see
another way to proceed: we're following a moving target. :-/

If you want to start with something simpler, look at the test_helpers.py
file: it lacks tests for the 'get_byURL' and 'fullSizeCoverURL'
functions.
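
As a starting point, a test for fullSizeCoverURL might look like the
sketch below. It does not follow the layout of the existing
test_helpers.py; it uses the standard unittest module, and the expected
behaviours are assumptions to be checked against the helper itself.
The example URL is made up.

```python
import unittest

try:
    # Guarded import, so the file still loads where IMDbPY is missing.
    from imdb.helpers import fullSizeCoverURL
except ImportError:
    fullSizeCoverURL = None


@unittest.skipIf(fullSizeCoverURL is None, "IMDbPY is not installed")
class FullSizeCoverURLTest(unittest.TestCase):
    def test_missing_url_gives_none(self):
        # Assumption: with no URL to work on, the helper returns None.
        self.assertIsNone(fullSizeCoverURL(None))

    def test_string_in_string_out(self):
        # Assumption: a URL string is turned into another string; the
        # exact result is deliberately not asserted here.
        url = "http://example.com/images/cover._V1._SX100_SY140_.jpg"
        self.assertIsInstance(fullSizeCoverURL(url), str)
```

Saved as a test_*.py file, it can be run with "python -m unittest".
get_byURL is left out on purpose: testing it would presumably need
either network access or a stubbed-out fetch.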


Hope this helps - thank you for your interest!

-- 
Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel