On Mar 03, Sébastien Ragons <srag...@gmail.com> wrote:
> The TODO says that a better test suite is really needed.
>
> I'd like to help, but which modules and classes are the most urgent to
> test?

The problem here is that even I don't know where to start. :-)

The test suite covers many aspects (in the test_* files in the main
directory), even if I'm sure something is missing, and
http-mobile/test_parser.py tests most (probably all) of the parsers used
by "http" and "mobile".
Unfortunately it's a real mess, so don't be surprised if you don't
understand what the code means - I have serious trouble with it, too. :-/

The other problem is that it's mostly a manual process; a normal
iteration works this way:

1. you fetch the .html files with -f
2. ASSUMING THAT THE PARSERS ARE ACTUALLY GOOD, you create the parsed
   dumps with the -p argument.
3. after some time, you fetch a new version of the .html files (again
   with -f)
4. you run the tests with -t (there are options to exclude certain
   tests); normally you want to redirect the output to a file or page
   through it with:
     ./test_parser.py -t 2>&1 | less
   The tests compare the output of the parsers applied to the newly
   downloaded .html files with the states stored in the .p files (which
   are known to be "good").
5. now you look at the output to see whether the changes are due to
   normal differences in the data (e.g. the number of votes for a movie
   increased, a 'trivia' entry was removed, and so on), and try to spot
   the real problems. In the output, you can search for the KEYWORD
   string to highlight suspiciously missing keywords - but not every
   problem causes a missing keyword: maybe the keywords are there, but
   with garbage in their values.
6. you fix the broken parsers, if any.
7. you run the tests again; if they all pass, you run -p again and start
   over from point 3.

To show the results of a single parser (and test your changes while
you're fixing it), you can populate the 'standalone' directory using the
build_tests.py script; after that, you move to the 'standalone'
directory and run the script you want (each comes in two copies: one for
lxml, one for BeautifulSoup).

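The comparison described in points 4 and 5 can be sketched roughly as
below. This is only an illustration, not the code in test_parser.py: it
assumes the parsed data is a (possibly nested) dictionary, the function
name diff_parsed is hypothetical, and the report format is made up for
the example. Only the idea of tagging missing keys with the KEYWORD
string comes from the description above.

```python
def diff_parsed(good, new, prefix=""):
    """Compare a stored 'good' parse with a fresh one.

    Both arguments are assumed to be dictionaries (hypothetical data
    layout). Returns a list of report lines, one per difference.
    """
    report = []
    for key in sorted(set(good) | set(new), key=str):
        path = f"{prefix}{key}"
        if key not in new:
            # A key that vanished is the suspicious case: tag it so it
            # can be searched for in the output.
            report.append(f"KEYWORD missing: {path}")
        elif key not in good:
            report.append(f"new keyword: {path}")
        elif isinstance(good[key], dict) and isinstance(new[key], dict):
            # Recurse into nested dictionaries, extending the key path.
            report.extend(diff_parsed(good[key], new[key], path + "."))
        elif good[key] != new[key]:
            # May be a harmless data change (e.g. more votes) or garbage
            # in the value; a human still has to decide which.
            report.append(f"changed: {path}: {good[key]!r} -> {new[key]!r}")
    return report


if __name__ == "__main__":
    good = {"title": "The Matrix", "votes": 1000, "trivia": ["a", "b"]}
    new = {"title": "The Matrix", "votes": 1012}
    for line in diff_parsed(good, new):
        print(line)
```

A sketch like this only automates the listing of differences; telling a
normal data change from a broken parser remains a manual judgment, as in
point 5.
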
As you can see, it's pretty complex and boring, but I don't really see
another way to proceed: we're following a moving target. :-/

If you want to start with something simpler, look at the test_helpers.py
file: it lacks tests for the 'get_byURL' and 'fullSizeCoverURL'
functions.

Hope this helps - thank you for your interest!

-- 
Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel