Re: [Imdbpy-devel] Extractor grouping in DOM parser

H. Turgut Uyar Wed, 20 Aug 2008 06:12:09 -0700

On 08/20/2008 02:28 PM, Davide Alberani wrote:
> By the way, the "group" is a really good solution! :-)
>


Thanks, I think it must've sped up the process a lot.

Similarly, to speed up the bsoup parser, I've implemented a simple
utility that avoids parsing the same paths and steps repeatedly. It keeps
a dictionary of parsed path and step objects; if the requested one is
already there, that object is returned instead of creating a new one by
parsing the path again.
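
For illustration, the sharing described above could look roughly like
the sketch below. The names PathCache and parse_path are hypothetical
stand-ins, not the actual bsoup parser code, and the "parser" here just
splits a string so that the example is self-contained.

```python
class PathCache:
    """Hypothetical cache mapping path strings to parsed path objects."""

    def __init__(self, parse_func):
        self._parse = parse_func   # function that does the real parsing
        self._cache = {}           # path string -> parsed object
        self.hits = 0
        self.misses = 0

    def get(self, path):
        """Return the parsed object for path, parsing it only once."""
        try:
            parsed = self._cache[path]
        except KeyError:
            # First time this path is seen: parse it and remember the result.
            self.misses += 1
            parsed = self._cache[path] = self._parse(path)
        else:
            self.hits += 1
        return parsed


def parse_path(path):
    # Stand-in for the real path parser (an assumption for this sketch):
    # split the path into its non-empty steps.
    return tuple(step for step in path.split("/") if step)


cache = PathCache(parse_path)
for p in ["//div/a", "//div/a", "//span", "//div/a"]:
    cache.get(p)

print(cache.hits, cache.misses)
```

The hit and miss counters are the kind of numbers reported further down;
a miss is counted once per distinct path string.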

I've run a subset of the current test suite (174 tests, of which only
one third use the parsers) and I've seen this:

When no sharing is used, 24306 path parses are reported. When shared as
explained above, 24211 path hits and 95 path misses are reported
(24211 + 95 = 24306). I guess that means that there are 95 distinct
paths in the suite. For the steps there are 148 hits and 92 misses. The
path hit ratio is great (over 99%) and the time spent parsing paths must
have dropped considerably, but the problem is that this isn't reflected
in the total running time of the tests. No significant time difference
at all! Either I'm doing something wrong, or parsing the paths takes a
negligible share of the whole process, but I'm having a hard time seeing
how this is possible. Any ideas?
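
One way to check whether path parsing really is a negligible share of
the run would be to profile it. The following is only a sketch using the
standard cProfile module; run_tests, parse_path and other_work are
made-up placeholders for the real test run and parser, so the workload
itself is an assumption.

```python
import cProfile
import io
import pstats


def parse_path(path):
    # Placeholder for the real path parser (assumption for this sketch).
    return path.split("/")


def other_work(n):
    # Placeholder for everything else a test does (fetching, building
    # the document tree, extracting data, ...).
    return sum(range(n))


def run_tests():
    # Placeholder for the test run: many cheap parses plus heavier work.
    for _ in range(1000):
        parse_path("//div/a")
        other_work(1000)


profiler = cProfile.Profile()
profiler.enable()
run_tests()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Comparing the cumulative time of the parsing function with the total
would show directly how much there was to save in the first place.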

Turgut


