On 08/20/2008 06:40 PM, Davide Alberani wrote:
> I'll do locations, tech, dvd, recommendation; I still have to check
> which are left out (I'll take others, once I'm done with these ones).
> 
> The big problems are: "awards" and - maybe - "episodes" and "airing".
> Let me know which one you'll take. :-)
> 

I'll take news, amazonrev, sales, movie_faqs, episodes and airing.

> I was thinking about your cache implementation: is it possible that
>  the speed up is not noticeable because lxml is already doing the
> same thing?

That's where I got the idea of how the beautifulsoup xpath support could
be made to perform better. I've only tried to speed up the beautifulsoup
side and did not touch anything lxml-related. There is no room for
improvement on the lxml side anyway :-)

> I don't know if beautifulsoup adopts a similar solution.

I think beautifulsoup does not play a part here because the cache is
only meant for the parsing of the path, not its application. I mean, the
first time a path like "//td[1]/text()" comes in, the code tokenizes it
into the steps ["//td[1]", "text()"], finds the node tests ("td" and
"text()"), generates callable filter classes for predicates (like
for [1]) and caches the resulting path object. If the same path comes
again, the object will be taken from the cache. So basically only string
operations will be skipped in the case of a cache hit. When I run the
tests as I said, it looks like 22000 such parses take practically no time.

> Anyway, I'd leave it in place (even if not noticeable, a local
> python dictionary is always faster).
> 

I'm probably overlooking something or my test methodology is wrong.
Anyway, I'll leave it in place and try to figure it out later, after
the parsers are completed.

Thanks

Turgut


_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
