On 08/20/2008 06:40 PM, Davide Alberani wrote:
> I'll do locations, tech, dvd, recommendation; I still have to check
> which are left out (I'll take others, once I'm done with these ones).
>
> The big problems are: "awards" and - maybe - "episodes" and "airing".
> Let me know which one you'll take. :-)
I'll take news, amazonrev, sales, movie_faqs, episodes and airing.

> I was thinking about your cache implementation: is it possible that
> the speed up is not noticeable because lxml is already doing the
> same thing?

That's where I got the idea of how the beautifulsoup xpath support could
be made to perform better. I've only tried to speed up the beautifulsoup
side and did not touch anything lxml-related. There is no room for
improvement on the lxml side anyway :-)

> I don't know if beautifulsoup adopts a similar solution.

I think beautifulsoup does not play a part here, because the cache is
only meant for the parsing of the path, not its application. I mean,
when a path like "//td[1]/text()" comes in for the first time, the code
tokenizes it into the steps ["//td[1]", "text()"], finds the node tests
("td" and "text()"), generates callable filter classes for the
predicates (such as [1]) and caches the resulting path object. If the
same path comes in again, the object is taken from the cache. So
basically only string operations are skipped in the case of a cache hit.
When I run the tests as I said, it looks like 22000 such parses take
practically no time.

> Anyway, I'd leave it in place (even if not noticeable, a local
> python dictionary is always faster).

I'm probably overlooking something or my test methodology is wrong.
Anyway, I'll leave it in place and try to figure it out later, after the
parsers are completed.

Thanks
Turgut

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel