Hi,

At the risk of complicating the DOM parser, I've added an extractor grouping feature that should improve parser performance. The aim is to eliminate repeated xpath lookups.

Now, if the extractor has a 'group' attribute (which is supposed to be an xpath), it is applied first to get a set of group elements, each together with a group key. Then the extractor path is applied to get the final elements. If the attribute key is None, the group key is used as the key.

Here's an example using the old method:

    Extractor(label='glossarysections',
              path="//[EMAIL PROTECTED]'glossary']/../../../..//tr",
              attrs=Attribute(key="../tr[1]/td/h5/a/@name",
                              multi=True,
                              ...),

This method unnecessarily applies the attribute key xpath over and over. Large pages in particular, like the person keywords page, become terribly slow because of this.

The new method is:

    Extractor(label='glossarysections',
              group="//[EMAIL PROTECTED]'glossary']",
              group_key="./@name",
              path="../../../..//tr",
              attrs=Attribute(key=None,
                              multi=True,
                              ...),

The group_key is ".//text()" by default. I have gone through the existing parsers to take advantage of this, but the old method is still valid as well.

As a note, I think the markup of the person genres/keywords page has changed and the old parser is not working correctly.

Turgut

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
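
For anyone who wants to see the grouping idea in isolation, here is a rough standalone sketch. It is not the IMDbPY parser code: it uses the standard library's xml.etree.ElementTree instead of the real DOM parser, the markup is made up, and extract_grouped is a hypothetical helper. Because ElementTree cannot select attributes with an xpath such as "./@name", the group key here is given as a plain attribute name. The point is only the order of operations: find the groups, read the key once per group, then apply the path relative to each group.

```python
import xml.etree.ElementTree as ET

# Toy document standing in for a glossary page (hypothetical markup).
DOC = """
<page>
  <section name="A">
    <row>abacus</row>
    <row>anchor</row>
  </section>
  <section name="B">
    <row>bridge</row>
  </section>
</page>
"""


def extract_grouped(root, group, group_key, path):
    """Apply `group` first, read the key once per group, then apply `path`."""
    results = {}
    for group_el in root.findall(group):
        # One key lookup per group, instead of one per matched element.
        key = group_el.get(group_key)
        # `path` is evaluated relative to the group element.
        elements = group_el.findall(path)
        results.setdefault(key, []).extend(el.text for el in elements)
    return results


root = ET.fromstring(DOC)
sections = extract_grouped(root, group=".//section", group_key="name", path="./row")
print(sections)
```

With three rows in two sections, the key is read twice rather than three times; on a large page with many elements per group the saving grows accordingly.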