On 08/12/2008 06:10 PM, H. Turgut Uyar wrote:
> At the risk of complicating the DOM parser, I've added an extractor 
> grouping feature that should improve the parser performance. The aim is 
> to eliminate repeated xpath lookups. Now, if the extractor has a 'group' 
> attribute (which is supposed to be an xpath), that will be applied first 
> to get a set of group elements together with a group key. Then the 
> extractor path will be applied to get the final elements. 

I think I should clarify one point here: the extractor path will be
applied to the group element, so it can (should?) be relative to the
group element.
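A minimal sketch of the grouping idea described above, in Python. It assumes the groups are div elements with class '_imdbpy', each holding an h5 label and some links. extract_grouped is a hypothetical helper, not the actual DOM parser code. It uses the standard library's ElementTree, which supports only a subset of XPath. Because of that, the group key is given as the h5 element and its .text stands in for './h5/text()'.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the '_imdbpy' div structure is assumed for illustration.
PAGE = """<html><body>
  <div class="_imdbpy"><h5>Director:</h5><a href="/name/1">Alice</a></div>
  <div class="_imdbpy"><h5>Writers:</h5><a href="/name/2">Bob</a><a href="/name/3">Carol</a></div>
</body></html>"""


def extract_grouped(root, group, group_key, path):
    """Hypothetical helper: apply `group` once, then evaluate `group_key`
    and `path` relative to each group element.

    ElementTree has no text() step, so `group_key` selects an element
    and its text is used as the key (standing in for './h5/text()').
    """
    result = {}
    for group_elem in root.findall(group):
        key_elem = group_elem.find(group_key)
        if key_elem is None or key_elem.text is None:
            continue  # skip groups without a usable key
        key = key_elem.text.strip()
        # `path` is relative, so only this group's subtree is searched.
        result[key] = [a.text for a in group_elem.findall(path)]
    return result


root = ET.fromstring(PAGE)
print(extract_grouped(root, ".//div[@class='_imdbpy']", "./h5", "./a"))
```

The group lookup runs once over the whole page. Each per-group lookup then only walks a small subtree, which is where the saving over repeated document-wide lookups would come from.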

For example:

                    path="//div[@class='_imdbpy']/a",
                    group="//div[@class='_imdbpy']",
                    group_key="./h5/text()",

can also be written as:

                    group="//div[@class='_imdbpy']",
                    group_key="./h5/text()",
                    path="./a",

I'm not sure whether the first form might produce incorrect results or
cause extra work for the parser, since it might traverse the other
_imdbpy divs as well (not just the div of the current group).
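A small sketch of why the first form is risky. It assumes standard XPath semantics, where a path starting with '//' is evaluated from the document root even when a group element is the context node (this is how XPath 1.0 defines it, and how lxml's xpath() behaves). It also assumes the groups are div elements with class '_imdbpy'. ElementTree refuses absolute paths on elements, so the absolute form is emulated by searching from the root for every group. Whether the DOM parser actually behaves this way depends on its implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two '_imdbpy' groups.
PAGE = """<html><body>
  <div class="_imdbpy"><h5>Director:</h5><a>Alice</a></div>
  <div class="_imdbpy"><h5>Writers:</h5><a>Bob</a><a>Carol</a></div>
</body></html>"""

root = ET.fromstring(PAGE)
groups = root.findall(".//div[@class='_imdbpy']")

# Form 1 (absolute path): each group re-searches the whole document,
# so every group collects the links of all '_imdbpy' divs.
absolute = {
    g.find("./h5").text: [a.text for a in root.findall(".//div[@class='_imdbpy']/a")]
    for g in groups
}

# Form 2 (relative path): each group only sees its own links.
relative = {g.find("./h5").text: [a.text for a in g.findall("./a")] for g in groups}

print(absolute)  # {'Director:': ['Alice', 'Bob', 'Carol'], 'Writers:': ['Alice', 'Bob', 'Carol']}
print(relative)  # {'Director:': ['Alice'], 'Writers:': ['Bob', 'Carol']}
```

Under these assumptions, the absolute form does extra work and mixes results across groups. The relative './a' form stays within the current group.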

Turgut


_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
