On 08/12/2008 06:10 PM, H. Turgut Uyar wrote:
> At the risk of complicating the DOM parser, I've added an extractor
> grouping feature that should improve the parser performance. The aim is
> to eliminate repeated xpath lookups. Now, if the extractor has a 'group'
> attribute (which is supposed to be an xpath), that will be applied first
> to get a set of group elements together with a group key. Then the
> extractor path will be applied to get the final elements.
I should clarify one point here: the extractor path is applied to each
group element, so it can (should?) be relative to that group
element.
For example:
path="//[EMAIL PROTECTED]'_imdbpy']/a",
group="//[EMAIL PROTECTED]'_imdbpy']",
group_key="./h5/text()",
can also be written as:
group="//[EMAIL PROTECTED]'_imdbpy']",
group_key="./h5/text()",
path="./a",
I'm not sure whether the first method might produce incorrect results or
cause extra work for the parser, since the absolute path might also match
elements in other _imdbpy divs (not just the div of the current group).
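
To make the concern concrete, here is a minimal sketch using only the
Python standard library, not the actual parser code. The attribute test in
the paths above is obscured in the archived message, so class="_imdbpy" is
an assumption, as are the sample markup and the extract() helper. The
standard library's ElementTree does not accept absolute paths on an
element and has no text() step, so the absolute case is simulated by
searching from the document root, and the group key is read from the
h5 element's .text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two groups, each with an h5 key and some links.
DOC = """
<html><body>
  <div class="_imdbpy"><h5>Director</h5><a>Alice</a></div>
  <div class="_imdbpy"><h5>Writer</h5><a>Bob</a><a>Carol</a></div>
</body></html>
""".strip()

root = ET.fromstring(DOC)

# Assumed group path; the real attribute name is not known from the message.
GROUP = ".//div[@class='_imdbpy']"


def extract(relative):
    """Return {group key: [link texts]} for both ways of writing the path."""
    result = {}
    for group in root.findall(GROUP):
        key = group.find("./h5").text
        if relative:
            # path="./a": only the links inside the current group.
            elements = group.findall("./a")
        else:
            # Simulated absolute path: evaluated from the document root,
            # so it matches links in every _imdbpy div.
            elements = root.findall(GROUP + "/a")
        result[key] = [a.text for a in elements]
    return result


relative_result = extract(relative=True)
absolute_result = extract(relative=False)
print(relative_result)
print(absolute_result)
```

In this sketch the relative form keeps each group's links separate, while
the simulated absolute form returns every link for every group and repeats
the full-document search once per group. Whether the real parser behaves
the same way depends on how its backend evaluates an absolute path on an
element.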
Turgut
_______________________________________________
Imdbpy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel