Status: New
Owner: ----
New issue 116 by [email protected]: treewalker escapes from subtree if
root of subtree has a next sibling
http://code.google.com/p/html5lib/issues/detail?id=116
I'm using html5lib 0.11.1 with Python 2.5 on Mac OS X 10.5.
Consider the following interaction with html5lib:
>>> from html5lib import html5parser, serializer, treebuilders, treewalkers
>>> s = serializer.htmlserializer.HTMLSerializer()
>>> walker = treewalkers.getTreeWalker('dom')
>>> def contents(node):
...     """Return the serialized content of 'node'."""
...     return u''.join(s.serialize(walker(node)))
...
>>> doc = html5parser.HTMLParser(
...     tree=treebuilders.getTreeBuilder('dom')).parse(u'<table><tr><td>A</table>B')
>>> contents(doc.getElementsByTagName('table')[0]) # [1]
u'<table><tr><td>A</table>B'
>>> contents(doc.getElementsByTagName('tr')[0]) # [2]
u'<tr><td>A'
The output from [2] is what I expect to see: the serialized content of
the <tr> node and its children.
However, the output from [1] seems wrong to me. I expected to get the
serialized content of the <table> node (only), but instead I get the
serialized content of the <table> node plus the remainder of the
document.
I believe the underlying cause of the problem is the __iter__ method of
NonRecursiveTreeWalker in html5lib/treewalkers/_base.py. It aims to walk
the nodes of the subtree of self.tree in prefix order, and is supposed to
stop when it returns to the root of the subtree (see the comparison
"if self.tree is currentNode" on line 153). However, the code for
stepping to the next sibling is executed before this stopping test, so
the traversal escapes from the subtree (but only if the root of the
subtree actually has a next sibling).
Suggested fix: exchange the step to the next sibling and the stopping test.
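To make the ordering problem concrete, here is a minimal sketch of a
non-recursive prefix-order walk. It is not the html5lib code: the Node
class, the walk() function and its check_root_first flag are all
hypothetical stand-ins, chosen only so that the two orderings of "step
to next sibling" and "stop at root" can be compared side by side.

```python
class Node(object):
    """Hypothetical minimal tree node; a stand-in, not a real DOM node."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    @property
    def first_child(self):
        return self.children[0] if self.children else None

    @property
    def next_sibling(self):
        if self.parent is None:
            return None
        siblings = self.parent.children
        i = siblings.index(self)
        return siblings[i + 1] if i + 1 < len(siblings) else None


def walk(root, check_root_first):
    """Yield node names in prefix order without recursion.

    check_root_first=False tries the next sibling before testing for the
    subtree root (the ordering described in the report);
    check_root_first=True applies the stopping test first (the suggested fix).
    """
    node = root
    while node is not None:
        yield node.name
        if node.first_child is not None:
            node = node.first_child
            continue
        # No children: climb until some node has a next sibling.
        while node is not None:
            if check_root_first and node is root:
                node = None  # back at the subtree root: stop here
                break
            sibling = node.next_sibling
            if sibling is not None:
                node = sibling  # may be a sibling of the root itself
                break
            if node is root:
                node = None  # stopping test, reached too late to help
                break
            node = node.parent


# Same shape as the example document: <table><tr><td>A</table>B
tr = Node('tr', [Node('td', [Node('A')])])
table = Node('table', [tr])
body = Node('body', [table, Node('B')])

buggy = list(walk(table, check_root_first=False))
fixed = list(walk(table, check_root_first=True))
inner = list(walk(tr, check_root_first=False))
```

With the sibling step first, the walk started at the table continues into
'B', matching [1]. With the stopping test first, it ends after 'A'. The
walk started at 'tr' stays inside the subtree either way, matching [2],
because 'tr' has no next sibling.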