Hi,
Curious, I tried this out, first creating some data in the file child.py:
#!/usr/bin/env python3
import lxml.etree as ET
t = ET.fromstring(
"""
<html>
<body>
<div data-flag="TODO">
<p>This content is flagged as <code>TODO</code>. It gets a background
colour (red, in this case) and the label <code>TODO</code> is rendered into the
document margin.</p>
</div>
</body>
</html>
""")
And then applied your query:
for n, i in enumerate(t.xpath(r'.//div/child::*')):
    print(f"[{n}] {i}")
Which gave me the following result:
❯ ./child.py
[0] <Element p at 0x1023c3d00>
However, when experimenting, I put two forward slashes before the child
axis, and then I saw the output you describe, i.e.:
for n, i in enumerate(t.xpath(r'.//div//child::*')):
    print(f"[{n}] {i}")
Gives:
❯ ./child.py
[0] <Element p at 0x10f2a3d40>
[1] <Element code at 0x10f2a3e40>
[2] <Element code at 0x10f2b4140>
(Although, as far as I can see, that output is correct when the two forward
slashes are present.)
The above was performed on Python 3.11.3 and lxml version 4.9.2.
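For what it's worth, here is a minimal sketch of why I think the double-slash output is correct. In XPath 1.0, "//" is shorthand for "/descendant-or-self::node()/", so ".//div//child::*" selects the children of the div and of every node beneath it, which amounts to all descendant elements. The shortened document and the helper name tags() are my own, purely for illustration:

```python
import lxml.etree as ET

# Shortened version of the sample document (text trimmed for brevity).
doc = ET.fromstring(
    '<html><body><div data-flag="TODO">'
    '<p>Flagged as <code>TODO</code>, label <code>TODO</code>.</p>'
    '</div></body></html>'
)


def tags(query):
    """Return the tag names matched by an XPath query, in document order."""
    return [el.tag for el in doc.xpath(query)]


# Single slash: only the direct children of the div.
children = tags('.//div/child::*')

# Double slash, and the long form it abbreviates.
double_slash = tags('.//div//child::*')
expanded = tags('.//div/descendant-or-self::node()/child::*')

# The explicit descendant axis, for comparison.
descendants = tags('.//div/descendant::*')

print(children)      # ['p']
print(double_slash)  # ['p', 'code', 'code']
print(double_slash == expanded == descendants)  # True
```

So the three-element result would be expected whenever a second slash precedes "child".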
Could a double slash have sneaked into your query before "child"...?
Kind regards
aid
> On 29 Jun 2023, at 19:17, wayneb--- via lxml - The Python XML Toolkit
> <[email protected]> wrote:
>
> Here's a bit of code I was trying to parse:
>
> <div data-flag="TODO">
> <p>This content is flagged as <code>TODO</code>. It gets a background
> colour (red, in this case) and the label <code>TODO</code> is rendered into
> the document margin.</p>
> </div>
>
> I was using .//div/child::*
> What I got back was a list of three items: <p><code><code>
> those aren't the children of div! Those are the descendant elements of div. I
> decided to test this using .//div/descendant::* which gave me the proper:
> <p><code><code> elements.
> To further confirm this I used other parsers and they provided the proper
> child (p in this case).
>
> How do we go about getting this fixed in lxml?
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: [email protected]