Hi, Curious, I tried this out, first creating some data in the file child.py:
#!/usr/bin/env python3
import lxml.etree as ET

t = ET.fromstring(
"""
<html>
<body>
<div data-flag="TODO">
<p>This content is flagged as <code>TODO</code>. It gets a background colour (red, in this case) and the label <code>TODO</code> is rendered into the document margin.</p>
</div>
</body>
</html>
""")

And then applied your query:

for n, i in enumerate(t.xpath(r'.//div/child::*')):
    print(f"[{n}] {i}")

Which gave me the following result:

❯ ./child.py
[0] <Element p at 0x1023c3d00>

However, when experimenting, I put two forward slashes before the child axis, and then I saw the output you describe, i.e.:

for n, i in enumerate(t.xpath(r'.//div//child::*')):
    print(f"[{n}] {i}")

Gives:

❯ ./child.py
[0] <Element p at 0x10f2a3d40>
[1] <Element code at 0x10f2a3e40>
[2] <Element code at 0x10f2b4140>

(Although, as far as I can see, that output is correct when the two forward slashes are present.)

The above was performed on Python 3.11.3 and lxml version 4.9.2.

Could a double slash have sneaked into your query before "child"...?

Kind regards

aid

> On 29 Jun 2023, at 19:17, wayneb--- via lxml - The Python XML Toolkit
> <lxml@python.org> wrote:
>
> Here's a bit of code I was trying to parse:
>
> <div data-flag="TODO">
> <p>This content is flagged as <code>TODO</code>. It gets a background
> colour (red, in this case) and the label <code>TODO</code> is rendered into
> the document margin.</p>
> </div>
>
> I was using .//div/child::*
> What I got back was a list of three items: <p><code><code>
> those aren't the children of div! Those are the descendant elements of div. I
> decided to test this using .//div/descendant::* which gave me the proper:
> <p><code><code> elements.
> To further confirm this I used other parsers and they provided the proper
> child (p in this case).
>
> How do we go about getting this fixed in lxml?
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- lxml@python.org
> To unsubscribe send an email to lxml-le...@python.org
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: a...@logic.org.uk
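
P.S. In case it helps to compare the expressions side by side, here is a minimal sketch that runs all three against the same snippet and prints only the tag names. The helper name tags() and the variable names are my own, purely for illustration. As far as I understand XPath 1.0, "//" is shorthand for "/descendant-or-self::node()/", which would explain why ".//div//child::*" returns the same elements as ".//div/descendant::*" here.

```python
import lxml.etree as ET

# Same markup as in the original question, wrapped in a minimal document.
DOC = """
<html>
  <body>
    <div data-flag="TODO">
      <p>This content is flagged as <code>TODO</code>. It gets a background
      colour (red, in this case) and the label <code>TODO</code> is rendered
      into the document margin.</p>
    </div>
  </body>
</html>
"""


def tags(root, expr):
    """Return the tag names matched by an XPath expression (hypothetical helper)."""
    return [el.tag for el in root.xpath(expr)]


root = ET.fromstring(DOC)

# Single slash + child axis: only the direct element children of <div>.
children = tags(root, ".//div/child::*")

# Descendant axis: every element nested anywhere below <div>.
descendants = tags(root, ".//div/descendant::*")

# Double slash before child: children of <div> and of all its descendants.
double_slash = tags(root, ".//div//child::*")

print("child:       ", children)
print("descendant:  ", descendants)
print("//child:     ", double_slash)
```

If the first line of output lists only p on your machine as well, then the query that produced three elements most likely contained the double slash.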