Hi Brian & Charlie,

I'm not the OP but, FYI, I can see the same issue (on an Intel Mac):

aid@orac tmp % ./tail.py
Python              : sys.version_info(major=3, minor=9, micro=13, releaselevel='final', serial=0)
lxml.etree          : (4, 9, 0, 0)
libxml used         : (2, 9, 14)
libxml compiled     : (2, 9, 14)
libxslt used        : (1, 1, 35)
libxslt compiled    : (1, 1, 35)
b'<form action="action1">\n</form>\n</body>\n</html>\n'

You can see my machine is using libxml2 2.9.14 (with lxml 4.9.0), which is a 
pity, as in the thread you linked to it looked like the issue would have been 
resolved in that version...

However, I found that if you update the call to etree.tostring() to use 
method='html' then the trailing body and html elements are no longer shown.

i.e.:

print(etree.tostring(nodeList[0], method='html'))

With that update made, the script outputs the desired:

aid@orac tmp % python3 -i tail.py
Python              : sys.version_info(major=3, minor=9, micro=13, releaselevel='final', serial=0)
lxml.etree          : (4, 9, 0, 0)
libxml used         : (2, 9, 14)
libxml compiled     : (2, 9, 14)
libxslt used        : (1, 1, 35)
libxslt compiled    : (1, 1, 35)
b'<form action="action1">\n</form>\n'

I've no idea why this behaviour seems to have changed...

Kind regards

aid

> On 7 Jun 2022, at 17:02, Charlie Clark <charlie.cl...@clark-consulting.eu> 
> wrote:
> 
> On 7 Jun 2022, at 16:56, brian.b...@trustpayments.com wrote:
> 
> In more recent versions of lxml the tostring() method can return extra text 
> after the closing tag of the node I've passed to it. So instead of returning
> 
> b'\n\n'
> 
> it returns
> 
> b'\n\n\n\n'
> 
> This looks a lot like this thread:
> https://mail.python.org/archives/list/lxml@python.org/thread/LCTOSIIWGGALAMSZAYHRRYUWYDRESCUO/
> Can you update your version of libxml2?
> 
> Charlie
> 
> --
> Charlie Clark
> Managing Director
> Clark Consulting & Research
> German Office
> Sengelsweg 34
> Düsseldorf
> D- 40489
> Tel: +49-203-3925-0390
> Mobile: +49-178-782-6226
> 
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- lxml@python.org
> To unsubscribe send an email to lxml-le...@python.org
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: a...@logic.org.uk
