Turns out there's no need to use the pathlib module: the issue with "
" is gone when the HTML is first read into a variable and then parsed, even with the standard open():

============
""" OK
from pathlib import Path
with Path(f).open() as tempfile:
    tree = et.parse(tempfile, parser=parser)
"""

#BAD 
#tree = et.parse(f,parser)

#OK
import io
with open(f) as reader:
    content = reader.read()
#BAD tree = et.fromstring(content)
# parse() expects a filename or a file-like object, so wrap the string
tree = et.parse(io.StringIO(content), parser)
============

I hadn't thought of reading the file into a variable first and parsing that, since the examples I had read so far used either parse() with a file handle or fromstring().
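
For the archive, here is a minimal self-contained sketch of the "read into a variable, then parse" approach. The sample markup and the variable names are made up for illustration, and I'm assuming `parser` is an lxml HTMLParser. As far as I understand the lxml docs, parse() takes a filename or a file-like object, so the string is wrapped in io.StringIO, while fromstring() takes the string directly but has to be given the HTML parser explicitly.

```python
import io

from lxml import etree as et

# Hypothetical sample markup standing in for the content read from the file.
content = "<html><body><p>first</p>\n<p>second</p></body></html>"

# Assumption: the parser used in the thread is an HTML parser.
parser = et.HTMLParser()

# Option 1: parse() with the string wrapped in a file-like object.
tree = et.parse(io.StringIO(content), parser)
root_from_parse = tree.getroot()

# Option 2: fromstring() with the HTML parser passed explicitly.
root_from_string = et.fromstring(content, parser)

# Both routes give an <html> root; collect the paragraph texts as a check.
paragraphs = [p.text for p in root_from_parse.iter("p")]
```

Note that parse() returns an ElementTree (hence the getroot() call), whereas fromstring() returns the root element itself.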

Thank you.
