codecompl...@free.fr wrote on 02.09.21 at 16:53:
I'm still learning about lxml, and was wondering if there's a way to get the 
tree from the root to avoid writing the file to disk before re-reading it just 
for that:

import re
from lxml import etree as et

INPUTFILE = "input.kml"

# Get rid of the namespace
with open(INPUTFILE) as reader:
    content = reader.read()
content = re.sub('<kml.*?>', '<kml>', content, count=0, flags=re.DOTALL)

If you really want the namespace declarations stripped out, I'd rather do it after parsing, not before. (In fact, I would not do it at all, but you seem to be inclined to do it, for some reason.) Here, you are relying on specific syntax being used for them, which may or may not be the case in a given document.
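
For illustration, stripping the namespaces after parsing could look roughly like the sketch below. The sample document and the helper name strip_namespaces() are made up for this example, and the helper only rewrites element tags, not namespaced attributes.

```python
from lxml import etree as et

# Hypothetical sample standing in for input.kml.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Example</name>
  </Document>
</kml>"""


def strip_namespaces(root):
    """Rewrite every element tag to its local name, in place."""
    for el in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(el.tag, str):
            el.tag = et.QName(el).localname
    # Drop the xmlns declarations that are no longer used.
    et.cleanup_namespaces(root)
    return root


parser = et.XMLParser(remove_blank_text=True)
root = strip_namespaces(et.fromstring(KML, parser))
name = root.find("Document/name")
```

This works on the parsed tree, so it does not depend on how the namespace declarations happen to be written in the file.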


#Read from memory to avoid writing cleaned file to disk and re-read
parser = et.XMLParser(remove_blank_text=True)
root = et.fromstring(bytes(content, encoding='utf8'), parser)
#NameError: name 'tree' is not defined
r = tree.xpath('/Document/name')
print(r[0].tag)

You do not need an ElementTree instance for this. Just use the root element.
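
A minimal sketch of what that might look like, assuming the namespace is already gone; the inline content string is a stand-in for your cleaned file. Note that an absolute path has to start at the document's root element, which is kml here, so '/Document/name' would return an empty list either way.

```python
from lxml import etree as et

# Hypothetical stand-in for the cleaned content, namespace already removed.
content = "<kml><Document><name>Example</name></Document></kml>"

parser = et.XMLParser(remove_blank_text=True)
root = et.fromstring(content.encode("utf8"), parser)

# Relative path, evaluated from the root element <kml>.
r = root.xpath("Document/name")
print(r[0].tag)

# An absolute path starts at the document's root element.
absolute = root.xpath("/kml/Document/name")

# If a tree object is really needed, the element can provide one.
tree = root.getroottree()
```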

I recommend using the find/findall/iterfind() methods over using xpath(), though. They are faster and support incremental searches. And they simplify namespace usage.
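
As a rough sketch of searching with the namespace left in place: the sample document and the prefix "k" below are arbitrary choices for this example, and only the namespace URI has to match the document.

```python
from lxml import etree as et

# Hypothetical sample document with the KML namespace left in place.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>Example</name>
    <Placemark><name>A</name></Placemark>
    <Placemark><name>B</name></Placemark>
  </Document>
</kml>"""

root = et.fromstring(KML)

# The prefix is local to this mapping; it need not appear in the document.
ns = {"k": "http://www.opengis.net/kml/2.2"}

# find() returns the first match or None.
doc_name = root.find("k:Document/k:name", namespaces=ns)

# iterfind() yields matches lazily, so a search can stop early.
placemark_names = [
    pm.findtext("k:name", namespaces=ns)
    for pm in root.iterfind(".//k:Placemark", namespaces=ns)
]

# The {*} wildcard matches a tag name in any namespace.
same_name = root.find("{*}Document/{*}name")
```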

Stefan