[lxml] Re: XML namespaces are not propagated over from the ancestor elements when using find* methods

jholg--- via lxml - The Python XML Toolkit Mon, 13 Jan 2025 02:07:08 -0800

Hi,

> I have an XML document that has namespaces in it.  I want to use the find*
> methods to select elements but it looks like it is not possible without
> specifying namespaces explicitly in every call.  Is this true?  Is this by
> design?  This seems very burdensome when namespaces are included
> in the XML document.  It would be really nice if the namespaces in the XML
> document could be considered.


IMHO the way to go is to define, once, a prefix-to-namespace mapping that fits
the prefixes you use in your find/xpath expressions. Then simply pass this as
the namespaces argument to your xpath/find calls. E.g.

>>> from lxml import etree
>>> xml = b"""<?xml version="1.0" encoding="UTF-8"?>
... <root xmlns="urn:defaultnamespace">
...     <Timestamp>2025-01-09T17:46:08.766Z</Timestamp>
...     <namespaced xmlns:ns="urn:subns">
...         <rootdefault>text</rootdefault>
...         <ns:element>element</ns:element>
...     </namespaced>
... </root>
... """

>>> root = etree.fromstring(xml)
>>> # subsequently use this ns mapping:
>>> namespaces = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
>>> root.find("./default:namespaced/sub:element", namespaces=namespaces)
<Element {urn:subns}element at 0x7e14867f4dc0>

or

>>> root.xpath("./default:namespaced/sub:element", namespaces=namespaces)
[<Element {urn:subns}element at 0x7e14867f4dc0>]

Not burdensome in my book. ;-)

Note that I deliberately deviate from the prefixes used in the original
document here, just for illustration.
So you don't really need to know beforehand which prefixes the document you
want to process uses - but of course you need to know the qualified names
for your find/xpath expressions (i.e. "{namespace-uri}element-name" in Clark
notation).
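
To illustrate the point about qualified names, here is a minimal sketch on a
cut-down version of the document above. The prefixes "a" and "b" are arbitrary
choices for this example; the same element is found via Clark notation, with
no mapping at all, and via the made-up prefixes.

```python
from lxml import etree

xml = b"""<root xmlns="urn:defaultnamespace">
    <namespaced xmlns:ns="urn:subns">
        <ns:element>element</ns:element>
    </namespaced>
</root>"""
root = etree.fromstring(xml)

# Clark notation: the namespace URI is spelled out, no prefix mapping needed.
by_clark = root.find("./{urn:defaultnamespace}namespaced/{urn:subns}element")

# Arbitrary prefixes: only the URIs they map to have to match the document.
by_prefix = root.find(
    "./a:namespaced/b:element",
    namespaces={"a": "urn:defaultnamespace", "b": "urn:subns"},
)

print(by_clark.tag)
print(by_prefix.tag)
```

Both lookups resolve to the same qualified name, which is what actually
identifies the element.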

For XPath, you can't use an empty prefix; see also
https://lxml.de/xpathxslt.html#namespaces-and-prefixes.
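
A small sketch of that difference, on a throwaway document made up for this
example: find() accepts None as the key for the default namespace, while
xpath() refuses the same mapping (with a TypeError, as far as I know; treat
the exact exception type as an assumption). The prefix "d" is an arbitrary
choice.

```python
from lxml import etree

root = etree.fromstring(b'<root xmlns="urn:defaultnamespace"><child/></root>')
default_only = {None: "urn:defaultnamespace"}

# find() applies the None entry to unprefixed names in the path.
found = root.find("./child", namespaces=default_only)
print(found.tag)

# xpath() rejects a mapping with an empty (None) prefix.
xpath_error = None
try:
    root.xpath("./child", namespaces=default_only)
except TypeError as exc:
    xpath_error = exc
    print("xpath() refused the mapping:", exc)

# With XPath, bind the default namespace to some non-empty prefix instead.
via_xpath = root.xpath("./d:child", namespaces={"d": "urn:defaultnamespace"})
print(via_xpath[0].tag)
```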

You might even want to "precompile" xpath expressions using etree.XPath,
like

>>> namespaces = {"default": "urn:defaultnamespace", "sub": "urn:subns"}
>>> find_element = etree.XPath("./default:namespaced/sub:element",
...                            namespaces=namespaces)
>>> find_element(root)
[<Element {urn:subns}element at 0x7e14868127c0>]

If you really wanted to, you could use functools.partial to create your own
namespace-map-aware find functions:

>>> import functools
>>> find = functools.partial(
...     root.__class__.find,
...     namespaces={None: "urn:defaultnamespace", "ns": "urn:subns"})
>>> find(root, "./namespaced/ns:element")
<Element {urn:subns}element at 0x7e148686cb80>
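
The same idea can also be written as plain helper functions, which some may
find easier to read than a partial. This is only a sketch: the names NSMAP,
find and findall are hypothetical helpers of this example, not part of lxml,
and the document is a cut-down version of the one above.

```python
from lxml import etree

# Hypothetical module-level mapping; None stands for the default namespace.
NSMAP = {None: "urn:defaultnamespace", "ns": "urn:subns"}


def find(element, path):
    """Like element.find(path), with NSMAP always applied."""
    return element.find(path, namespaces=NSMAP)


def findall(element, path):
    """Like element.findall(path), with NSMAP always applied."""
    return element.findall(path, namespaces=NSMAP)


xml = b"""<root xmlns="urn:defaultnamespace">
    <namespaced xmlns:ns="urn:subns">
        <ns:element>element</ns:element>
    </namespaced>
</root>"""
root = etree.fromstring(xml)

print(find(root, "./namespaced/ns:element").text)
print(len(findall(root, ".//ns:element")))
```

Because the None key is used, these helpers only work with the find* family,
not with xpath().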

If you wanted to use unqualified names you could do something like:

>>> root.xpath("./*[local-name()='namespaced']/*[local-name()='element']")
[<Element {urn:subns}element at 0x7e14867f4dc0>]

But I wouldn't advise it: it's clunky in XPath 1.0 anyhow and has performance
implications.

I'd just go with the simplest option, i.e. define and reuse a namespaces dict.

Best regards,
Holger





