Hi Holger!
Thank you very much for the fast response.
On 28.02.22 at 08:41, holger.jo...@lbbw.de wrote:
> The reason for this is that obviously
> {http://www.isotc211.org/2005/gco}CharacterString is not a valid Python
> identifier and it makes sense to restrict unqualified lookup to children
> from the same namespace.
I would like to disagree with this part:
> and it makes sense to restrict unqualified lookup to children from the
> same namespace
What does the namespace of a node have in common with the namespace of
one of its subnodes? Nothing. It is quite common in XML to borrow
elements from other namespaces.
Other namespace-aware Python libraries, RDFlib for instance, solve this
problem generically by adding the namespace prefix to the Python property name:
{http://www.isotc211.org/2005/gco}CharacterString -> gco_CharacterString
This works like a charm. Not once have I run into a corner case.
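For illustration only, a minimal Python sketch of such a prefix-based naming scheme. The prefix table (in particular the gmd entry) and the helpers `python_name` and `clark_name` are hypothetical; this is not RDFlib's actual API, just the mapping above written as code.

```python
import re

# Hypothetical prefix table; a real implementation would take it from the
# xmlns declarations in scope instead of hard-coding it.
PREFIXES = {
    "http://www.isotc211.org/2005/gco": "gco",
    "http://www.isotc211.org/2005/gmd": "gmd",  # assumed, not from the thread
}

# Clark notation as used by lxml/ElementTree: "{namespace-uri}localname"
_CLARK = re.compile(r"^\{(?P<ns>[^}]*)\}(?P<local>.+)$")


def python_name(tag, prefixes=PREFIXES):
    """Map '{uri}Local' to 'prefix_Local'; un-namespaced tags pass through."""
    m = _CLARK.match(tag)
    if m is None:
        return tag
    return f"{prefixes[m.group('ns')]}_{m.group('local')}"


def clark_name(pyname, prefixes=PREFIXES):
    """Inverse mapping: 'prefix_Local' back to '{uri}Local'."""
    by_prefix = {p: uri for uri, p in prefixes.items()}
    prefix, sep, local = pyname.partition("_")
    if sep and prefix in by_prefix:
        return "{%s}%s" % (by_prefix[prefix], local)
    return pyname  # no known prefix: treat it as an un-namespaced name


print(python_name("{http://www.isotc211.org/2005/gco}CharacterString"))
# gco_CharacterString
```

The mapping stays reversible only as long as prefixes are unique and no un-namespaced element name starts with a known prefix followed by an underscore.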
The problem is buried deep in the nature of the lxml.objectify
implementation. Objectify does not really transform the XML into a real Python
instance hierarchy (as RDFlib does), but routes all attribute access through
function calls into the libxml2 C core. On the one hand this is desirable,
since it lets you change the XML on the fly, and some of the changes are
visible both in the XML and in the objectified representation.
On the other hand, the information about which namespace a node belongs to is
not persisted in the node and therefore cannot be used for the lookup.
This can easily be seen in lxml/objectify.pyx, line 414 ff.:
cdef tree.xmlNode* _findFollowingSibling(tree.xmlNode* c_node,
                                         const_xmlChar* href, const_xmlChar* name,
                                         Py_ssize_t index):
    cdef tree.xmlNode* (*next)(tree.xmlNode*)
    if index >= 0:
        next = cetree.nextElement
    else:
        index = -1 - index
        next = cetree.previousElement
    while c_node is not NULL:
        if c_node.type == tree.XML_ELEMENT_NODE and \
               _tagMatches(c_node, href, name):
            index = index - 1
            if index < 0:
                return c_node
        c_node = next(c_node)
    return NULL
To find the desired sibling, the code loops over the children and matches
each one against (parentNamespace, propertyName).
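To make the effect concrete, here is a small pure-Python model of that loop. It is built on xml.etree.ElementTree rather than lxml and only covers the forward (non-negative index) branch. The sample document, including the gmd:fileIdentifier wrapper and the gmd namespace URI, is an assumed ISO 19139-style snippet; only the identifier value is taken from this thread.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

# Assumed ISO 19139-style sample document.
DOC = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gmd:fileIdentifier>
    <gco:CharacterString>4157d397-e2c3-4e6e-8a84-0712aa9c1162</gco:CharacterString>
  </gmd:fileIdentifier>
</gmd:MD_Metadata>"""


def split_tag(tag):
    """Split '{href}name' into (href, name); href is '' for no namespace."""
    if tag.startswith("{"):
        href, _, name = tag[1:].partition("}")
        return href, name
    return "", tag


def find_child_current(parent, name, index=0):
    """Model of the current rule: an unqualified name inherits the parent's
    namespace, and the index-th child matching (href, name) is returned."""
    parent_href, _ = split_tag(parent.tag)
    for child in parent:
        if split_tag(child.tag) == (parent_href, name):
            index -= 1
            if index < 0:
                return child
    return None


root = ET.fromstring(DOC)
file_id = find_child_current(root, "fileIdentifier")  # found: both in gmd
print(find_child_current(file_id, "CharacterString"))  # None: child is in gco
```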
IMHO, _findFollowingSibling should instead work like this:
look up all children by the Python property name (the local name) only. If
exactly one match is found, return it. If no match or more than one match
is found, no answer is possible.
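The proposed rule can be sketched in Python the same way, again on top of ElementTree rather than lxml and with the same assumed ISO 19139-style sample. `find_unique_child` and the `Node` wrapper are hypothetical names that only mimic objectify's dotted access; they are not lxml API, and the `urn:a`/`urn:b` namespaces are made up for the ambiguity case.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

# Assumed ISO 19139-style sample document.
DOC = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gmd:fileIdentifier>
    <gco:CharacterString>4157d397-e2c3-4e6e-8a84-0712aa9c1162</gco:CharacterString>
  </gmd:fileIdentifier>
</gmd:MD_Metadata>"""


def local_name(tag):
    """'{href}name' -> 'name'; un-namespaced tags are returned unchanged."""
    return tag.rsplit("}", 1)[-1]


def find_unique_child(parent, name):
    """Match children on the local name only; answer only if exactly one matches."""
    matches = [child for child in parent if local_name(child.tag) == name]
    return matches[0] if len(matches) == 1 else None


class Node:
    """Minimal dotted-access wrapper (hypothetical, mimics objectify)."""

    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, name):
        child = find_unique_child(self._elem, name)
        if child is None:
            raise AttributeError(name)
        return Node(child)

    def __str__(self):
        return self._elem.text or ""


node = Node(ET.fromstring(DOC))
print(node.fileIdentifier.CharacterString)  # 4157d397-e2c3-4e6e-8a84-0712aa9c1162

# Two children sharing a local name in different (made-up) namespaces:
ambiguous = ET.fromstring('<r xmlns:a="urn:a" xmlns:b="urn:b"><a:x/><b:x/></r>')
print(find_unique_child(ambiguous, "x"))  # None: no answer is possible
```

Returning nothing on ambiguity, rather than picking the first hit, keeps the lookup from silently depending on document order.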
I extended _findFollowingSibling as follows:
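cdef tree.xmlNode* _findFollowingSibling(tree.xmlNode* c_node,
                                         const_xmlChar* href,
                                         const_xmlChar* name,
                                         Py_ssize_t index):
    cdef tree.xmlNode* (*next)(tree.xmlNode*)
    cdef tree.xmlNode* start_node = c_node
    cdef tree.xmlNode* result_node = NULL
    cdef Py_ssize_t remaining
    cdef int found = 0
    if index >= 0:
        next = cetree.nextElement
    else:
        index = -1 - index
        next = cetree.previousElement
    # first pass: search with namespace (the original behaviour)
    remaining = index
    while c_node is not NULL:
        if c_node.type == tree.XML_ELEMENT_NODE and \
               _tagMatches(c_node, href, name):
            remaining = remaining - 1
            if remaining < 0:
                return c_node
        c_node = next(c_node)
    # second pass: search on the local name only, in any namespace;
    # restart the count at the original index so that nodes already
    # counted in the first pass are not counted twice
    remaining = index
    c_node = start_node
    while c_node is not NULL:
        # compare string contents, not pointers (a pointer match only
        # works when both names are interned in the same libxml2 dict)
        if c_node.type == tree.XML_ELEMENT_NODE and \
               tree.xmlStrcmp(c_node.name, name) == 0:
            remaining = remaining - 1
            if remaining < 0:
                result_node = c_node
                found += 1
        c_node = next(c_node)
    # only answer if exactly one candidate was found from the
    # requested position on
    if found == 1:
        return result_node
    return NULL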
Sorry for my clumsy Cython, but it works perfectly well. I also
preserved the behaviour of looking up in the parent's namespace first.
>>> node.fileIdentifier.CharacterString
'4157d397-e2c3-4e6e-8a84-0712aa9c1162'
I would really appreciate it if someone could test this proof of concept:
https://github.com/Inqbus/lxml, branch better-objectify-attributes
<https://github.com/Inqbus/lxml/tree/better-objectify-attributes>.
If I get positive feedback, I will open a pull request.
Cheers,
Volker
--
=========================================================
inqbus Scientific Computing Dr. Volker Jaenisch
Hungerbichlweg 3 +49 (8860) 9222 7 92
86977 Burggen                            https://inqbus.de
=========================================================
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/