[lxml] Re: python lxml.objectify gives no attribute access to gco:CharacterString node

Holger.Joukl Thu, 03 Mar 2022 06:38:54 -0800

Hi,

Stefan wrote:
> Note that the content of the XML file that your code is designed to process
> did not change at all. It's just that some entirely unrelated content was
> added, in a completely different and unrelated namespace. And it was just
> externally added to the input data, or maybe just to some tiny portion of it,
> without telling you or your code about it. Especially in places with optional
> content, where different namespaces are already a little more common than
> elsewhere, this is fairly likely to go unnoticed.
>
> I find this kind of behaviour dangerous enough to restrict the "magic" in the
> API to what is easy to understand and predict.
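A minimal sketch of that predictability with today's objectify, using made-up documents and the same placeholder namespace URIs "A" and "B" as the examples below: attribute lookup on a namespaced element only matches children in that element's own namespace, so a later-added {B}x child does not change what root.x returns.

```python
from lxml import objectify

# Input the code was written for: it only reads {A}x.
before = objectify.fromstring('<a:root xmlns:a="A"><a:x>1</a:x></a:root>')

# The same input after someone prepended an unrelated element with the
# same local name, but in another namespace ("B").
after = objectify.fromstring(
    '<a:root xmlns:a="A" xmlns:b="B"><b:x>99</b:x><a:x>1</a:x></a:root>'
)

# root.x is resolved in the parent's namespace, i.e. as "{A}x", so the
# new {B}x child is ignored and the result stays the same.
print(before.x.pyval, after.x.pyval)  # 1 1
```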

Any magic, namespace-prefix-based lookup scheme can be dangerous in a similar
vein, IMHO. E.g.:

>>> root = objectify.fromstring("""
... <a:root xmlns:a="A" xmlns:b="B">
...   <a:x>1</a:x>
...   <b:x>2</b:x>
...   <x>3</x>
... </a:root>""")
>>> root.b_x  # fictitious ns-prefix-based lookup
2

If you now change a single namespace prefix in the XML document from xmlns:b to
xmlns:ns_b:

>>> root = objectify.fromstring("""
... <a:root xmlns:a="A" xmlns:ns_b="B">
...   <a:x>1</a:x>
...   <ns_b:x>2</ns_b:x>
...   <x>3</x>
... </a:root>""")
>>> root.b_x  # fictitious ns-prefix-based lookup
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/objectify.pyx", line 231, in lxml.objectify.ObjectifiedElement.__getattr__
  File "src/lxml/objectify.pyx", line 450, in lxml.objectify._lookupChildOrRaise
AttributeError: no such child: b_x

Again, the very same code would suddenly cease to work, while the XML document
remains semantically identical. You'd get an exception in the best case, or
silently ignore data in the worst case.
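Looking children up by namespace URI instead of prefix avoids this. A minimal sketch, assuming a reasonably recent lxml, using objectify's existing support for Clark notation in getattr() ("{uri}name", and "{}name" for a child in no namespace) on the two documents above, written on one line each:

```python
from lxml import objectify

# The same document, serialised with two different prefixes for namespace "B".
DOCS = [
    '<a:root xmlns:a="A" xmlns:b="B">'
    '<a:x>1</a:x><b:x>2</b:x><x>3</x></a:root>',
    '<a:root xmlns:a="A" xmlns:ns_b="B">'
    '<a:x>1</a:x><ns_b:x>2</ns_b:x><x>3</x></a:root>',
]

results = []
for text in DOCS:
    root = objectify.fromstring(text)
    # "{uri}name" (Clark notation) selects by namespace URI, not by prefix.
    b_x = getattr(root, "{B}x")
    # "{}name" selects a child that has no namespace at all.
    plain_x = getattr(root, "{}x")
    results.append((b_x.pyval, plain_x.pyval))

print(results)  # [(2, 3), (2, 3)]
```

The prefix only exists in the serialisation; the URI is what identifies the namespace, so both documents give the same result.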

That aside:

Volker wrote:
> [...]
> Debugging becomes a great hassle if you are not able, e.g. in your
> PyCharm IDE, to navigate the XML tree your parser is currently processing.
> Even worse if some nodes do not seem to even exist.
> [...]
> It is not that I'd like a more convenient way to address the data. To
> address the data I use xpath. It is purely the fact that I cannot use
> the objectified data in a debugger while debugging, that drives me mad.

I admit I don't fully understand the issue (I don't use PyCharm and don't know
how it presents objects while debugging). To me, it seems easy enough to just
do something like

>>> list(root.iterchildren())
[1, 2, 3]

or

>>> print(objectify.dump(root))  # see also objectify.enable_recursive_str()
{A}root = None [ObjectifiedElement]
    {A}x = 1 [IntElement]
    {B}x = 2 [IntElement]
    x = 3 [IntElement]
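If the debugger can evaluate watch expressions, a small helper along these lines could list every direct child together with its fully qualified tag. children_by_clark_tag is a hypothetical name for illustration, not lxml API:

```python
from lxml import objectify

def children_by_clark_tag(elem):
    """Return (tag, value) pairs for all direct element children of elem.

    Hypothetical debugging helper, not part of lxml. Unlike objectify's
    attribute lookup, it does not filter children by namespace.
    """
    pairs = []
    for child in elem.iterchildren():
        if not isinstance(child.tag, str):  # skip comments and PIs
            continue
        if isinstance(child, objectify.ObjectifiedDataElement):
            value = child.pyval  # e.g. an int for IntElement
        else:
            value = child.text   # structural element: fall back to raw text
        # child.tag is in Clark notation: "{B}x", or just "x" without a namespace
        pairs.append((child.tag, value))
    return pairs

root = objectify.fromstring(
    '<a:root xmlns:a="A" xmlns:ns_b="B">'
    '<a:x>1</a:x><ns_b:x>2</ns_b:x><x>3</x></a:root>'
)
print(children_by_clark_tag(root))
# [('{A}x', 1), ('{B}x', 2), ('x', 3)]
```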

Does PyCharm use elem.__dict__ or dir(elem) to present an object's attributes
while debugging? If so, one way to address the OP's issue might be to populate
elem.__dict__ not only with the element children from elem's own namespace but
with all children, while *still* restricting attribute lookup to children from
elem's namespace.

I.e. instead of
>>> root = objectify.fromstring("""
... <a:root xmlns:a="A">
...   <a:x>1</a:x>
...   <x>3</x>
... </a:root>""")
>>>
>>> root.__dict__
{'x': 1}

__dict__ would yield

>>> root.__dict__  # not how it works today!
{'{A}x': 1, '{}x': 3}

...making all children appear in e.g. dir(), keeping existing getattr behavior:

>>> root.x
1

Maybe this would lessen the "child visibility issue" in debugging?
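The proposed mapping can be approximated today without changing lxml. A sketch with a hypothetical helper all_children_dict() (not lxml API), which keys each child by "{namespace}localname", uses "{}" for children without a namespace, and keeps only the first child per tag:

```python
from lxml import etree, objectify

def all_children_dict(elem):
    """Map "{namespace}localname" -> first child with that tag.

    Hypothetical helper, not lxml API: it only approximates the proposed
    __dict__ contents. Children without a namespace get an explicit empty
    namespace ("{}x"), matching the getattr(elem, "{}x") spelling.
    """
    children = {}
    for child in elem.iterchildren():
        if not isinstance(child.tag, str):  # skip comments and PIs
            continue
        qname = etree.QName(child)
        key = "{%s}%s" % (qname.namespace or "", qname.localname)
        children.setdefault(key, child)  # keep the first child per tag
    return children

root = objectify.fromstring("""
<a:root xmlns:a="A">
  <a:x>1</a:x>
  <x>3</x>
</a:root>""")

mapping = all_children_dict(root)
print(list(mapping))                              # ['{A}x', '{}x']
print({k: v.pyval for k, v in mapping.items()})   # {'{A}x': 1, '{}x': 3}
print(root.x.pyval)                               # attribute lookup unchanged: 1
```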

This would be a breaking change, of course, and would make __dict__ more
surprising and arguably more "non-standard" compared to regular Python objects
IMO, since it would contain names that are not valid Python identifiers.

From a cursory glance at the implementation, this looks possible in theory.
But I'm not really convinced we should do it.

Maybe the debugger/IDE can just be taught to give more helpful output?
All the information is there in the first place...

Holger

_______________________________________________
lxml - The Python XML Toolkit mailing list -- [email protected]