[lxml] lxml.objectify2: More fun, namespaces, pythonic

Dr. Volker Jaenisch Mon, 07 Mar 2022 15:27:50 -0800

Dear lxml users!

I am developing lxml.objectify2 (lxml.o2). lxml.o2 has three objectives:


 * making lxml more pythonic
 * introducing robust namespaced properties
 * making lxml more fun

*Following the old ways*

Imagine the following xml file.

xml_str = '''\
<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1 a1="A1" a2="A2" other:a3="A3">
    <obj:c2>0</obj:c2>
    <obj:c2>1</obj:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>3</other:c2>
    <other:c2>5</other:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>42</other:c2>
  </obj:c1>
</obj:root>'''

Please notice that obj:c1 and obj:c2 / other:c2 occur as multiple children with the same {ns}name under one parent.
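To make the multiplicity concrete, here is a small sketch that counts, per parent, how often each qualified {ns}name occurs. It uses Python's standard-library xml.etree.ElementTree instead of lxml only so that it runs without extra installs; the helper name repeated_tags is made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

xml_str = '''<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1 a1="A1" a2="A2" other:a3="A3">
    <obj:c2>0</obj:c2>
    <obj:c2>1</obj:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>3</other:c2>
    <other:c2>5</other:c2>
    <obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>42</other:c2>
  </obj:c1>
</obj:root>'''

root = ET.fromstring(xml_str)

def repeated_tags(parent):
    """Return {'{ns}name': count} for child tags that occur more than once."""
    counts = Counter(child.tag for child in parent)
    return {tag: n for tag, n in counts.items() if n > 1}

# Walk every element (root included) and report its repeated child tags.
for parent in root.iter():
    repeated = repeated_tags(parent)
    if repeated:
        print(parent.tag, repeated)
```

ElementTree reports tags in Clark notation ({namespace}name), so obj:c2 and other:c2 are kept apart even though their local names are equal.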

Here is a glance, from the PyCharm IDE perspective, at the data processed by lxml.o (standard lxml.objectify):


https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-23-02-33.png/image_view_fullscreen

You may notice that there is no multiplicity at all. lxml.o is quite limited here and not really pythonic, so any Python IDE will struggle to represent data processed by lxml.o.



*Following the new ways*

Let's use lxml.objectify2 instead.


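from lxml import etree
from lxml.objectify import ObjectifyElementClassLookup
# lxml.objectify2 only exists in the objectify_prefix branch linked at the end
from lxml.objectify2 import ObjectifiedElement2

# Use ObjectifiedElement2 as the element class for the whole tree
obj2_lookup = ObjectifyElementClassLookup(tree_class=ObjectifiedElement2)

parser = etree.XMLParser()
parser.set_element_class_lookup(obj2_lookup)

node = etree.XML(xml_str, parser=parser)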

A look from the PyCharm debugger into the data structure processed by lxml.o2:


https://backend.datenadler.de/kram/bildschirmfoto-vom-2022-03-07-22-34-10.png/image_view_fullscreen

As you can see, lxml.o2 handles multiple children with the same qualified tag by assigning an "[index]" to them.
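The labelling idea can be approximated in plain Python. This is a hypothetical reconstruction, not lxml.o2's code: it assumes a hand-written PREFIXES map (ElementTree discards the document's prefixes), and it assumes that only tags occurring more than once under the same parent receive an index. The names PREFIXES, split_clark and child_labels are invented for the sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed mapping from namespace URI to the prefix used in attribute names.
PREFIXES = {"objectified": "obj", "otherNS": "other"}

def split_clark(tag):
    """Split '{uri}local' into (uri, local); uri is None for un-namespaced tags."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

def child_labels(parent):
    """Return labels such as 'other_c2[0]' for the children of parent.

    Raises KeyError for a namespace that is missing from PREFIXES.
    """
    counts = Counter(child.tag for child in parent)
    seen = Counter()
    labels = []
    for child in parent:
        uri, local = split_clark(child.tag)
        base = f"{PREFIXES[uri]}_{local}"
        if counts[child.tag] > 1:
            # Repeated qualified tag: number the occurrences 0, 1, 2, ...
            labels.append(f"{base}[{seen[child.tag]}]")
            seen[child.tag] += 1
        else:
            labels.append(base)
    return labels

xml_str = '''<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1><other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2></obj:c1>
</obj:root>'''
c1 = ET.fromstring(xml_str).find("{objectified}c1")
print(child_labels(c1))  # ['other_c2[0]', 'other_c2[1]', 'obj_c2']
```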

*<rant>Yeah, that is nice screenwork, but this will never work in code?</rant>*


>>> node.obj_c1[2].obj_c2
[3]

Here the call

node.obj_c1

returns a list, and Python's ordinary list indexing then picks the desired element.
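To illustrate the idea that node.obj_c1 returns a list, here is a toy wrapper over the standard-library ElementTree. It is a hypothetical mimic, not lxml.o2: the class Node, the NAMESPACES map, the helper pyval, and the choice to turn leaf text into int where possible are all assumptions for this illustration, and the XML is a shortened copy of the example without attributes.

```python
import xml.etree.ElementTree as ET

# Assumed prefix -> namespace URI map used to qualify attribute names.
NAMESPACES = {"obj": "objectified", "other": "otherNS"}

xml_str = '''<obj:root xmlns:obj="objectified" xmlns:other="otherNS">
  <obj:c1>
    <obj:c2>0</obj:c2><obj:c2>1</obj:c2><obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>3</other:c2><other:c2>5</other:c2><obj:c2>2</obj:c2>
  </obj:c1>
  <obj:c1>
    <other:c2>42</other:c2>
  </obj:c1>
</obj:root>'''

def pyval(text):
    """Hypothetical leaf conversion: int if the text parses as one, else the text."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return text

class Node:
    """Toy wrapper: node.<prefix>_<name> returns a list of matching children."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, attr):
        prefix, sep, local = attr.partition("_")
        if not sep or prefix not in NAMESPACES:
            raise AttributeError(attr)
        matches = self._element.findall(f"{{{NAMESPACES[prefix]}}}{local}")
        if not matches:
            raise AttributeError(attr)
        # Inner elements stay wrapped, leaves become plain Python values.
        return [Node(m) if len(m) else pyval(m.text) for m in matches]

node = Node(ET.fromstring(xml_str))
print(len(node.obj_c1))         # 3
print(node.obj_c1[0].obj_c2)    # [0, 1, 2]
print(node.obj_c1[1].other_c2)  # [3, 5]
```

Returning a plain list keeps the selection step ordinary Python, which is also what makes the data easy for an IDE debugger to display.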


*<rant>OK, but this will not work with getattr!</rant>*

>>> getattr(node, 'obj_c1[0]').obj_c2
[0, 1, 2]

Here lxml.o2 performs the selection of element [0] itself, in fast C code.
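lxml.o2 does this parsing in C. For readers who want to see the step in Python, here is a sketch of splitting an attribute name like 'obj_c1[0]' into the name and the index. The function split_indexed and the accepted syntax (a name optionally followed by one non-negative integer in square brackets) are assumptions, not lxml.o2's actual grammar.

```python
import re

# A name without brackets, optionally followed by "[<non-negative integer>]".
_INDEXED = re.compile(r"(?P<name>[^\[\]]+)(?:\[(?P<index>\d+)\])?")

def split_indexed(attr):
    """Split 'obj_c1[0]' into ('obj_c1', 0); a plain 'obj_c1' gives ('obj_c1', None)."""
    match = _INDEXED.fullmatch(attr)
    if match is None:
        raise ValueError(f"not a valid attribute name: {attr!r}")
    index = match.group("index")
    return match.group("name"), None if index is None else int(index)

print(split_indexed("obj_c1[0]"))  # ('obj_c1', 0)
print(split_indexed("obj_c1"))     # ('obj_c1', None)
```

A __getattr__ built on this would look up the list for the name part and, when an index is present, return only that element.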



*<rant>OK, and where is the catch?</rant>*

To implement this functionality, we need the user to follow two rules.

1) If there are elements without a namespace, a default namespace has to be defined.

2) Any access to a "tag" has to be qualified, with the exception of the default namespace:


node.<namespace>_<name>

or, for the default namespace,

node.<name>
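The two rules can be read as a small resolver from attribute names to Clark notation ({namespace}name). This is an illustrative sketch, not lxml.o2's implementation; the function resolve, its prefixes map and its default_ns parameter are assumptions.

```python
def resolve(attr, prefixes, default_ns=None):
    """Turn 'prefix_name' (or a bare 'name' with a default namespace) into '{uri}name'.

    prefixes maps prefix -> namespace URI; default_ns is the default namespace URI.
    """
    prefix, sep, local = attr.partition("_")
    if sep and local and prefix in prefixes:
        # Rule 2: qualified access, node.<namespace>_<name>
        return f"{{{prefixes[prefix]}}}{local}"
    if default_ns is None:
        # Rule 1: without a default namespace an unqualified name cannot be resolved
        raise AttributeError(f"{attr!r} is unqualified and no default namespace is defined")
    # The exception in rule 2: bare names belong to the default namespace
    return f"{{{default_ns}}}{attr}"

prefixes = {"obj": "objectified", "other": "otherNS"}
print(resolve("other_c2", prefixes))                      # {otherNS}c2
print(resolve("c1", prefixes, default_ns="objectified"))  # {objectified}c1
```

Note that a name such as my_tag only falls back to the default namespace because "my" is not a known prefix; this ambiguity is one reason why qualified access is required.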


If these rules are too much for you, go back to lxml.objectify and be happy.


*<rant>Ah, go away. Where do you find such nice XML?</rant>*

Hmm. In the real world I have never seen XML documents as simple as those in the lxml.objectify tests.


But I am aware that lxml.o2 will have to be tested thoroughly.


*<rant>You will never convince all the users of lxml to change to lxml.o2!</rant>*

That is true. But I am not even trying. lxml.o2 is an alternative to lxml.o for certain use cases.



You are welcome to rant at me :-)

You are also welcome to help with the development of lxml.o2. This is a spare-time job for me.

If you do not have the time to help, you may express your liking of lxml.o2 here.



lxml.o2 lives at

https://github.com/Inqbus/lxml

in the branch

https://github.com/Inqbus/lxml/tree/objectify_prefix


Cheers,

Volker








--
=========================================================
   inqbus Scientific Computing    Dr.  Volker Jaenisch
   Hungerbichlweg 3               +49 (8860) 9222 7 92
   86977 Burggen                  https://inqbus.de
=========================================================

_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/
