> Hi,
> Does anyone think this needs to be posted to the bug tracker?
>
> lxml seems to identify superscripts as an integer but then throws an 
> exception.
>
> Thanks
>
> Alex
>
>
> from lxml import objectify
> xml = """
> <types>
> <mysuperscript>²²²²²²²²²²</mysuperscript>
> </types>
> """
> doc = objectify.fromstring(xml)
> print(objectify.dump(doc))
>
>
> Traceback (most recent call last):
>   File "**********.py", line 11, in <module>
>    print(objectify.dump(doc))
>           ^^^^^^^^^^^^^^^^^^^
>   File "src/lxml/objectify.pyx", line 1521, in lxml.objectify.dump
>   File "src/lxml/objectify.pyx", line 1549, in lxml.objectify._dump
>   File "src/lxml/objectify.pyx", line 1526, in lxml.objectify._dump
>   File "src/lxml/objectify.pyx", line 646, in lxml.objectify.NumberElement.__repr__
>   File "src/lxml/objectify.pyx", line 946, in lxml.objectify._parseNumber
> ValueError: invalid literal for int() with base 10: '²²²²²²²²²²'

Looks like a bug to me.

For reasons I don't yet understand, the int type check in objectify's type
guesser (see https://lxml.de/objectify.html#how-data-types-are-matched) does
not fail for this input:

>>> objectify.getRegisteredTypes()
[PyType(int, IntElement), PyType(float, FloatElement), PyType(bool, BoolElement), PyType(long, IntElement), PyType(str, StringElement), PyType(NoneType, NoneElement), PyType(none, NoneElement)]
>>> objectify.getRegisteredTypes()[0]
PyType(int, IntElement)
>>> print(objectify.getRegisteredTypes()[0].type_check("222"))
None
>>> print(objectify.getRegisteredTypes()[0].type_check("²²²²²²²²²²"))   # Should raise!
None
>>> print(objectify.getRegisteredTypes()[0].type_check("abcd"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "stringsource", line 67, in cfunc.to_py.__Pyx_CFunc_object____object___to_py.wrap
  File "src/lxml/objectify.pyx", line 1054, in lxml.objectify._checkInt
  File "src/lxml/objectify.pyx", line 1047, in lxml.objectify._checkNumber
ValueError
>>>

However, int() itself rejects the same string:
>>> int("²²²²²²²²²²")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '²²²²²²²²²²'
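
One possible explanation for the mismatch, which is an assumption and not confirmed against the lxml source: Python classifies more characters as "digits" than int() will parse. SUPERSCRIPT TWO (U+00B2) is a digit character but not a decimal one, so a check built on digit classification would accept it while int() refuses it. The sketch below uses only the standard library; int_accepts() is a hypothetical helper introduced for illustration.

```python
import unicodedata

s = "²²²²²²²²²²"  # SUPERSCRIPT TWO (U+00B2), repeated ten times

# U+00B2 is in Unicode category "No" (other number), not "Nd" (decimal digit).
print(unicodedata.category(s[0]))

# str.isdigit() accepts superscript digits; str.isdecimal() does not.
print(s.isdigit())
print(s.isdecimal())


def int_accepts(text):
    """Hypothetical helper: report whether int() can parse the given text."""
    try:
        int(text)
    except ValueError:
        return False
    return True


print(int_accepts("222"))
print(int_accepts(s))
```

If the type check relies on something equivalent to isdigit(), that would produce exactly the behaviour shown above: the check passes, and the later int() conversion raises.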

Probably a bug in _checkNumber(): 
https://github.com/lxml/lxml/blob/d01872ccdf7e1e5e825b6c6292b43e7d27ae5fc4/src/lxml/objectify.pyx#L974
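
To illustrate the behaviour I would expect from the check, here is a minimal sketch of a stricter one that simply defers to int(). It follows the convention visible in the session above (return None on a match, raise ValueError otherwise). The name strict_int_check is hypothetical and this is not lxml's implementation. Note that int() also accepts surrounding whitespace, a sign, and underscores between digits, so this is looser than a pure digit test in those respects.

```python
def strict_int_check(text):
    """Raise ValueError unless int() can parse text; return None otherwise."""
    int(text)


# Try the three inputs from the session above.
for candidate in ("222", "²²²²²²²²²²", "abcd"):
    try:
        strict_int_check(candidate)
    except ValueError:
        print(f"{candidate!r}: rejected")
    else:
        print(f"{candidate!r}: accepted")
```

With a check like this, the superscript string would be rejected at type-guessing time, and the element would presumably fall through to a later type instead of failing in __repr__.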

Best regards, Holger