> Hi,
> Does anyone think this needs to be posted to the bug tracker?
>
> lxml seems to identify superscripts as an integer but then throws an
> exception.
>
> Thanks
>
> Alex
>
>
> from lxml import objectify
> xml = """
> <types>
>     <mysuperscript>²²²²²²²²²²</mysuperscript>
> </types>
> """
> doc = objectify.fromstring(xml)
> print(objectify.dump(doc))
>
>
> Traceback (most recent call last):
>   File "**********.py", line 11, in <module>
>     print(objectify.dump(doc))
>           ^^^^^^^^^^^^^^^^^^^
>   File "src/lxml/objectify.pyx", line 1521, in lxml.objectify.dump
>   File "src/lxml/objectify.pyx", line 1549, in lxml.objectify._dump
>   File "src/lxml/objectify.pyx", line 1526, in lxml.objectify._dump
>   File "src/lxml/objectify.pyx", line 646, in lxml.objectify.NumberElement.__repr__
>   File "src/lxml/objectify.pyx", line 946, in lxml.objectify._parseNumber
> ValueError: invalid literal for int() with base 10: '²²²²²²²²²²'

Looks like a bug to me.

For reasons I don't yet understand, the int type check in objectify's
type guesser (see https://lxml.de/objectify.html#how-data-types-are-matched)
does not fail for this input:

    >>> objectify.getRegisteredTypes()
    [PyType(int, IntElement), PyType(float, FloatElement), PyType(bool, BoolElement), PyType(long, IntElement), PyType(str, StringElement), PyType(NoneType, NoneElement), PyType(none, NoneElement)]
    >>> objectify.getRegisteredTypes()[0]
    PyType(int, IntElement)
    >>> print(objectify.getRegisteredTypes()[0].type_check("222"))
    None
    >>> print(objectify.getRegisteredTypes()[0].type_check("²²²²²²²²²²"))  # Should raise!
    None
    >>> print(objectify.getRegisteredTypes()[0].type_check("abcd"))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "stringsource", line 67, in cfunc.to_py.__Pyx_CFunc_object____object___to_py.wrap
      File "src/lxml/objectify.pyx", line 1054, in lxml.objectify._checkInt
      File "src/lxml/objectify.pyx", line 1047, in lxml.objectify._checkNumber
    ValueError
    >>>

However:

    >>> int("²²²²²²²²²²")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for int() with base 10: '²²²²²²²²²²'

Probably a bug in _checkNumber():
https://github.com/lxml/lxml/blob/d01872ccdf7e1e5e825b6c6292b43e7d27ae5fc4/src/lxml/objectify.pyx#L974

Best regards,
Holger
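
A possible explanation, offered as an assumption and not verified against the lxml source: the pre-check may use a "digit" test that is looser than what int() accepts. In plain Python, str.isdigit() returns True for superscript digits such as "²", while int() only parses decimal digits. The sketch below illustrates that mismatch with the standard library only; the two helper functions are hypothetical and are not lxml's implementation.

```python
def looks_like_int_loose(text: str) -> bool:
    """Hypothetical loose check: accepts anything str.isdigit() accepts."""
    text = text.strip()
    if text[:1] in "+-":
        text = text[1:]  # drop an optional leading sign
    return text != "" and text.isdigit()


def looks_like_int_strict(text: str) -> bool:
    """Hypothetical strict check: accepts only what int() itself parses."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# Compare both checks on ASCII digits, superscript digits,
# Arabic-Indic decimal digits, and plain letters.
for sample in ("222", "²²²", "٢٢٢", "abcd"):
    print(repr(sample), looks_like_int_loose(sample), looks_like_int_strict(sample))
```

Only the superscript sample passes the loose check while failing the strict one, which matches the pattern in the report: the type check succeeds and the later int() conversion raises ValueError.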