Stefan Behnel wrote on 06.07.25 at 08:38:
Austin Matherne wrote on 01.07.25 at 04:01:
I’m upgrading a project from lxml 5.4.0 to the newly released lxml 6.0.0
and encountering an unexpected XMLSchemaParseError. I’ve distilled the
problem into a minimal, self-contained example and uploaded it as a
GitHub gist:
https://gist.github.com/AustinMatherne/533a4b6a31a63e11bfd8c09c03c05183
* The same XML and XSD files parse and schema validate cleanly with lxml
5.4.0.
* With lxml 6.0.0, calling XMLSchema() raises an XMLSchemaParseError with
no obvious culprit.
Is this a bug in libxml, lxml, or am I doing something unsupported with
the API?
I added a print(system_url) to your resolver. Where the working version
transitively downloads a whole set of schema files, the failing version
prints only the following:
"""
READ http://www.w3.org/2001/xml.xsd
READ http://www.xbrl.org/2013/inlineXBRL/xhtml-inlinexbrl-1_1-modules.xsd
Traceback (most recent call last):
  File "/home/stefan/source/Python/lxml/lxml-hg/TEST/schema_error_ml_20250701/lxml.test.py", line 45, in <module>
    schema = etree.XMLSchema(schema_tree)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/xmlschema.pxi", line 90, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Invalid argument, line 1, column 37
"""
First of all, I highly recommend setting up XML catalogues on your system
to avoid downloading the schemas over and over again. There is a lot of
needless network traffic, server load and waiting time going on here that
can be avoided entirely by installing local copies of the schemas. libxml2
will search the usual system directories automatically when asked to use a
schema and thus avoid any network traffic.
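
As a rough sketch of the catalogue suggestion above, the following builds
an OASIS XML catalog that maps the two URLs from the output to local
copies. The local file paths, the helper name build_catalog and the choice
to emit both <system> and <uri> entries are assumptions for illustration,
not part of the original setup. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OASIS XML Catalogs specification.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical local copies; adjust the file: URIs to wherever the
# downloaded schemas actually live on your system.
EXAMPLE_MAPPING = {
    "http://www.w3.org/2001/xml.xsd":
        "file:///usr/local/share/xml/schemas/xml.xsd",
    "http://www.xbrl.org/2013/inlineXBRL/xhtml-inlinexbrl-1_1-modules.xsd":
        "file:///usr/local/share/xml/schemas/xhtml-inlinexbrl-1_1-modules.xsd",
}


def build_catalog(mapping):
    """Return an XML catalog (as bytes) mapping remote URLs to local URIs."""
    # Serialise the catalog namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for url, local_uri in mapping.items():
        # Emit one <system> entry (lookup by system identifier) and one
        # <uri> entry (plain URI lookup) per schema, to cover both cases.
        ET.SubElement(root, f"{{{CATALOG_NS}}}system",
                      {"systemId": url, "uri": local_uri})
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri",
                      {"name": url, "uri": local_uri})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

The bytes returned by build_catalog(EXAMPLE_MAPPING) could be written to a
file and made known to libxml2 through the XML_CATALOG_FILES environment
variable, or merged into the system catalogue (commonly /etc/xml/catalog).
Setting the variable in the shell before starting Python is probably the
safer route. As far as I understand, a custom lxml resolver is consulted
first, so it would have to decline these URLs (return None) for the
catalogue lookup to apply.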
Then, it seems to fail immediately at the first included schema file, at a
suspicious position of 37 characters, which is right after the XML
declaration. That points to something going wrong in libxml2 rather than
lxml, but the failure is so obvious that it is unlikely to have gone
undetected in libxml2 releases. I recommend bringing this to the attention
of the libxml2 developers.
Actually, it *was* something that lxml could resolve on its own side.
libxml2 gained a new API for passing data from resolvers into the parser.
lxml did not use it yet and instead relied on some manual setup that
apparently no longer works in libxml2 2.14+.
There is a test for this, so I'm not sure why it didn't fail when switching
to libxml2 2.14, but in any case, I pushed a fix to the 6.0 branch that
resolves it on my side:
https://github.com/lxml/lxml/commit/2aae3a9625fcb858f83715a81b4d7182d2529a09
I'll release a bug fix version soon.
Stefan