My bad, I meant input.buf and input->buf, not index->buf.

Best,
Abe

On Sun, Jul 6, 2025 at 9:11 AM Abraham Polk <abep...@gmail.com> wrote:

> Hi all,
>
> Thanks Stefan! I also looked into this, and it appears that index.buf or
> index->buf (C) is not getting set on the lxml side or the libxml2 side.
> It looks like the call on line 487 of parser.pxi, c_input =
> xmlparser.xmlNewInputStream(c_context), calls a deprecated (since at
> least 11 months ago) function in libxml2's parserInternals.c. So Stefan's
> fix probably just updates lxml to use the updated libxml2 API, which
> *does* set buf.
>
> For those who want more details:
> It was probably deprecated because of the new functions starting with
> xmlNewInputFrom. The containing function in lxml, _local_resolver, is
> passed into libxml2's xmlSetExternalEntityLoader in
> _register_document_loader. xmlSetExternalEntityLoader itself replaces
> xmlDefaultExternalEntityLoader with a custom callback. For reference,
> xmlDefaultExternalEntityLoader *does* in fact set the input->buf. If you
> follow a few function calls down to xmlNewInputFromUrl, there is a call to
> xmlParserInputBufferCreateUrl, which creates the buffer. However, the
> calls in lxml and the deprecated function leave the buffer as NULL.
>
> Best,
> Abe
>
> On Sun, Jul 6, 2025 at 3:59 AM Stefan Behnel via lxml - The Python XML
> Toolkit <lxml@python.org> wrote:
>
>> Stefan Behnel wrote on 06.07.25 at 08:38:
>> > Austin Matherne wrote on 01.07.25 at 04:01:
>> >> I’m upgrading a project from lxml 5.4.0 to the newly released lxml 6.0.0
>> >> and encountering an unexpected XMLSchemaParseError. I’ve distilled the
>> >> problem into a minimal, self-contained example and uploaded it as a
>> >> GitHub gist:
>> >>
>> >> https://gist.github.com/AustinMatherne/533a4b6a31a63e11bfd8c09c03c05183
>> >>
>> >> * The same XML and XSD files parse and schema validate cleanly with
>> >>   lxml 5.4.0.
>> >> * With lxml 6.0.0, calling XMLSchema() raises an XMLSchemaParseError
>> >>   with no obvious culprit.
>> >>
>> >> Is this a bug in libxml, lxml, or am I doing something unsupported with
>> >> the API?
>> >
>> > So, I added a print(system_url) to your resolver and where the working
>> > version downloads a whole pack of schema files transitively, the failing
>> > version only gives the following output:
>> >
>> > """
>> > READ http://www.w3.org/2001/xml.xsd
>> > READ http://www.xbrl.org/2013/inlineXBRL/xhtml-inlinexbrl-1_1-modules.xsd
>> > Traceback (most recent call last):
>> >   File "/home/stefan/source/Python/lxml/lxml-hg/TEST/schema_error_ml_20250701/lxml.test.py", line 45, in <module>
>> >     schema = etree.XMLSchema(schema_tree)
>> >              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >   File "src/lxml/xmlschema.pxi", line 90, in lxml.etree.XMLSchema.__init__
>> > lxml.etree.XMLSchemaParseError: Invalid argument, line 1, column 37
>> > """
>> >
>> > First of all, I highly recommend setting up XML catalogues on your system
>> > to avoid downloading the schemas over and over again. It's really a lot of
>> > useless network back and forth, server usage, waiting time etc. going on
>> > here that can be avoided entirely by installing local copies of the
>> > schemas. libxml2 will search the usual system directories automatically
>> > when asked to use a schema and thus avoid any network traffic.
>> >
>> > Then, it seems to fail immediately at the first included schema file, at a
>> > suspicious position of 37 characters, which is right after the XML
>> > declaration. That hints more at something going wrong in libxml2 than lxml
>> > but is so surprisingly obviously not working that it's unlikely to go
>> > undetected in libxml2 releases. I recommend bringing this to the attention
>> > of the libxml2 developers.
>>
>> Actually, it *was* something that lxml can resolve on its own side. libxml2
>> got a new API for passing data from resolvers into the parser and lxml
>> didn't use that yet but had to resort to some manual setup that apparently
>> no longer works in libxml2 2.14+.
>>
>> There is a test for this, so I'm not sure why it didn't fail when switching
>> to libxml2 2.14, but in any case, I pushed a fix to the 6.0 branch that
>> resolves it on my side:
>>
>> https://github.com/lxml/lxml/commit/2aae3a9625fcb858f83715a81b4d7182d2529a09
>>
>> I'll release a bug fix version soon.
>>
>> Stefan
>>
>> _______________________________________________
>> lxml - The Python XML Toolkit mailing list -- lxml@python.org
>> To unsubscribe send an email to lxml-le...@python.org
>> https://mail.python.org/mailman3//lists/lxml.python.org
>> Member address: abep...@gmail.com
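
On the XML catalogue suggestion quoted above: here is a rough sketch of what
generating such a catalogue could look like, using only the Python standard
library. It is not taken from the thread. The directory SCHEMA_DIR and the
local file names are hypothetical, and the sketch assumes libxml2 finds the
catalogue through the XML_CATALOG_FILES environment variable (or the system
default catalogue, typically /etc/xml/catalog). It only writes the catalogue
file; it does not download the schemas.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Namespace of the OASIS XML Catalogs format that libxml2 understands.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(mapping):
    """Build a catalogue tree mapping remote schema URLs to local URIs."""
    # Serialize the catalogue namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for remote_url, local_uri in mapping.items():
        # <system> matches system identifiers, <uri> matches URI references.
        # Emitting both is a cautious choice, not a requirement from the thread.
        ET.SubElement(root, f"{{{CATALOG_NS}}}system",
                      systemId=remote_url, uri=local_uri)
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri",
                      name=remote_url, uri=local_uri)
    return ET.ElementTree(root)


# Hypothetical location of locally installed schema copies (an assumption).
SCHEMA_DIR = "/usr/local/share/xml/schemas"

# The two URLs are the ones shown in the resolver output quoted above.
mapping = {
    "http://www.w3.org/2001/xml.xsd":
        f"file://{SCHEMA_DIR}/xml.xsd",
    "http://www.xbrl.org/2013/inlineXBRL/xhtml-inlinexbrl-1_1-modules.xsd":
        f"file://{SCHEMA_DIR}/xhtml-inlinexbrl-1_1-modules.xsd",
}

# Write the catalogue to a scratch directory for illustration.
catalog_path = os.path.join(tempfile.mkdtemp(), "catalog.xml")
build_catalog(mapping).write(catalog_path, encoding="utf-8",
                             xml_declaration=True)
print(catalog_path)
```

To try it, export XML_CATALOG_FILES with the printed path before starting
Python, so that the variable is already set when libxml2 initializes its
catalogues. Every schema that gets pulled in transitively would need its own
entry and local copy.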