My bad, I meant input.buf and input->buf, not index->buf.

Best,
Abe

On Sun, Jul 6, 2025 at 9:11 AM Abraham Polk <abep...@gmail.com> wrote:

> Hi all,
>
> Thanks Stefan! I also looked into this, and it appears that index.buf or
> index->buf (C) is not getting set on the lxml side or the libxml2 side.
> It looks like the call on line 487 of parser.pxi, c_input =
> xmlparser.xmlNewInputStream(c_context), calls a deprecated (since at
> least 11 months ago) function in libxml2's parserInternals.c. So Stefan's
> fix probably just updates lxml to use the updated libxml2 API, which
> *does* set buf.
>
> For those who want more details:
> It was probably deprecated because of the new functions starting with
> xmlNewInputFrom. The containing function in lxml, _local_resolver, is
> passed into libxml2's xmlSetExternalEntityLoader in _register_document_loader.
> xmlSetExternalEntityLoader itself replaces xmlDefaultExternalEntityLoader
> with a custom callback. For reference, xmlDefaultExternalEntityLoader
> *does* in fact set the input->buf. If you follow a few function calls
> down to xmlNewInputFromUrl, there is a call to
> xmlParserInputBufferCreateUrl, which creates the buffer. However, the
> calls in lxml and the deprecated function leave the buffer as NULL.
>
> Best,
> Abe
>
> On Sun, Jul 6, 2025 at 3:59 AM Stefan Behnel via lxml - The Python XML
> Toolkit <lxml@python.org> wrote:
>
>> Stefan Behnel wrote on 06.07.25 at 08:38:
>> > Austin Matherne wrote on 01.07.25 at 04:01:
>> >> I’m upgrading a project from lxml 5.4.0 to the newly released lxml 6.0.0
>> >> and encountering an unexpected XMLSchemaParseError. I’ve distilled the
>> >> problem into a minimal, self-contained example and uploaded it as a
>> >> GitHub gist:
>> >>
>> >> https://gist.github.com/AustinMatherne/533a4b6a31a63e11bfd8c09c03c05183
>> >>
>> >> * The same XML and XSD files parse and schema validate cleanly with
>> >>   lxml 5.4.0.
>> >> * With lxml 6.0.0, calling XMLSchema() raises an XMLSchemaParseError
>> >>   with no obvious culprit.
>> >>
>> >> Is this a bug in libxml, lxml, or am I doing something unsupported with
>> >> the API?
>> >
>> > So, I added a print(system_url) to your resolver and where the working
>> > version downloads a whole pack of schema files transitively, the failing
>> > version only gives the following output:
>> >
>> > """
>> > READ http://www.w3.org/2001/xml.xsd
>> > READ http://www.xbrl.org/2013/inlineXBRL/xhtml-inlinexbrl-1_1-modules.xsd
>> > Traceback (most recent call last):
>> >    File "/home/stefan/source/Python/lxml/lxml-hg/TEST/schema_error_ml_20250701/lxml.test.py", line 45, in <module>
>> >      schema = etree.XMLSchema(schema_tree)
>> >               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >    File "src/lxml/xmlschema.pxi", line 90, in lxml.etree.XMLSchema.__init__
>> > lxml.etree.XMLSchemaParseError: Invalid argument, line 1, column 37
>> > """
>> >
>> > First of all, I highly recommend setting up XML catalogues on your system
>> > to avoid downloading the schemas over and over again. It's really a lot of
>> > useless network back and forth, server usage, waiting time etc. going on
>> > here that can be avoided entirely by installing local copies of the
>> > schemas. libxml2 will search the usual system directories automatically
>> > when asked to use a schema and thus avoid any network traffic.
>> >
>> > Then, it seems to fail immediately at the first included schema file, at a
>> > suspicious position of 37 characters, which is right after the XML
>> > declaration. That hints more at something going wrong in libxml2 than lxml
>> > but is so surprisingly obviously not working that it's unlikely to go
>> > undetected in libxml2 releases. I recommend bringing this to the attention
>> > of the libxml2 developers.
>>
>> Actually, it *was* something that lxml can resolve on its own side. libxml2
>> got a new API for passing data from resolvers into the parser and lxml
>> didn't use that yet but had to resort to some manual setup that apparently
>> no longer works in libxml2 2.14+.
>>
>> There is a test for this, so I'm not sure why it didn't fail when switching
>> to libxml2 2.14, but in any case, I pushed a fix to the 6.0 branch that
>> resolves it on my side:
>>
>>
>> https://github.com/lxml/lxml/commit/2aae3a9625fcb858f83715a81b4d7182d2529a09
>>
>> I'll release a bug fix version soon.
>>
>> Stefan
>>
>> _______________________________________________
>> lxml - The Python XML Toolkit mailing list -- lxml@python.org
>> To unsubscribe send an email to lxml-le...@python.org
>> https://mail.python.org/mailman3//lists/lxml.python.org
>> Member address: abep...@gmail.com
>>
>
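
For anyone who wants to try the XML catalogue suggestion quoted above, here
is a minimal sketch (not taken from the thread) that writes an OASIS XML
catalog mapping remote schema URLs to local copies. The helper name
build_catalog, the local path schemas/xml.xsd and the output file name
catalog.xml are all hypothetical. It assumes the local copies were already
downloaded, and that libxml2 is pointed at the catalog through the
XML_CATALOG_FILES environment variable (or that the entries are merged into
the system catalog). Emitting both a <system> and a <uri> entry per URL is a
cautious choice, since which lookup applies can depend on how the schema is
requested.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr

# Namespace defined by the OASIS XML Catalogs specification.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(mapping):
    """Return catalog XML (str) mapping remote URLs to local files.

    `mapping` is {remote_url: local_path}; the paths are illustrative
    and are not checked for existence.
    """
    lines = ['<?xml version="1.0"?>', f'<catalog xmlns="{CATALOG_NS}">']
    for url, local_path in mapping.items():
        # Catalog targets are URIs, so turn the path into a file:// URI.
        local_uri = Path(local_path).resolve().as_uri()
        lines.append(
            f"  <system systemId={quoteattr(url)} uri={quoteattr(local_uri)}/>"
        )
        lines.append(
            f"  <uri name={quoteattr(url)} uri={quoteattr(local_uri)}/>"
        )
    lines.append("</catalog>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical local copy of one of the schemas named in the thread.
    xml_text = build_catalog(
        {"http://www.w3.org/2001/xml.xsd": "schemas/xml.xsd"}
    )
    Path("catalog.xml").write_text(xml_text, encoding="utf-8")
    print("export XML_CATALOG_FILES=" + str(Path("catalog.xml").resolve()))
```

Exporting the printed variable in the shell before starting Python should
let libxml2 consult the catalog before going to the network. Whether a
custom lxml resolver is still called for URLs that the catalog already
covers is something to verify in your own setup.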
