Hi,

> I know how to set up the parser to not download entities. But I have not 
> found a way to stop XMLCatalog from downloading other xsd's than the root xsd.
>
> from lxml import etree
> parser = etree.XMLParser(no_network=True)
> xsddoc = etree.parse('schemas/ler/2.0_ler.xsd',parser=parser)
> xsd = etree.XMLSchema(xsddoc)
>
> The above code will recursively download the XSD's imported in 2.0_ler.xsd.
>
> I played around with xmllint and I believe that if XML_PARSE_NONET is True, 
> it will not download those. But how do I set that option for the context in 
> which XMLSchema runs?

Hm, from a quick glance at the code, XML_PARSE_NONET *is* set through the 
no_network parser __init__ option:

        if not no_network:
            parse_options = parse_options ^ xmlparser.XML_PARSE_NONET

(https://github.com/lxml/lxml/blob/3ccc7d583e325ceb0ebdf8fc295bbb7fc8cd404d/src/lxml/parser.pxi#L1626-L1627)

And no_network defaults to True, too, so XML_PARSE_NONET stays set unless 
you explicitly pass no_network=False.

That said, maybe you could use a custom XML catalog setup
(https://lxml.de/resolvers.html#xml-catalogs, see also the link to the libxml2
catalog setup info there) to prevent unwanted network lookups?
It might even be that some default catalog handling is taking place on your
machine and causing the behavior you observe(?), see
https://gitlab.gnome.org/GNOME/libxml2/-/wikis/Catalog-support#how-to-tune-catalog-usage.
E.g. could the imported documents have already been cached, i.e. they're not
even loaded from remote? This is one thing XML catalogs can provide.
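
For what it's worth, here is a rough, untested-against-your-setup sketch of
what such a catalog setup could look like from Python. It only uses the
standard library to write an OASIS XML catalog and points libxml2 at it via
the XML_CATALOG_FILES environment variable. The schema URL, the local path and
the helper name write_catalog are made up for illustration, and I'm assuming
the variable is set before lxml parses anything in the process, since libxml2
appears to read it only when catalogs are first initialised.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def write_catalog(mappings, path):
    """Write an OASIS XML catalog that maps remote URLs to local URIs.

    Hypothetical helper, not part of lxml or libxml2.
    """
    # Serialize the catalog namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element("{%s}catalog" % CATALOG_NS)
    for remote_url, local_uri in mappings.items():
        # <uri> covers plain URI lookups, <system> covers system identifiers.
        ET.SubElement(root, "{%s}uri" % CATALOG_NS,
                      name=remote_url, uri=local_uri)
        ET.SubElement(root, "{%s}system" % CATALOG_NS,
                      systemId=remote_url, uri=local_uri)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


# Assumption: example URL and local path, replace with the locations that
# your root XSD actually imports and where you keep the local copies.
mappings = {
    "http://example.org/schemas/common.xsd": "file:///opt/schemas/common.xsd",
}
catalog_path = write_catalog(
    mappings, os.path.join(tempfile.mkdtemp(), "catalog.xml")
)

# Set this before the first parse so libxml2 can pick up the catalog.
os.environ["XML_CATALOG_FILES"] = catalog_path
```

After that, the etree.parse() / etree.XMLSchema() calls from your snippet
would go unchanged. Whether the imports are then really served from the local
copies is something you'd have to verify on your machine, e.g. by running
with the network unplugged.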


Best regards,
Holger






_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/