Hi Keith,

> I have a directory structure and a GDML file as follows
>
> <!DOCTYPE xml [
> <!ENTITY define SYSTEM "DC1_define.xml">
> <!ENTITY materials SYSTEM "DC1_materials.xml">
> <!ENTITY solids SYSTEM "DC1_solids.xml">
> <!ENTITY setup SYSTEM "DC1_setup.xml">
> <!ENTITY struct SYSTEM "DC1_struct.xml">
> ]>
>
> <xml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:noNamespaceSchemaLocation="http://service-spi.web.cern.ch/service-spi/app/releases/GDML/schema/gdml.xsd">&define;&materials;&solids;&setup;&struct;</xml>
>
> Where the files labelled with .xml type are not strictly xml files as they
> have no <xml> ...content ... </xml> just valid include contents
>
> It is processed by
>
> try:
>     from lxml import etree
>     parser = etree.XMLParser(resolve_entities=True)
>     root = etree.parse(filename, parser=parser)
>     # print('error log')
>     # print(parser.error_log)
>
> except ImportError:
>
> [...]
>
> This all works fine BUT
> I now wish to implement other functions which process the files listed in the
> Entities as individual xml files, Is there a way to achieve this?
> or alternatively is there a way to use an actual xml file as a include.

(not having actually tried out any of this, but) what comes to mind is:

- Parse with resolve_entities=False and then handle the unresolved
  entities yourself. You should be able to get at the entity references
  in the tree, e.g. using tree.iter(etree.Entity). The entity definitions
  can be accessed through the tree's docinfo attribute:
  tree.docinfo.internalDTD.entities()

- If you have control over the XML file, maybe switch to XInclude instead
  of entities to include separate content? Again, instead of automatically
  processing inclusions (through tree.xinclude()), iterate over the include
  nodes yourself for full control of what you want to achieve.

- lxml supports custom resolvers.
Both the DTD and XInclude approaches might be combinable with such custom resolvers to hook into the regular mechanics and get more control over the actual resolution (see https://lxml.de/resolvers.html).

Note: you're probably aware that resolve_entities=True can be a security risk if applied to untrusted XML input. Handle with care.

Best regards,
Holger