Hi Keith,

> I have a directory structure and a GDML file as follows:
> <!DOCTYPE xml [
> <!ENTITY define SYSTEM "DC1_define.xml">
> <!ENTITY materials SYSTEM "DC1_materials.xml">
> <!ENTITY solids SYSTEM "DC1_solids.xml">
> <!ENTITY setup SYSTEM "DC1_setup.xml">
> <!ENTITY struct SYSTEM "DC1_struct.xml">
> ]>
>
> <xml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      xsi:noNamespaceSchemaLocation="http://service-spi.web.cern.ch/service-spi/app/releases/GDML/schema/gdml.xsd">&define;&materials;&solids;&setup;&struct;</xml>
> The files with the .xml extension are not strictly XML files, as they have
> no <xml> ... content ... </xml> wrapper, just valid include contents.
> It is processed by:
> try:
>     from lxml import etree
>     parser = etree.XMLParser(resolve_entities=True)
>     root = etree.parse(filename, parser=parser)
>     # print('error log')
>     # print(parser.error_log)
>
> except ImportError:
>
> [...]
>
> This all works fine, BUT
> I now wish to implement other functions which process the files listed in the
> entities as individual XML files. Is there a way to achieve this?
> Or, alternatively, is there a way to use an actual XML file as an include?


(Not having actually tried out any of this, but) here is what comes to mind:

- Parse with resolve_entities=False and then handle the unresolved entities
  yourself. You should be able to get at the entity references in the tree,
  e.g. using tree.iter(etree.Entity).
  The entity definitions can be accessed through the tree's docinfo attribute:
  tree.docinfo.internalDTD.entities()

- If you have control over the XML file, maybe switch to XInclude instead of
  entities to include separate content? Again, instead of processing the
  inclusions automatically (through tree.xinclude()), iterate over the include
  nodes yourself for full control over what you want to achieve.

- lxml supports custom resolvers. Both the DTD and the XInclude approach might
  be combinable with such custom resolvers, to hook into the regular mechanics
  and get more control over the actual loading (see https://lxml.de/resolvers.html).
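A minimal sketch of the first suggestion (untested against your real files): it assumes lxml is installed and uses a hypothetical two-entity document (MAIN, with a <gdml> root, plus made-up PARTS contents) instead of your actual GDML. It parses with resolve_entities=False, collects the entity reference names via tree.iter(etree.Entity) and the declared names via tree.docinfo.internalDTD.entities(). Mapping each name back to its SYSTEM file is left open here; how best to read that from a declaration may depend on your lxml version.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for the main GDML file: two external entities.
MAIN = """<?xml version="1.0"?>
<!DOCTYPE gdml [
<!ENTITY define SYSTEM "DC1_define.xml">
<!ENTITY solids SYSTEM "DC1_solids.xml">
]>
<gdml>&define;&solids;</gdml>
"""
# Hypothetical fragment contents (root-less include files in the real setup).
PARTS = {
    "DC1_define.xml": '<define><constant name="PI" value="3.14159"/></define>',
    "DC1_solids.xml": '<solids><box name="WorldBox" x="1" y="1" z="1"/></solids>',
}


def list_entities(path):
    """Return (referenced entity names, declared entity names) for one file."""
    # resolve_entities=False keeps &define; etc. as entity reference nodes.
    parser = etree.XMLParser(resolve_entities=False)
    tree = etree.parse(path, parser)
    # Entity reference nodes, in document order.
    refs = [ent.name for ent in tree.iter(etree.Entity)]
    # <!ENTITY ...> declarations from the document's internal DTD subset.
    decls = [decl.name for decl in tree.docinfo.internalDTD.entities()]
    return refs, decls


with tempfile.TemporaryDirectory() as tmp:
    for name, text in PARTS.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    main_path = os.path.join(tmp, "DC1.gdml")
    with open(main_path, "w") as f:
        f.write(MAIN)
    refs, decls = list_entities(main_path)

print(refs)   # ['define', 'solids']
print(decls)
```

From there, each name can be dispatched to whatever per-file function you need, without the expanded content ever being merged into the main tree.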
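For the XInclude route, a rough sketch, assuming each include file is turned into a real XML document with a single root element (XInclude needs well-formed documents); the file names and contents (MAIN, PARTS) are hypothetical. It walks the xi:include nodes itself, parses every referenced file on its own, and only afterwards calls tree.xinclude() to get the merged view.

```python
import os
import tempfile

from lxml import etree

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical main file using XInclude instead of external entities.
MAIN = """<?xml version="1.0"?>
<gdml xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="DC1_define.xml"/>
  <xi:include href="DC1_solids.xml"/>
</gdml>
"""
# Hypothetical include files: each one is now a complete XML document.
PARTS = {
    "DC1_define.xml": '<define><constant name="PI" value="3.14159"/></define>',
    "DC1_solids.xml": '<solids><box name="WorldBox" x="1" y="1" z="1"/></solids>',
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in PARTS.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    main_path = os.path.join(tmp, "DC1.gdml")
    with open(main_path, "w") as f:
        f.write(MAIN)

    tree = etree.parse(main_path)
    # Handle each include yourself: here, parse every referenced file separately.
    parts = {}
    for inc in tree.iter(XI_INCLUDE):
        href = inc.get("href")
        parts[href] = etree.parse(os.path.join(tmp, href)).getroot()

    # Optionally let lxml merge everything into one tree afterwards.
    tree.xinclude()
    merged_tags = [child.tag for child in tree.getroot()]

print(sorted(parts))   # ['DC1_define.xml', 'DC1_solids.xml']
print(merged_tags)
```

The same files then serve both purposes: standalone processing per file, and the combined document for the existing code.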

Note: you're probably aware that resolve_entities=True can be a security risk if
applied to untrusted XML input, since external entities can pull in arbitrary
local files. Handle with care.
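To make the risk concrete, a small sketch with a hypothetical "untrusted" document and a local file (secret.txt, made up here) standing in for something sensitive: with resolve_entities=True the file's content ends up in the parsed tree, while with resolve_entities=False (plus no_network=True, which lxml also uses by default) the reference stays unexpanded.

```python
import os
import tempfile

from lxml import etree

with tempfile.TemporaryDirectory() as tmp:
    # A local file the author of the XML should not be able to read.
    with open(os.path.join(tmp, "secret.txt"), "w") as f:
        f.write("top secret")
    # Hypothetical untrusted input declaring an external entity for that file.
    untrusted = os.path.join(tmp, "untrusted.xml")
    with open(untrusted, "w") as f:
        f.write('<!DOCTYPE doc [<!ENTITY s SYSTEM "secret.txt">]><doc>&s;</doc>')

    risky = etree.parse(untrusted, etree.XMLParser(resolve_entities=True))
    safe = etree.parse(
        untrusted, etree.XMLParser(resolve_entities=False, no_network=True)
    )

risky_xml = etree.tostring(risky.getroot())  # contains the file's content
safe_xml = etree.tostring(safe.getroot())    # keeps the &s; reference as-is
print(risky_xml)
print(safe_xml)
```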

Best regards,
Holger






Landesbank Baden-Wuerttemberg
Institution under public law
Head offices: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704
Amtsgericht Stuttgart
HRA 4356, HRA 104 440
Amtsgericht Mannheim
HRA 40687
Amtsgericht Mainz

LBBW processes your personal data in accordance with the requirements of the
GDPR (DSGVO).
Further information is available at https://www.lbbw.de/datenschutz.
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/