On 8 Jun 2023, at 9:10, Jamie Norrish wrote:

> I would approach this by first transforming each document into a
> simpler structure, using XSLT. If you do not care about anything other
> than tei:p, tei:w, and tei:sc elements, and for all of the latter two
> to be children of the former, then your transform can go find all tei:p
> (and any other containing elements you might have) and output them, and
> then all descendant tei:w and tei:sc, as children.

lxml's iterparse also accepts a list of tags through its tag argument, so you
can do this filtering directly while iterating, without a separate transform.

See https://lxml.de/parsing.html#iterparse-and-iterwalk
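
A minimal sketch of what I mean, not tested against your documents. It assumes the usual TEI namespace (http://www.tei-c.org/ns/1.0) and that every tei:w and tei:sc sits somewhere inside a tei:p. The SAMPLE document and the paragraphs() helper are made up for illustration.

```python
from io import BytesIO

from lxml import etree

# Assumption: the documents use the standard TEI namespace.
TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample input, only here to make the sketch runnable.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><w>Hello</w> <hi><w>world</w></hi><sc>!</sc></p>
    <p><w>Bye</w></p>
  </body></text>
</TEI>"""


def paragraphs(source):
    """Yield one list of (local name, text) pairs per tei:p."""
    # iterparse only reports elements whose tag is in this list.
    tags = [f"{{{TEI}}}{name}" for name in ("p", "w", "sc")]
    tokens = []
    for _event, elem in etree.iterparse(source, events=("end",), tag=tags):
        local = etree.QName(elem).localname
        if local == "p":
            # The paragraph has closed: hand over what was collected.
            yield tokens
            tokens = []
            elem.clear()  # free the subtree that has been processed
        else:
            tokens.append((local, elem.text))


result = list(paragraphs(BytesIO(SAMPLE)))
print(result)
```

Because only "end" events are requested, each tei:w and tei:sc is seen before the tei:p that contains it, however deeply it is nested, so the flattening happens during the parse itself.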

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226