On 8 Jun 2023, at 9:10, Jamie Norrish wrote:

> I would approach this by first transforming each document into a
> simpler structure, using XSLT. If you do not care about anything other
> than tei:p, tei:w, and tei:sc elements, and for all of the latter two
> to be children of the former, then your transform can go find all tei:p
> (and any other containing elements you might have) and output them, and
> then all descendant tei:w and tei:sc, as children.

lxml will also let you pass a list of tag names to iterparse through its tag keyword argument, so you can do the filtering directly while iterating, without a separate transform step. See https://lxml.de/parsing.html#iterparse-and-iterwalk

Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
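
For what it's worth, here is a minimal sketch of the iterparse approach mentioned above. It assumes the documents use the standard TEI namespace (http://www.tei-c.org/ns/1.0), that tei:p elements are not nested inside each other, and that every tei:w and tei:sc sits somewhere below a tei:p. The sample document and the collect_paragraphs helper are made up for illustration and are not taken from the original question.

```python
from io import BytesIO

from lxml import etree

# Assumption: the documents use the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample input, only here to make the sketch runnable.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><w>Hello</w><hi><w>world</w></hi><sc>!</sc></p>
    <p><w>Bye</w></p>
  </body></text>
</TEI>"""


def collect_paragraphs(source):
    """Return one list of (local name, text) pairs per tei:p element."""
    # iterparse accepts a single tag or a sequence of tags in `tag`.
    wanted = [f"{{{TEI_NS}}}{name}" for name in ("p", "w", "sc")]
    paragraphs = []
    current = []
    # Only "end" events for the listed tags are reported; everything
    # else (such as the wrapping <hi>) is skipped silently.
    for _event, elem in etree.iterparse(source, events=("end",), tag=wanted):
        name = etree.QName(elem).localname
        if name == "p":
            # Descendant w/sc elements end before their enclosing p does,
            # so `current` holds exactly this paragraph's children.
            paragraphs.append(current)
            current = []
            elem.clear()  # drop the processed subtree to save memory
        else:
            current.append((name, elem.text))
    return paragraphs


result = collect_paragraphs(BytesIO(SAMPLE))
```

With the sample above, result holds two lists, one per paragraph, and the w inside the hi wrapper shows up as a direct member of its paragraph. If your tei:p elements can nest, or tei:w can appear outside any tei:p, the grouping logic would need adjusting.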