I have what looks like a simple problem that turns out to be beyond my admittedly feeble skills.
In a very large corpus of TEI-encoded texts I have many instances of this pattern: <hi><w>something</w><pc>.</pc></hi>. I want to change all of them to <w rend="hi">something</w><pc>.</pc> without affecting <hi> tags that don’t fit that pattern.

I see two possible approaches. In the first, I would strip the <hi> tag, but only where the <hi> has a <w> child followed by a <pc> child. In the second, I would use a version of the addnext procedure, but in reverse order. I can’t get either of them to work.

The description of strip_tags(tree_or_element, *tag_names) (https://lxml.de/4.0/api/lxml.etree-module.html#strip_tags) suggests that you might use it for elements with particular constraints, but it doesn’t appear to recognize or follow the constraints. For the addnext procedure I would need a way of traversing the <hi> list in reverse order. There are different ways of doing that in Python, but I haven’t found a way of getting it done in an lxml script.

I suspect there is a very simple explanation that I am too stupid to see. I’d be grateful for help.

Martin Mueller
Professor emeritus of English and Classics
Northwestern University
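A minimal sketch of the first approach, offered as an illustration rather than a definitive fix. strip_tags() only filters by tag name, so the condition has to be expressed elsewhere: here an XPath selects just the <hi> elements whose only element children are a <w> followed by a <pc> (with no other non-whitespace text), those elements are renamed to a throwaway tag, and strip_tags() removes that tag while keeping children, text and tail. Assumptions: the names unwrap_hi, MATCH_HI and TMP_TAG are hypothetical, TMP_TAG must not already occur in the corpus, and the new attribute is exactly rend="hi" as in the target markup. local-name() is used so the same XPath matches both plain and TEI-namespaced elements.

```python
from lxml import etree

# Hypothetical throwaway tag name; any name unused in the corpus would do.
TMP_TAG = "_unwrap_me"

# <hi> elements whose only element children are <w> then <pc>, and which
# contain no other non-whitespace text. local-name() ignores namespaces,
# so this matches <hi> as well as {http://www.tei-c.org/ns/1.0}hi.
MATCH_HI = etree.XPath(
    "//*[local-name()='hi']"
    "[count(*) = 2]"
    "[*[1][local-name()='w']]"
    "[*[2][local-name()='pc']]"
    "[not(text()[normalize-space()])]"
)


def unwrap_hi(tree_or_element):
    """Turn <hi><w>x</w><pc>.</pc></hi> into <w rend="hi">x</w><pc>.</pc>.

    Other <hi> elements are left untouched. Returns the number unwrapped.
    """
    matches = MATCH_HI(tree_or_element)  # a plain Python list, built once
    for hi in matches:
        w = hi.xpath("*[1]")[0]  # first element child (skips comments/PIs)
        w.set("rend", "hi")
        hi.tag = TMP_TAG  # mark only this particular <hi> for removal
    # Remove every marked element, merging its children, text and tail
    # into the parent. Unmarked <hi> elements are not affected.
    etree.strip_tags(tree_or_element, TMP_TAG)
    return len(matches)
```

For a whole file, something like tree = etree.parse(path); unwrap_hi(tree); tree.write(out_path, encoding="utf-8", xml_declaration=True) should work. Because the XPath result is an ordinary list, it can also be walked backwards with reversed() if you prefer the addnext route; with the rename-and-strip approach the order does not matter, since all matches are collected before the tree is modified.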
_______________________________________________ lxml - The Python XML Toolkit mailing list -- lxml@python.org To unsubscribe send an email to lxml-le...@python.org https://mail.python.org/mailman3/lists/lxml.python.org/ Member address: arch...@mail-archive.com