[lxml] A strip_tags problem

Martin Mueller Sat, 26 Apr 2025 18:35:56 -0700

I have what looks like a simple problem that turns out to be beyond my 
admittedly feeble skills.


In a very large corpus of TEI encoded texts I have many instances of this 
pattern:

<hi>
<w>something</w>
<pc>.</pc>
</hi>

I want to change all of them to

<w rend="hi">something</w>
<pc>.</pc>

without affecting <hi> tags that don’t have those constraints.

There are two possible approaches to this. In the first, I would strip the <hi> 
tag not in all cases but only in the cases that meet the condition of a <hi> tag 
that has a <w> child followed by a <pc> child.  In the second, I would use a 
version of the addnext procedure but in reverse order.

I can’t get either of them to work. The description of the strip_tags method

strip_tags<https://lxml.de/4.0/api/lxml.etree-module.html#strip_tags>(tree_or_element,
 *tag_names)

suggests that you might use it for elements with particular constraints.  
But it doesn’t appear to recognize or follow the constraints.
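For illustration, here is a minimal sketch of one way the first approach might be made selective. It rests on several assumptions that are not in my data description above: the elements carry no namespace (real TEI files usually do, in which case the XPath would need a prefix map), the constraint means "exactly two element children, <w> then <pc>", and the temporary tag name MARKER is a made-up placeholder. Since strip_tags matches on tag names only, the idea is to select the constrained <hi> elements with XPath, rename them, and then strip the renamed tag.

```python
from lxml import etree

# Made-up sample: one <hi> that fits the pattern, one that does not.
SAMPLE = (
    "<p><hi>\n<w>something</w>\n<pc>.</pc>\n</hi> "
    "<hi>plain emphasis</hi></p>"
)

# Assumed reading of the constraint: exactly two element children,
# a <w> followed by a <pc>. No namespace is assumed here.
PATTERN = "//hi[count(*) = 2 and *[1][self::w] and *[2][self::pc]]"

# Hypothetical temporary tag name; any name unused in the corpus would do.
MARKER = "hi-to-strip"


def unwrap_matching_hi(root):
    """Mark the constrained <hi> elements, then strip only the marked tag."""
    matches = root.xpath(PATTERN)
    for hi in matches:
        hi[0].set("rend", "hi")  # carry the highlighting over to the <w>
        hi.tag = MARKER          # rename so strip_tags can single it out
    # strip_tags removes the tag but keeps its children, text and tail.
    etree.strip_tags(root, MARKER)
    return len(matches)


root = etree.fromstring(SAMPLE)
changed = unwrap_matching_hi(root)
result = etree.tostring(root, encoding="unicode")
print(changed)
print(result)
```

The second <hi> in the sample has no <w>/<pc> children, so it is never renamed and strip_tags leaves it alone.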

For the addnext procedure I would need to find a way of traversing the <hi> 
list in reverse order. There are different ways of doing that in Python, but I 
haven’t found a way of getting it done in an lxml script.
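A rough sketch of what the reverse traversal might look like, under the same kind of assumptions (no namespace, a made-up sample, and "exactly a <w> then a <pc>" as the constraint). The xpath() call returns an ordinary Python list, so reversed() can be applied to it directly. The helper name unwrap is hypothetical, and the text and tail bookkeeping is my guess at what is needed so that no whitespace or text is lost when the <hi> is removed.

```python
from lxml import etree

# Made-up sample: two <hi> elements that fit the pattern, one that does not.
SAMPLE = (
    "<p>A <hi> <w>one</w> <pc>.</pc> </hi> B "
    "<hi><w>two</w><pc>!</pc></hi> C <hi>kept</hi></p>"
)

# Assumed constraint: exactly two element children, <w> then <pc>.
PATTERN = "//hi[count(*) = 2 and *[1][self::w] and *[2][self::pc]]"


def unwrap(hi):
    """Move the children of <hi> out with addnext, then drop the <hi>."""
    parent = hi.getparent()
    children = list(hi)
    lead, tail = hi.text or "", hi.tail or ""
    hi.tail = None  # detach the tail so it is not lost with the element

    children[0].set("rend", "hi")
    # Last child first, so that each addnext lands directly after <hi>
    # and the original child order is preserved.
    for child in reversed(children):
        hi.addnext(child)

    # Text that stood before the first child goes to whatever precedes <hi>.
    prev = hi.getprevious()
    if prev is not None:
        prev.tail = (prev.tail or "") + lead
    else:
        parent.text = (parent.text or "") + lead
    # The old tail of <hi> now follows the last moved child.
    children[-1].tail = (children[-1].tail or "") + tail
    parent.remove(hi)


root = etree.fromstring(SAMPLE)
text_before = "".join(root.itertext())

# Walk the matches in reverse document order.
for hi in reversed(root.xpath(PATTERN)):
    unwrap(hi)

text_after = "".join(root.itertext())
tags = [el.tag for el in root]
print(etree.tostring(root, encoding="unicode"))
```

Because the loop runs over a list built before any change is made, the tree is never modified underneath a live iterator.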

I suspect that there is a very simple explanation, which I am too stupid to see.

I’d be grateful for help.


Martin Mueller
Professor emeritus of English and Classics
Northwestern University
