[lxml] Re: specifying conditions for removing the tags of a particular element

Jens Tröger via lxml - The Python XML Toolkit Fri, 06 Mar 2026 16:50:54 -0800

Hi Martin,

I’m not sure I follow exactly what you’re trying to do, but I’ve noodled with 
TEI before and I think I know the direction.


Are you familiar with XPath? I think 

    for w in tree.xpath("//hi[count(w) < 4]/w"):
        w.attrib["rend"] = "hi"  # Add the @rend attribute to the <w> children.
        hi = w.getparent()  # Get the <hi> parent.
        hi.tag = "hi-remove-me"  # Change the <hi> element’s tag so we can find it.
    etree.strip_tags(tree, "hi-remove-me")  # Not sure about the ns here.

should get you started… 🤔
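
On the namespace question, here is a slightly fuller sketch of the same rename-then-strip idea. It assumes the corpus uses the standard TEI namespace (http://www.tei-c.org/ns/1.0). The helper name unwrap_short_hi, the "tei" prefix mapping, and the choice to copy an existing @rend from the <hi> (falling back to "hi") are my own assumptions, not something from your files. Note that count(tei:w) counts only <w> children; use count(*) if "children" should mean every child element.

```python
from lxml import etree

TEI_NS = "http://www.tei-c.org/ns/1.0"           # assumed: standard TEI namespace
NSMAP = {"tei": TEI_NS}                          # prefix chosen for the XPath only
TEMP_TAG = "{%s}hi-remove-me" % TEI_NS           # temporary marker tag (Clark notation)


def unwrap_short_hi(tree, limit=4):
    """Move <hi> markup onto its <w> children and unwrap every <hi>
    that has fewer than `limit` <w> children (hypothetical helper)."""
    query = "//tei:hi[count(tei:w) < %d]/tei:w" % int(limit)
    for w in tree.xpath(query, namespaces=NSMAP):
        hi = w.getparent()
        # Assumption: reuse the <hi>'s own @rend if present, else mark it "hi".
        w.set("rend", hi.get("rend", "hi"))
        # Rename the parent so strip_tags can target only the short ones.
        hi.tag = TEMP_TAG
    # Unwrap the renamed elements; their children and text stay in place.
    etree.strip_tags(tree, TEMP_TAG)


# Small made-up sample: one short <hi> (2 words) and one long <hi> (4 words).
doc = etree.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><p>'
    '<hi rend="italic"><w>a</w><w>b</w></hi>'
    '<hi><w>1</w><w>2</w><w>3</w><w>4</w></hi>'
    '</p></TEI>'
)
unwrap_short_hi(doc)
p = doc[0]
```

After the call, the two words from the short <hi> sit directly under <p> with rend="italic", and the four-word <hi> is untouched. A <hi> with no <w> children at all is never renamed, so it is left alone as well.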
Jens



> On Mar 7, 2026, at 09:21, Martin Mueller <[email protected]> wrote:
> 
> I'm having trouble with the etree.strip_tags function. I have a large TEI 
> corpus with <w> elements for every word token. There are a lot of short <hi> 
> elements, and I would like to get rid of them and move the <hi> markers into 
> @rend attributes, which simplifies processing at the level beyond <w> 
> elements. I would like to:
> 
>     • mark all the <w> children of <hi> elements with fewer than four 
> children with appropriate @rend attributes
>     • use the strip_tags function to get rid of <hi> elements with fewer 
> than four children
> 
> I know how to get rid of all <hi> elements with something like
> 
> def process_tree(tree, filename, item):
>     for div in tree.iter(tei + 'div'):
>         for el in div.iter('*'):
>             etree.strip_tags(el, tei + 'hi')
>     newname = os.path.join(allerlei, item)
>     tree.write(newname, encoding='utf-8')
> 
> But something like
> 
> for el in div.iter('*'):
>     if len(el) < 4:
>         etree.strip_tags(el, tei + 'hi')
> 
> does not work. Is it not possible to specify constraints on the strip_tags 
> function or do I not know what I am doing?
> 
> The latter is very likely, and I will be grateful for any advice.
> 
> Martin Mueller
> Professor emeritus of English and Classics
> Northwestern University
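
A note on the quoted `len(el) < 4` attempt, as far as I understand strip_tags: it unwraps matching descendants of the element you pass in, never that element itself, and `len(el)` counts the children of whatever element the loop is currently on, not of the <hi>. So the condition likely tests the wrong element. The small sketch below uses made-up, un-namespaced sample markup to show both effects.

```python
from lxml import etree

# Made-up sample without namespaces, just to show strip_tags semantics.
root = etree.fromstring("<div><p><hi><w>a</w><w>b</w></hi></p></div>")
p = root[0]
hi = p[0]

# Passing the <hi> itself: it is the subtree root, so it is not unwrapped.
etree.strip_tags(hi, "hi")
after_self = etree.tostring(root)

# Passing the parent: len(p) is 1, so the test passes, and every <hi>
# below <p> is unwrapped no matter how many children that <hi> has.
if len(p) < 4:
    etree.strip_tags(p, "hi")
after_parent = etree.tostring(root)
```

This is why selecting the short <hi> elements first (by XPath or by checking `len(hi)` on each <hi>) and then removing only those tends to be easier than putting a condition around strip_tags.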

