Hi Martin,
I’m not sure I follow exactly what you’re trying to do, but I’ve noodled with
TEI before and I think I know the direction.
Are you familiar with XPath? I think
for w in tree.xpath("//hi[count(w) < 4]/w"):
    w.attrib["rend"] = "hi"  # Add the @rend attribute to the <w> children.
    hi = w.getparent()  # Get the <hi> parent.
    hi.tag = "hi-remove-me"  # Change the <hi> element’s tag so we can find it.
etree.strip_tags(tree, "hi-remove-me")  # Not sure about the ns here.
should get you started… 🤔
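
In case the namespace is what bites you, here is a slightly fuller sketch of the same idea. It assumes your files use the standard TEI namespace (http://www.tei-c.org/ns/1.0); the function name flatten_short_hi, the "hi-remove-me" marker tag, and the choice to copy the <hi> element's own @rend value (falling back to "hi") are all mine, so adjust them to taste. I have not run it against your corpus.

```python
from lxml import etree

# Assumption: the corpus uses the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
NSMAP = {"tei": TEI_NS}

# Hypothetical placeholder tag; any name that cannot occur in the corpus works.
MARKER = "hi-remove-me"


def flatten_short_hi(tree):
    """Move short <hi> markup onto its <w> children, then unwrap the <hi>.

    "Short" means fewer than four <w> children, as in the XPath above.
    """
    # xpath() returns a plain list, so renaming elements inside the loop is safe.
    for w in tree.xpath("//tei:hi[count(tei:w) < 4]/tei:w", namespaces=NSMAP):
        hi = w.getparent()
        # Assumption: reuse the <hi> element's own @rend, else fall back to "hi".
        w.set("rend", hi.get("rend", "hi"))
        # Rename the parent so that strip_tags can target only these elements.
        hi.tag = MARKER
    # Unwrap the renamed elements; their text and children move into the parent.
    etree.strip_tags(tree, MARKER)
    return tree
```

You would call it once per parsed document, e.g. flatten_short_hi(etree.parse(path)), and then write the tree out as before. The reason for renaming first is that strip_tags() removes every matching tag anywhere below the element you pass in, so a test like len(el) < 4 on that element says nothing about the individual <hi> elements underneath it.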
Jens
> On Mar 7, 2026, at 09:21, Martin Mueller <[email protected]>
> wrote:
>
> I'm having trouble with the etree.strip_tags function. I have a large TEI
> corpus with <w> elements for every word token. There are a lot of short <hi>
> elements, and I would like to get rid of them and move the <hi> markers into
> @rend attributes, which simplifies processing at the level beyond <w>
> elements. I would like to
>
> • mark all the <w> children of <hi> elements with less than four children
> with appropriate @rend attributes
> • use the strip_tags function to get rid of <hi> elements with less than
> four children
>
> I know how to get rid of all <hi> elements with something like
>
> def process_tree(tree, filename, item):
>     for div in tree.iter(tei + 'div'):
>         for el in div.iter('*'):
>             etree.strip_tags(el, tei + 'hi')
>     newname = os.path.join(allerlei, item)
>     tree.write(newname, encoding='utf-8')
>
> But something like
>
>     for el in div.iter('*'):
>         if len(el) < 4:
>             etree.strip_tags(el, tei + 'hi')
>
> does not work. Is it not possible to specify constraints on the strip_tags
> function or do I not know what I am doing?
>
> The latter very likely, and I will be grateful for any advice
>
> Martin Mueller
> Professor emeritus of English and Classics
> Northwestern University