[lxml] specifying conditions for removing the tags of a particular element

Martin Mueller Fri, 06 Mar 2026 15:28:36 -0800

I'm having trouble with the etree.strip_tags function. I have a large TEI
corpus with <w> elements for every word token. There are a lot of short <hi>
elements, and I would like to get rid of them and move the <hi> markers into
@rend attributes on the <w> elements, which simplifies processing above the
<w> level. I would like to:

  1. mark all the <w> children of <hi> elements with fewer than four children
     with appropriate @rend attributes;
  2. use the strip_tags function to get rid of the <hi> elements with fewer
     than four children.
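The two steps above can be sketched as follows, assuming the TEI namespace in Clark notation and that each <hi> carries its rendition in its own @rend. Because strip_tags accepts only tag names and no condition, this sketch uses a common workaround: rename just the qualifying <hi> elements to a throwaway placeholder tag (the name `strip-me` and the function `flatten_short_hi` are hypothetical) and then strip that tag. The fallback value `default_rend='hi'` is likewise an assumption for <hi> elements without @rend.

```python
from lxml import etree

TEI = '{http://www.tei-c.org/ns/1.0}'  # TEI namespace in Clark notation
MARK = 'strip-me'  # hypothetical placeholder tag, used only as a removal marker


def flatten_short_hi(root, threshold=4, default_rend='hi'):
    """Move the @rend of each short <hi> onto its <w> children, then unwrap it.

    "Short" means fewer than `threshold` element children. `default_rend` is an
    assumed fallback for <hi> elements that carry no @rend of their own.
    """
    # Materialise the list first, because elements are renamed inside the loop
    for hi in list(root.iter(TEI + 'hi')):
        if len(hi) >= threshold:  # len() counts element children only
            continue
        rend = hi.get('rend', default_rend)
        # Step 1: copy the rendition onto the direct <w> children,
        # appending to any @rend a <w> already has
        for w in hi.iterchildren(TEI + 'w'):
            old = w.get('rend')
            w.set('rend', rend if old is None else old + ' ' + rend)
        # Rename this particular <hi> so strip_tags can target it alone
        hi.tag = MARK
    # Step 2: strip the marked elements, keeping their text, children and tails
    etree.strip_tags(root, MARK)
    return root
```

The placeholder only has to be a name that cannot occur in the corpus. Nested <hi> elements are treated independently here: an outer <hi> does not pass its @rend down to <w> elements inside an inner one, which may or may not be what the corpus needs.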

I know how to get rid of all <hi> elements with something like


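import os
from lxml import etree
tei = '{http://www.tei-c.org/ns/1.0}'  # TEI namespace (Clark notation)

def process_tree(tree, filename, item):
    # remove every <hi> tag inside each <div>, keeping its text and children
    for div in tree.iter(tei + 'div'):
        for el in div.iter('*'):
            etree.strip_tags(el, tei + 'hi')
    newname = os.path.join(allerlei, item)  # allerlei: output dir set elsewhere
    tree.write(newname, encoding='utf-8')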
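For reference, this is what strip_tags does on a tiny made-up fragment (namespace omitted to keep it short). It removes every matching tag anywhere below the element it is given and leaves the content in place, so one call per <div>, or on the whole tree, already reaches every <hi>; the inner loop over div.iter('*') is not needed for the unconditional case. The tag's attributes, including any @rend, disappear with it.

```python
from lxml import etree

# Made-up fragment: <hi> wraps some text and a <w> child
root = etree.fromstring('<p>one <hi rend="italic">two <w>three</w></hi> four</p>')

# strip_tags(tree_or_element, *tag_names) unwraps every matching descendant:
# the tag and its attributes go, its text, children and tail stay
etree.strip_tags(root, 'hi')

print(etree.tostring(root).decode())  # → <p>one two <w>three</w> four</p>
```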


But something like

for el in div.iter('*'):
    if len(el) < 4:
        etree.strip_tags(el, tei + 'hi')

does not work. Is it not possible to specify conditions for the strip_tags
function, or do I simply not know what I am doing?
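One likely explanation for the failing loop: strip_tags(tree_or_element, *tag_names) accepts only tag names, no condition, and the len(el) < 4 test counts the children of el, the element being visited, rather than those of the <hi> elements below it. When el is a short parent, every <hi> under it is stripped whatever its size; when el is a short <hi>, nothing happens, because strip_tags never removes the element it is called on, only its descendants. The made-up fragment below (namespace omitted) illustrates both effects.

```python
from lxml import etree

# Made-up fragment: a <p> with a single <hi> that has five <w> children
p = etree.fromstring(
    '<p><hi rend="italic"><w>a</w><w>b</w><w>c</w><w>d</w><w>e</w></hi></p>')
print(len(p), len(p[0]))  # 1 5: the test sees the <p>, not the long <hi>

if len(p) < 4:  # true, so the long <hi> is stripped as well
    etree.strip_tags(p, 'hi')
print(etree.tostring(p).decode())

# Calling strip_tags on a short <hi> itself leaves that <hi> in place
short = etree.fromstring('<hi><w>x</w></hi>')
etree.strip_tags(short, 'hi')
print(short.tag)  # hi
```

So the length test has to be applied to each <hi> itself, for example by looping over the <hi> elements and checking len(hi) before deciding whether to unwrap it.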

The latter is very likely, and I would be grateful for any advice.

Martin Mueller
Professor emeritus of English and Classics
Northwestern University

_______________________________________________
lxml - The Python XML Toolkit mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/lxml.python.org
Member address: [email protected]
