I'm having trouble with the etree.strip_tags function. I have a large TEI
corpus with <w> elements for every word token. There are a lot of short <hi>
elements, and I would like to get rid of them and move the <hi> markers into
@rend attributes, which simplifies processing at the level beyond <w> elements.
I would like to
1. mark all the <w> children of <hi> elements with fewer than four children
   with appropriate @rend attributes
2. use the strip_tags function to get rid of <hi> elements with fewer than
   four children
I know how to get rid of all <hi> elements with something like
    def process_tree(tree, filename, item):
        for div in tree.iter(tei + 'div'):
            for el in div.iter('*'):
                etree.strip_tags(el, tei + 'hi')
        newname = os.path.join(allerlei, item)
        tree.write(newname, encoding='utf-8')
But something like
    for el in div.iter('*'):
        if len(el) < 4:
            etree.strip_tags(el, tei + 'hi')
does not work. Is it not possible to specify constraints on the strip_tags
function or do I not know what I am doing?
The latter very likely, and I will be grateful for any advice.
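For reference, a note on why the conditional version behaves unexpectedly, plus one possible workaround. strip_tags selects by tag name only: called on el, it strips every matching <hi> descendant of el (never el itself), and len(el) counts the children of el, not of the <hi> elements below it. The sketch below instead loops over the <hi> elements directly, copies @rend to their <w> children, renames the short ones to a temporary tag, and strips that tag in one call. It is a minimal sketch, not a definitive solution: the names fold_short_hi, STRIP_ME, and default_rend are hypothetical, the namespace is assumed to be the standard TEI one, and text inside a stripped <hi> that is not wrapped in <w> loses its highlighting marker.

```python
from lxml import etree

# Assumption: the corpus uses the standard TEI namespace.
tei = "{http://www.tei-c.org/ns/1.0}"


def fold_short_hi(root, max_children=3, default_rend="hi"):
    """Move @rend from short <hi> elements onto their <w> children,
    then strip those <hi> elements (hypothetical helper)."""
    # Build the list first, because the tree is modified in the loop.
    for hi in list(root.iter(tei + "hi")):
        if len(hi) > max_children:
            continue  # four or more children: leave this <hi> alone
        # Assumption: fall back to a generic value if <hi> has no @rend.
        rend = hi.get("rend") or default_rend
        for w in hi.iterchildren(tei + "w"):
            old = w.get("rend")
            w.set("rend", f"{old} {rend}" if old else rend)
        # Temporary marker tag so that strip_tags hits only these elements.
        hi.tag = "STRIP_ME"
    # One pass: text, tail, and children are merged into the parent.
    etree.strip_tags(root, "STRIP_ME")


if __name__ == "__main__":
    sample = (
        '<text xmlns="http://www.tei-c.org/ns/1.0"><div><p>'
        '<hi rend="italic"><w>a</w><w>b</w></hi> '
        '<hi rend="bold"><w>c</w><w>d</w><w>e</w><w>f</w></hi>'
        '</p></div></text>'
    )
    root = etree.fromstring(sample)
    fold_short_hi(root)
    print(etree.tostring(root, encoding="unicode"))
```

Inside process_tree, this would replace the nested loops with a single call such as fold_short_hi(tree.getroot()) before tree.write.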
Martin Mueller
Professor emeritus of English and Classics
Northwestern University