Martin Mueller wrote at 2023-6-8 04:02 +0000:
>I use lxml to work with a large collection of TEI-encoded texts (66,000) that
>are linguistically annotated. Each token is wrapped in a <w> or <pc> element
>with a unique ID and various attributes. I can march through the texts at the
>lowest level of <w> and <pc> elements without paying any attention to the
>discursive structure of higher elements. I just do
>
>    for w in tree.iter(tei + 'w', tei + 'pc'):
>        if x:
>            do this
>        if y:
>            do that
>
>But now I want to create a concordance in which tokens meeting some condition
>are pulled out and surrounded with seven words on either side. I do this with
>itersiblings(), but that is a tricky operation. The next <w> token may not be
>a sibling but a child of a higher level sibling. Remembering that "elements
>are lists" you have patterns like
>
>    [a, b, c, [d, e, f], g, h, i, [k, l, m, n]

Apparently, the sequence of `w` and `pc` elements (in document order) is
what matters. You already have a way to determine this sequence:
`tree.iter(...)` yields the matching elements in document order, regardless
of how deeply they are nested.

For any element, you can determine its parent (`getparent()` in lxml) and
therefore, recursively, the path from the root to that element. Given two
elements `e1` and `e2`, you can then determine their deepest common
ancestor. Maybe that helps you to solve your problem.
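
To make the two ideas above concrete, here is a minimal sketch, not a
definitive implementation. It flattens the tokens into a list in document
order, so that "seven words on either side" becomes plain list slicing, and
it finds the deepest common ancestor of two elements by comparing their
root-to-element paths. Assumptions: the sample document, the helper names
(`tokens_in_document_order`, `concordance`, `deepest_common_ancestor`) and
the `TEI` constant are made up for illustration. The sketch uses the
standard library's `xml.etree.ElementTree` so that it runs without
installing anything; the same calls should work with
`from lxml import etree as ET`, where `tree.iter(tei + 'w', tei + 'pc')`
and `getparent()` would be the more direct tools.

```python
import xml.etree.ElementTree as ET

# Assumed namespace prefix, playing the role of `tei` in the question.
TEI = "{http://www.tei-c.org/ns/1.0}"
TOKEN_TAGS = {TEI + "w", TEI + "pc"}

# Hypothetical sample: tokens nested at different depths.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0"><p>
<w xml:id="t1">I</w> <w xml:id="t2">saw</w>
<hi><w xml:id="t3">the</w> <w xml:id="t4">king</w></hi>
<pc xml:id="t5">,</pc> <w xml:id="t6">and</w>
<q><w xml:id="t7">he</w> <w xml:id="t8">spoke</w></q>
<pc xml:id="t9">.</pc>
</p></text>"""


def tokens_in_document_order(root):
    """Return all <w>/<pc> elements as a flat list, ignoring nesting."""
    return [el for el in root.iter() if el.tag in TOKEN_TAGS]


def concordance(tokens, matches, width=7):
    """Yield (left, hit, right) for every token where matches(token) is true."""
    for i, tok in enumerate(tokens):
        if matches(tok):
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            yield left, tok, right


def deepest_common_ancestor(root, e1, e2):
    """Compare the root-to-element paths of e1 and e2."""
    # Parent map; with lxml, el.getparent() would replace this lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def path(el):
        chain = [el]
        while chain[-1] in parent_of:
            chain.append(parent_of[chain[-1]])
        return chain[::-1]  # root first, element last

    common = None
    for a, b in zip(path(e1), path(e2)):
        if a is not b:
            break
        common = a
    return common


root = ET.fromstring(SAMPLE)
tokens = tokens_in_document_order(root)
for left, hit, right in concordance(tokens, lambda t: t.text == "king", width=2):
    print(" ".join(t.text for t in left), f"[{hit.text}]",
          " ".join(t.text for t in right))
```

Building the flat list once per text costs memory proportional to the
number of tokens, but it avoids the sibling bookkeeping entirely. The
common-ancestor helper is only needed if the context should stop at a
structural boundary such as a paragraph or a line.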