Thanks, Cerstin and Michael, for your suggestions. Yes, all the use cases sound perfectly reasonable to me.

When it comes to the implementation, I see quite a number of obstacles to making this happen. One of the reasons is that the full-text expression, as currently implemented, discards all elements before tokenizing the texts. This means that the following queries are basically the same:

  <a>X <b>Y</b> Z</a>[. contains text 'X Y']
  <a>X <b>Y</b> Z</a>[data() contains text 'X Y']

As Cerstin indicated, you'll probably have to parse all text nodes individually:

  ft:mark(//*[text() contains text {'X', 'Y'}])

This simple approach, however, won't work out with phrases (multiple terms) that reach into descendant nodes.

Christian

>>> While I concede this may be useful in numerous use cases (and may even
>>> seem obvious), it would take quite some time to get implemented, so...
>>> please don't expect too much magic for the moment. There will also be
>>> some conceptual issues that need to be resolved. As an example, which
>>> result would you expect for the following query?
>>>
>>> ft:mark(<a>X <b>Y</b> Z</a>[. contains text 'X Y'])
>>
>> I think it should be
>>
>> <a><mark>X</mark> <b><mark>Y</mark></b> Z</a>
>>
>> Each token from the search string would be enclosed in a <mark>-element.
>
> Exactly. While this probably wouldn't cover *all* possible scenarios,
> it would still cover most of the useful ones. In fact, it would be
> similar to <http://www.raymondhill.net/blog/?p=272>. It would also be
> applicable when ignoring elements in a search.
>
> For complex applications it may help to get the start and end character
> positions of the matches (essentially standoff markup), and the
> application could then do the highlighting itself on the basis of this
> information.
>
> [...]
>
>>> If you don't need the inner elements, you may as well remove them from
>>> your document before applying ft:mark().
>>
>> This is a great idea if you would like to know whether the search
>> elements are somewhere in your text.
>>
>> However, if you would like to show the results to end users (=
>> humanities people) or to annotate the document further, it's not a
>> good idea to destroy the original structure.
>> Or maybe one would have
>> to come up with some tricky workaround to first replace the
>> hierarchical node with a flat one for searching, then annotate
>> something and somehow replace the original hierarchical one with the
>> annotated one preserving the original hierarchy.
>>
>> And for searching only, the scenario is a TEI-document representing an
>> old printed book with highlighting (e.g., some things in italics),
>> foreign-language words printed in a different font, person names
>> already marked, etc. The TEI rendering is intended to mimic the
>> original printed page. When implementing a full-text search, the end
>> user expects to see the highlighted search tokens within the rendered
>> page. Therefore the "easiest" way is to search in descendant nodes and
>> use ft:mark to highlight the hits, without any need to change the TEI
>> rendering. This would also allow the end user to not only see the node
>> where the search string was found, but scroll up and down to inspect
>> the context of the node.
>
> I fully agree, this is exactly what I need in my application: I don't
> want to retrieve snippets from the document, but I always have to
> display the full document with the hits highlighted.
>
> What I'm going to do now is probably highlight the full paragraph which
> contains the node retrieved by the search, i.e., get the node ID, walk
> up the tree until I encounter a <p> and get its @xml:id, which I can
> then use in a CSS stylesheet. Or something like this. But this is
> clearly only an approximation.
>
> Best regards
>
> --
> Dr.-Ing. Michael Piotrowski, M.A.
> <m...@cl.uzh.ch>
> Institute of Computational Linguistics, University of Zurich
> Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044
> * OUT NOW: Systems and Frameworks for Computational Morphology
> * <http://www.springeronline.com/978-3-642-23137-7>
> _______________________________________________
> BaseX-Talk mailing list
> BaseX-Talk@mailman.uni-konstanz.de
> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
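
To make the point about discarded elements concrete, here is a minimal sketch that can be pasted into BaseX. It assumes only standard XQuery 3.0 plus the XQuery Full Text "contains text" expression already used in the queries at the top of this message; the variable name $a is purely illustrative, and the sketch says nothing about how ft:mark() itself is implemented.

```xquery
(: Sketch: why both predicates see the same text, and why matching
   text node by text node misses phrases that span element boundaries. :)
let $a := <a>X <b>Y</b> Z</a>
return (
  (: string value of the element: the <b> markup is gone, "X Y Z" remains :)
  string($a),
  (: atomized value, as returned by data(): the same characters :)
  string(data($a)),
  (: the individual text nodes: "X ", "Y", " Z" :)
  $a//text() ! string(.),
  (: phrase search on the whole element: true :)
  $a contains text 'X Y',
  (: phrase search on each text node separately: false,
     because no single text node holds both terms :)
  some $t in $a//text() satisfies $t contains text 'X Y'
)
```

The last two items are the crux: the element as a whole matches the phrase, but none of its text nodes does, which is why the text()-based ft:mark() call cannot highlight phrases that reach into descendant nodes.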