Hi all, I'm new to Lucene and have a question about indexing and highlighting HTML files.
What I need to do is highlight the hits (terms) in the original HTML file, or at least get the positions of the terms/tokens in the original file. Fred Toth already described this problem in a 2005 thread ("Preserving original HTML file offsets for highlighting, need HTMLTokenizer?"): http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/%3c6.2.1.2.2.20050530134630.063ae...@fast.synernet.com%3E

I've searched the mailing list archives hoping for an answer, but had no luck. Does anyone know whether there is a solution to this problem? If you know that it isn't possible with Lucene to highlight the hits in the original HTML file, that would also be helpful to know (I could then stop looking for a solution).

Many thanks in advance!

Karo

P.S. Actually, I wanted to reply to the original thread/question from 2005. Is there a way to do this? How can I post a reply to an old thread/mail in the mailing list archive?