I need to take an HTML page that I retrieve from my Lucene search and highlight every search term that appears in it. The highlighting has to skip over HTML tags, since I don't want a word inside a tag (a tag name or attribute value, for example) to be highlighted just because it happens to match the search.
Note that I don't want fragments or excerpts of the document. I need to highlight all terms in the document (with a <span> or something similar) and get back the entire document, including the new <span>s, so that it can be displayed in full with the search terms highlighted. The last time I did this (in the days of 1.4.2, so a while ago), I had to write a custom tokenizer that skipped over the HTML tags so that I didn't accidentally highlight them. I'm hoping there is an easier way to do this now. Suggestions?
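To make the requirement concrete, here is a rough sketch of the behaviour I'm after, written in plain Java with regular expressions rather than any Lucene API. The class name HtmlTermHighlighter, the method highlight, and the "hit" CSS class are made up for illustration. It assumes the search terms are already available as literal strings, so it does not reproduce analyzer behaviour such as stemming, and it treats anything between < and > as a tag, which means script/style bodies, comments, and attribute values containing > are not handled properly.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: wraps literal search terms in <span> tags
// while copying anything that looks like an HTML tag through unchanged.
final class HtmlTermHighlighter {

    // Group 1: a complete tag such as <p class="x">.
    // Group 2: a run of text between tags, or a stray '<' that never closes.
    private static final Pattern TAG_OR_TEXT = Pattern.compile("(<[^>]*>)|([^<]+|<)");

    static String highlight(String html, List<String> terms) {
        if (terms.isEmpty()) {
            return html;
        }
        // Quote each term so it is matched literally, then join into one alternation.
        String alternation = terms.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern termPattern = Pattern.compile(
                "\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        StringBuilder out = new StringBuilder(html.length() + 64);
        Matcher m = TAG_OR_TEXT.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                // Tag: copy verbatim so tag names and attributes are never highlighted.
                out.append(m.group(1));
            } else {
                // Text: wrap every whole-word match, keeping the original casing ($0).
                out.append(termPattern.matcher(m.group(2))
                        .replaceAll("<span class=\"hit\">$0</span>"));
            }
        }
        return out.toString();
    }
}
```

For example, HtmlTermHighlighter.highlight("<p class=\"lucene\">Lucene search</p>", List.of("lucene")) leaves the class attribute alone and wraps only the visible word "Lucene". What I'd like to know is whether current Lucene can do the equivalent with its own analysis chain, so that stemmed or otherwise analyzed terms are matched too.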