Hi,

I have a strange requirement. I am indexing a single HTML document and searching it immediately for one or more keywords (Boolean/Phrase query). When the keywords are found in the document, I would like to know whether the matched keywords come from hyperlink text, a paragraph, or one of the <h1>, <h2>, etc. tags.
a) I cannot put the text into multiple fields, as I need to run "Phrase" queries.

b) During tokenization, I know exactly whether a particular token comes from a specific tag. Can this be stored in the index as some kind of user-defined flag and retrieved later? Looking at the API, it doesn't seem to be possible. I see that I can associate a token type (such as "word" or "eol") with each token the analyzer produces, but the type is not stored in the index.

c) One option seems to be to re-tokenize the document after the search, as some of the highlight summary examples do. Then I can match the document tokens against the query terms.

Thanks,
Ramesh
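Option c) could look roughly like the following. This is a minimal, stand-alone Java sketch that uses no search-library API at all: it re-tokenizes the HTML with a simple regex, records the innermost open tag for each word (the per-token information from b)), and then scans for consecutive query terms. The names `TaggedToken`, `tokenize` and `phraseTags` are hypothetical, and the regex is an assumption standing in for a real HTML parser (it ignores comments, scripts and entities).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagAwarePhraseMatcher {

    // Hypothetical token type: the term, its position, and the innermost enclosing tag.
    record TaggedToken(String term, int position, String tag) {}

    // Either an HTML tag (group 1 = "/" for closing tags, group 2 = tag name) or a word.
    private static final Pattern TAG_OR_WORD =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|\\w+");

    // Elements that never get a closing tag, so they must not be pushed on the stack.
    private static final Set<String> VOID_TAGS =
            Set.of("br", "hr", "img", "input", "meta", "link");

    // Re-tokenizes the document, recording which tag each word came from.
    static List<TaggedToken> tokenize(String html) {
        List<TaggedToken> tokens = new ArrayList<>();
        Deque<String> openTags = new ArrayDeque<>();
        Matcher m = TAG_OR_WORD.matcher(html);
        int position = 0;
        while (m.find()) {
            String tagName = m.group(2);
            if (tagName != null) {
                String name = tagName.toLowerCase(Locale.ROOT);
                boolean closing = m.group(1).equals("/");
                if (!closing) {
                    if (!VOID_TAGS.contains(name) && !m.group().endsWith("/>")) {
                        openTags.push(name);
                    }
                } else if (openTags.contains(name)) {
                    // Pop up to and including the matching open tag; this also drops
                    // unclosed inner tags. Stray closing tags are simply ignored.
                    while (!openTags.pop().equals(name)) { }
                }
            } else {
                String tag = openTags.isEmpty() ? "(none)" : openTags.peek();
                tokens.add(new TaggedToken(m.group().toLowerCase(Locale.ROOT), position++, tag));
            }
        }
        return tokens;
    }

    // For every occurrence of the phrase, returns the tag of its first token.
    // The phrase is split on whitespace and lower-cased like the document words.
    static List<String> phraseTags(List<TaggedToken> tokens, String phrase) {
        String[] terms = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> tags = new ArrayList<>();
        for (int i = 0; i + terms.length <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < terms.length && match; j++) {
                match = tokens.get(i + j).term().equals(terms[j]);
            }
            if (match) {
                tags.add(tokens.get(i).tag());
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Search Tips</h1>"
                + "<p>Read <a href=\"tips.html\">search tips</a> daily.<br/>"
                + "Search tips help.</p></body></html>";
        List<TaggedToken> tokens = tokenize(html);
        System.out.println(phraseTags(tokens, "search tips")); // [h1, a, p]
        System.out.println(phraseTags(tokens, "tips help"));   // [p]
    }
}
```

The reported tag is the innermost open element, so link text inside a paragraph comes back as `a`. A phrase that crosses a tag boundary (e.g. "tips daily" above) still matches and is reported under the tag of its first token. For the positions to line up with what the search engine matched, the re-tokenization presumably has to follow the same tokenizing and lower-casing rules as the analyzer used at index time.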
