Hi,

I have a strange requirement. I am indexing a single HTML document and
immediately searching it for one or more keywords (Boolean or phrase query).
When the keywords are found in the document, I would like to
know whether each matched keyword comes from hyperlink text, a paragraph, or
one of the <h1>, <h2>, etc. tags.

a) I cannot use multiple fields, because I need to run phrase queries.

b) During tokenization, I know exactly whether a particular token comes from
a specific tag. Can this be stored in the index as a user-defined flag (or
something similar) and retrieved later? Looking at the API, it doesn't seem
to be possible. I can set a token type (such as "word" or "eol") on the
analyzer's tokens, but the type is not stored in the index.

c) One option seems to be to re-tokenize the document after the search, as
some of the highlighter summary examples do. Then I can match the document's
tokens against the query terms.
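A rough sketch of idea (b), assuming the analyzer can report each token's enclosing tag as it emits it: keep one small flag per token position and look it up once the search reports the positions a phrase matched. The `TagFlags` class and its tag codes are hypothetical and use no Lucene API; later Lucene releases added per-position payloads (`PayloadAttribute`), which can carry a byte like this inside the index itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical side table (not a Lucene API): one tag flag per token
// position, recorded in position order while the analyzer emits tokens.
class TagFlags {
    // Hypothetical tag codes; one byte is enough for a handful of tags.
    static final byte TEXT = 0, ANCHOR = 1, H1 = 2, H2 = 3;

    private final List<Byte> flags = new ArrayList<>();

    // Called once per emitted token; the list index is the token position.
    void record(byte tagCode) {
        flags.add(tagCode);
    }

    byte tagAt(int position) {
        return flags.get(position);
    }

    // Map a tag name to its code; unknown tags count as plain text.
    static byte codeFor(String tag) {
        switch (tag) {
            case "a":  return ANCHOR;
            case "h1": return H1;
            case "h2": return H2;
            default:   return TEXT;
        }
    }

    public static void main(String[] args) {
        // Simulated token stream: {term, enclosing tag} in position order.
        String[][] tokens = {
            {"lucene", "h1"}, {"in", "h1"}, {"action", "h1"},
            {"read", "p"}, {"the", "p"}, {"manual", "a"}
        };
        TagFlags table = new TagFlags();
        for (String[] t : tokens) {
            table.record(codeFor(t[1]));
        }
        // Suppose the search reported a phrase match covering positions 0..2.
        for (int pos = 0; pos < 3; pos++) {
            System.out.println(pos + " -> " + table.tagAt(pos));
        }
    }
}
```

The design choice here is simply that positions are the join key: whatever mechanism reports match positions can be paired with a per-position flag, whether that flag lives in a side table like this or in an index payload.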
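Idea (c) might look roughly like this, assuming the matched phrase is already known as a list of lowercase terms: re-tokenize the HTML while tracking the innermost open tag, then scan for runs of consecutive tokens equal to the phrase and report the tag of the first token in each run. The regex tokenizer is a deliberately naive stand-in for a real HTML parser and analyzer (it ignores entities, comments, and script contents), and `TagAwareMatcher`, `tokenize`, and `phraseTags` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAwareMatcher {
    // A token plus the innermost tag that was open when it was read.
    static final class Token {
        final String term;
        final String tag;
        Token(String term, String tag) { this.term = term; this.tag = tag; }
    }

    // Group 1: optional "/", group 2: tag name; group 3: a run of word characters.
    private static final Pattern PIECE =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|(\\w+)");
    // Elements that never get a closing tag, so they must not be pushed.
    private static final Set<String> VOID =
        Set.of("br", "img", "hr", "meta", "input", "link");

    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();
        Matcher m = PIECE.matcher(html);
        while (m.find()) {
            if (m.group(2) != null) {
                String name = m.group(2).toLowerCase(Locale.ROOT);
                if (VOID.contains(name)) continue;
                if (m.group(1).isEmpty()) {
                    open.push(name);
                } else if (open.contains(name)) {
                    // Pop back to (and including) the matching open tag.
                    while (!open.pop().equals(name)) { }
                }
            } else {
                String tag = open.isEmpty() ? "" : open.peek();
                tokens.add(new Token(m.group(3).toLowerCase(Locale.ROOT), tag));
            }
        }
        return tokens;
    }

    // Tag of the first token of every place where the phrase occurs as consecutive tokens.
    static List<String> phraseTags(List<Token> tokens, List<String> phrase) {
        List<String> tags = new ArrayList<>();
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean match = true;
            for (int i = 0; i < phrase.size() && match; i++) {
                match = tokens.get(start + i).term.equals(phrase.get(i));
            }
            if (match) {
                tags.add(tokens.get(start).tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Lucene in Action</h1>"
            + "<p>Buy <a href=\"/book\">Lucene in Action</a> today.<br>Lucene rocks.</p>"
            + "</body></html>";
        List<Token> tokens = tokenize(html);
        System.out.println(phraseTags(tokens, List.of("lucene", "in", "action"))); // prints [h1, a]
        System.out.println(phraseTags(tokens, List.of("lucene", "rocks")));        // prints [p]
    }
}
```

Since the document is tiny and re-tokenized only after a hit, the linear scan costs little; the real work is making `tokenize` produce exactly the same terms as the indexing analyzer, otherwise the phrase will not line up.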


thanks
Ramesh