Problem: I have indexed the file path and the content of thousands of
documents, and I can successfully query the index on the text to return a
collection of file paths. Now I need to build a collection of the tokens in
the index that matched the query.

I can see that there are solutions to a related problem: highlighting the
matching terms when displaying relevant fragments of the document contents.
But I don't want to do that; I just want a list of the tokens. The tokens are
in the index, and they are matched by the query. It seems like a lot of extra
work to take each selected document, re-tokenize it, re-execute the query and
mark up the matching tokens, when the tokens that match the query are surely
accessible somewhere. (Besides, I can't use Lucene's highlighting to display
the document with highlights, because the index is built not from the
displayed document but from a pre-processed extract of it, and I don't want
to display only fragments of it.)

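To make concrete what I am after, here is a toy sketch in plain Java. It does
not use the Lucene API at all: the inverted index is just a map from term to
file paths, the query is treated as a simple OR over its terms, and the names
MatchedTokens, buildIndex and matchedTerms are made up for illustration. It
only shows the shape of the result I want, namely the matching tokens for
each file path that the query returns.

```java
import java.util.*;

public class MatchedTokens {

    // Toy inverted index (an assumption, not Lucene): term -> paths of documents containing it.
    static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            // Naive tokenizer: lower-case, split on runs of non-word characters.
            for (String token : e.getValue().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        return index;
    }

    // For each document hit by any query term, collect the query terms found in it.
    static Map<String, Set<String>> matchedTerms(Map<String, Set<String>> index,
                                                 Collection<String> queryTerms) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String term : queryTerms) {
            String normalized = term.toLowerCase();
            Set<String> paths = index.getOrDefault(normalized, Collections.emptySet());
            for (String path : paths) {
                result.computeIfAbsent(path, k -> new TreeSet<>()).add(normalized);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("/docs/a.txt", "The quick brown fox");
        docs.put("/docs/b.txt", "A lazy brown dog");

        Map<String, Set<String>> index = buildIndex(docs);
        // Prints {/docs/a.txt=[brown, fox], /docs/b.txt=[brown]}
        System.out.println(matchedTerms(index, Arrays.asList("brown", "fox", "cat")));
    }
}
```

In the toy version the answer falls straight out of the index, with no need to
re-read or re-tokenize the documents, which is why I assume the real index can
give me the same thing.
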
I thought the Explanation class might be what I need, but when I display the
contents of the explanation for each matching document, I see only something
like this:

  score=5.9498425
  0.0 = No matching clauses

which is no help at all.

Is this a wild goose chase, or is it achievable somehow?

cheers
T