Problem: I have indexed the filepath and the content of thousands of documents and can successfully query the index on the text to return a collection of filepaths. Now I need to create a collection of the tokens in the index which matched the query.
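To make the goal concrete, here is a toy sketch in plain Java of the output I am after. It does not use Lucene at all: the class `MatchedTokensSketch`, the methods `addDocument` and `matchedTokens`, the naive tokenizer, and the sample file paths are all hypothetical stand-ins for illustration, assuming a simple "match any term" query.

```java
import java.util.*;

public class MatchedTokensSketch {

    // Add one document to a toy inverted index (token -> file paths containing it).
    // Tokenization here is a naive lowercase split on non-word characters.
    static void addDocument(Map<String, Set<String>> index, String path, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(path);
            }
        }
    }

    // For every document that matches at least one query term,
    // collect the indexed tokens that were responsible for the match.
    static Map<String, Set<String>> matchedTokens(Map<String, Set<String>> index,
                                                  Collection<String> queryTerms) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String term : queryTerms) {
            String token = term.toLowerCase();
            for (String path : index.getOrDefault(token, Collections.emptySet())) {
                result.computeIfAbsent(path, k -> new TreeSet<>()).add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        addDocument(index, "docs/a.txt", "The quick brown fox");
        addDocument(index, "docs/b.txt", "A lazy brown dog");

        // prints {docs/a.txt=[brown, fox], docs/b.txt=[brown]}
        System.out.println(matchedTokens(index, Arrays.asList("brown", "fox", "cat")));
    }
}
```

In other words, for each file path the query returns, I want the set of index tokens that caused the match, without going back to the document text.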
I can see that there are solutions to a related problem: highlighting the matching terms when displaying relevant fragments of the document contents. But I don't want to do that; I just want a list of the tokens. The tokens are in the index, and the tokens are matched by the query. It seems like a lot of extra work to take the selected document, retokenize it, re-execute the query, and replace the matching tokens, when surely the tokens which matched the query are accessible somewhere.

(Besides, I can't use Lucene's highlighting to display the document with highlights, because the index is not built from the displayed document but from a pre-processed extract of it, and I don't want to display just fragments of it.)

I thought the Explanation class might be what I need, but when I display the content of the explanation for each matching document I see only something like this:

    score=5.9498425 0.0 = No matching clauses

which is no help at all. Is this a wild goose chase, or is it achievable somehow?

Cheers,
T