On Fri, Oct 10, 2014 at 6:39 AM, Michael McCandless < [email protected]> wrote:
> +1 for a "completely accurate" (each snippet shown matches the query)
> and fast highlighter, but it's a real challenge because you need a
> clean way to recursively iterate all positions for any (even
> non-positional) queries (what LUCENE-2878 will give us). To properly
> handle your (+A +B) (+C +D) example, you'd need BooleanQuery to
> participate in enumerating the positions...

My plan for that is to convert TermQueries to something similar that gets a DocsAndPositionsEnum (with offsets) instead of a plain DocsEnum. The code that navigates the graph can cast it to get what it needs.

Alternatively, I thought perhaps I might wrap everything from the IndexReader on down with pass-throughs that ensure you always get positions (with offsets) even when you don't ask for them, and then keep track of each instance for retrieval later. But I'd somehow need to map the Query to the tracked positions enumerators, and this sounds like more work, so I probably won't go this route.

I plan to convert the Query tree to an equivalent (for highlighter purposes) composed of BooleanQuery, TermQuery (a similar custom one, actually), MultiTermQueries (again, some custom variant), and SpanQueries; phrase queries get converted to SpanQueries.

~ David
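To make the tree conversion above a bit more concrete, here is a rough sketch of the shape I have in mind. It deliberately does not use Lucene's classes: `Term`, `Phrase`, `Bool`, `OffsetTerm`, and `SpanNear` are hypothetical stand-ins invented for illustration, and the real version would need to handle MultiTermQueries and the remaining query types as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: every type here is a hypothetical stand-in, NOT a Lucene class.
public class HighlightQueryRewrite {
    interface Node {}

    // Input-side nodes, loosely modelled on TermQuery, PhraseQuery, BooleanQuery.
    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Phrase implements Node {
        final List<String> terms;
        Phrase(String... terms) { this.terms = Arrays.asList(terms); }
    }

    static final class Bool implements Node {
        final List<Node> clauses;
        Bool(Node... clauses) { this.clauses = Arrays.asList(clauses); }
        Bool(List<Node> clauses) { this.clauses = clauses; }
        @Override public String toString() { return "bool" + clauses; }
    }

    // Highlighter-side term: assumed to always request positions and offsets.
    static final class OffsetTerm implements Node {
        final String text;
        OffsetTerm(String text) { this.text = text; }
        @Override public String toString() { return "offsetTerm(" + text + ")"; }
    }

    // Highlighter-side span node that phrases are converted to.
    static final class SpanNear implements Node {
        final List<Node> clauses;
        final int slop;
        final boolean inOrder;
        SpanNear(List<Node> clauses, int slop, boolean inOrder) {
            this.clauses = clauses;
            this.slop = slop;
            this.inOrder = inOrder;
        }
        @Override public String toString() {
            return "spanNear(" + clauses + ", slop=" + slop + ", inOrder=" + inOrder + ")";
        }
    }

    // Recursively convert a query tree into its highlighter equivalent.
    static Node rewrite(Node q) {
        if (q instanceof Term) {
            return new OffsetTerm(((Term) q).text);
        }
        if (q instanceof Phrase) {
            List<Node> parts = new ArrayList<>();
            for (String t : ((Phrase) q).terms) {
                parts.add(new OffsetTerm(t));
            }
            // An exact phrase is treated as an ordered span with zero slop.
            return new SpanNear(parts, 0, true);
        }
        if (q instanceof Bool) {
            List<Node> out = new ArrayList<>();
            for (Node clause : ((Bool) q).clauses) {
                out.add(rewrite(clause));
            }
            return new Bool(out);
        }
        throw new IllegalArgumentException("Unsupported node: " + q.getClass().getName());
    }

    public static void main(String[] args) {
        Node query = new Bool(new Term("a"), new Phrase("b", "c"));
        System.out.println(rewrite(query));
    }
}
```

The point of funnelling everything into a small, closed set of node types is that the code walking the tree can cast each leaf to something it knows carries offsets, rather than having to understand every Query subclass.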
