[ https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055546#comment-13055546 ]
Mike Sokolov commented on LUCENE-1889:
--------------------------------------

Robert: Thanks, that sounds like good advice. I wasn't completely happy with that Pattern list anyway; I'm really still just feeling my way around Lucene and trying random things at this point.

I wonder if you could comment on another possible idea, following up on Mike M's quote above. I tried hacking up SpanScorer to see if I could get positions out of it using a custom Collector, but found that by the time a doc was reported, SpanScorer had already iterated over and dropped the positions. I was thinking of adding a Collector.collectSpans(int start, int end) and having SpanScorer call it (it would be an empty function in Collector proper), or something like that. At this point I'm wondering if it might be possible to rewrite many queries as some kind of SpanQuery (using a visitor), without the need to actually alter all the Query implementations. Is there a better way?

I was also thinking it might be possible to capture and re-use the positions gathered during the initial scoring episode rather than having to re-score during highlighting, but I guess that's a separate issue.

Koji: Thanks for the review, but it sounds like some more iteration is needed here, for sure on RegExpQuery. I probably should have tested that a bit more carefully, although the one thing I tried (character classes) seems to work the same.

> FastVectorHighlighter: support for additional queries
> -----------------------------------------------------
>
>                 Key: LUCENE-1889
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1889
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: modules/highlighter
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-1889.patch
>
>
> I am using fastvectorhighlighter for some strange languages and it is working well!
> One thing I noticed immediately is that many query types are not highlighted
> (multitermquery, multiphrasequery, etc).
> Here is one thing Michael M posted in the original ticket:
> {quote}
> I think a nice [eventual] model would be if we could simply re-run the
> scorer on the single document (using InstantiatedIndex maybe, or
> simply some sort of wrapper on the term vectors which are already a
> mini-inverted-index for a single doc), but extend the scorer API to
> tell us the exact term occurrences that participated in a match (which
> I don't think is exposed today).
> {quote}
> Due to strange requirements I am using something similar to this (but
> specialized to our case).
> I am doing strange things like forcing multitermqueries to rewrite into
> boolean queries so they will be highlighted,
> and flattening multiphrasequeries into boolean or'ed phrasequeries.
> I do not think these things would be 'fast', but I had a few ideas that might
> help:
> * looking at contrib/highlighter, you can support FilteredQuery in flatten()
>   by calling getQuery() right?
> * maybe as a last resort, try Query.extractTerms() ?
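
For readers following the "flattening multiphrasequeries into boolean or'ed phrasequeries" idea in the description, here is a minimal sketch of the expansion step. It deliberately does not use the Lucene API: a multi-phrase is modelled as a list of positions, each holding its alternative terms, and the class and method names (MultiPhraseFlattener, flatten) are hypothetical, not anything from the attached patch. It assumes the flattening amounts to taking every combination of one term per position.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model only (no Lucene types): a multi-phrase is a list of positions,
 * and each position is a list of alternative terms.
 */
final class MultiPhraseFlattener {
    private MultiPhraseFlattener() {}

    /**
     * Expands every combination of alternatives into one plain phrase.
     * A caller would then OR the resulting phrases together.
     */
    static List<List<String>> flatten(List<List<String>> positions) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<>()); // start from a single empty phrase
        for (List<String> alternatives : positions) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    // copy the prefix so phrases do not share state
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.add(term);
                    extended.add(phrase);
                }
            }
            phrases = extended;
        }
        return phrases;
    }
}
```

Calling flatten on the positions [quick, fast] and [fox] gives the two phrases "quick fox" and "fast fox". The number of phrases is the product of the alternatives at each position, which fits the remark above that this approach is unlikely to be 'fast'.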