[ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055546#comment-13055546
 ] 

Mike Sokolov commented on LUCENE-1889:
--------------------------------------

Robert: Thanks, that sounds like good advice. I wasn't completely happy with 
that Pattern list anyway; I'm really still just feeling my way around Lucene 
and trying things somewhat at random at this point.  I wonder if you could 
comment on another possible idea, following up on Mike M's quote above:

I tried hacking up SpanScorer to see if I could get positions out of it using a 
custom Collector, but found that by the time a doc was reported, SpanScorer had 
already iterated over and dropped the positions.  I was thinking of adding a 
Collector.collectSpans(int start, int end) method and having SpanScorer call it 
(it would be a no-op in the base Collector), or something like that.  At 
this point I'm wondering whether many queries could be rewritten as some kind 
of SpanQuery (using a visitor), without actually having to alter all the Query 
implementations.  Is there a better way?
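
To make the collectSpans idea concrete, here is a rough sketch.  It is only an 
illustration under stated assumptions: SketchCollector, PositionCollector and 
SketchSpanScorer are hypothetical stand-ins written for this example, not 
Lucene's actual Collector or SpanScorer, and the real scorer gets its spans 
from the index rather than from an array.  The point is just the call order: 
the scorer reports each span through a no-op-by-default hook before it reports 
the document itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector; NOT Lucene's Collector class.
abstract class SketchCollector {
    /** Called once for each matching document. */
    abstract void collect(int doc);

    /** Proposed hook: a no-op by default, so existing collectors are unaffected. */
    void collectSpans(int start, int end) {
    }
}

/** Collector that keeps the span positions, e.g. for later highlighting. */
class PositionCollector extends SketchCollector {
    /** Each entry is {doc, start, end}. */
    final List<int[]> hits = new ArrayList<>();

    /** Spans seen since the last collect() call; their doc id is not known yet. */
    private final List<int[]> pending = new ArrayList<>();

    @Override
    void collectSpans(int start, int end) {
        pending.add(new int[] {start, end});
    }

    @Override
    void collect(int doc) {
        // The doc id arrives after its spans, so attach it to them here.
        for (int[] span : pending) {
            hits.add(new int[] {doc, span[0], span[1]});
        }
        pending.clear();
    }
}

// Hypothetical toy scorer; NOT Lucene's SpanScorer.
class SketchSpanScorer {
    /** spansByDoc[d] holds the {start, end} pairs that match in document d. */
    private final int[][][] spansByDoc;

    SketchSpanScorer(int[][][] spansByDoc) {
        this.spansByDoc = spansByDoc;
    }

    void score(SketchCollector collector) {
        for (int doc = 0; doc < spansByDoc.length; doc++) {
            int[][] spans = spansByDoc[doc];
            if (spans.length == 0) {
                continue; // no spans means the document does not match
            }
            // Report each span while iterating, before the positions are dropped.
            for (int[] span : spans) {
                collector.collectSpans(span[0], span[1]);
            }
            collector.collect(doc);
        }
    }
}
```

A caller would pass a PositionCollector to score() and read its hits list 
afterwards; a collector that does not override collectSpans behaves exactly as 
it did before the hook existed.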

I was also thinking it might be possible to capture and re-use positions 
gathered during the initial scoring episode rather than having to re-score 
during highlighting, but I guess that's a separate issue.

Koji: Thanks for the review, but it sounds like some more iteration is needed 
here, certainly for RegExpQuery.  I probably should have tested that a bit more 
carefully, although the one thing I did try (character classes) seems to work 
the same.

> FastVectorHighlighter: support for additional queries
> -----------------------------------------------------
>
>                 Key: LUCENE-1889
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1889
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: modules/highlighter
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-1889.patch
>
>
> I am using fastvectorhighlighter for some strange languages and it is working 
> well! 
> One thing i noticed immediately is that many query types are not highlighted 
> (multitermquery, multiphrasequery, etc)
> Here is one thing Michael M posted in the original ticket:
> {quote}
> I think a nice [eventual] model would be if we could simply re-run the
> scorer on the single document (using InstantiatedIndex maybe, or
> simply some sort of wrapper on the term vectors which are already a
> mini-inverted-index for a single doc), but extend the scorer API to
> tell us the exact term occurrences that participated in a match (which
> I don't think is exposed today).
> {quote}
> Due to strange requirements I am using something similar to this (but 
> specialized to our case).
> I am doing strange things like forcing multitermqueries to rewrite into 
> boolean queries so they will be highlighted,
> and flattening multiphrasequeries into boolean or'ed phrasequeries.
> I do not think these things would be 'fast', but i had a few ideas that might 
> help:
> * looking at contrib/highlighter, you can support FilteredQuery in flatten() 
> by calling getQuery() right?
> * maybe as a last resort, try Query.extractTerms() ?
