[jira] [Commented] (LUCENE-1889) FastVectorHighlighter: support for additional queries

Robert Muir (JIRA) Mon, 27 Jun 2011 03:23:26 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055452#comment-13055452
 ]


Robert Muir commented on LUCENE-1889:
-------------------------------------

{quote}
A possible issue is that regex support will differ from RegexpQuery, but I 
think? that Java's is a superset, so should be ok, but I'm not sure about this 
one.
{quote}

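Actually, these are totally different syntaxes! RegexpQuery parses its pattern 
with Lucene's own o.a.l.util.automaton.RegExp grammar, not java.util.regex, so 
Java's syntax is not a superset of it.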

An alternative way to flatten these MultiTermQueries would be to implement 
o.a.l.index.Terms over the contents of the term vector; then each query could 
be rewritten by its own code.
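
A rough sketch of that idea follows. It is plain Java, not Lucene's actual 
Terms/TermsEnum API, and every class and method name in it is hypothetical. The 
term vector is exposed as a sorted term dictionary for the single document, and 
a prefix "query" rewrites itself by seeking into and scanning that dictionary 
with its own logic, instead of being translated into another pattern language.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: expose one document's term vector as a sorted term
 * dictionary, then let each multi-term "query" rewrite itself against it.
 * These are NOT Lucene classes; they only mirror the Terms/rewrite idea.
 */
public final class TermVectorRewrite {

    /** Stand-in for a Terms view over a term vector: sorted terms -> positions. */
    static final class TermVectorTerms {
        final NavigableMap<String, List<Integer>> terms = new TreeMap<>();

        void add(String term, int position) {
            terms.computeIfAbsent(term, t -> new ArrayList<>()).add(position);
        }
    }

    /** Stand-in for a multi-term query that enumerates the terms it matches. */
    interface MultiTermMatcher {
        List<String> rewrite(TermVectorTerms source);
    }

    /** A prefix query's own logic: seek to the prefix, scan while terms still match. */
    static MultiTermMatcher prefix(String p) {
        return source -> {
            List<String> matched = new ArrayList<>();
            for (String t : source.terms.tailMap(p, true).keySet()) {
                if (!t.startsWith(p)) {
                    break; // terms are sorted, so no later term can match
                }
                matched.add(t);
            }
            return matched;
        };
    }
}
```

For example, after adding "the", "quick", "quiet", "fox" and "quit" at 
positions 0 to 4, prefix("qui") rewrites to quick, quiet and quit, and the 
stored positions are what a highlighter would mark. Any other matcher 
(wildcard, fuzzy, range) would plug into the same interface unchanged.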

Trying to generate an equivalent string pattern could be a little problematic: 
for example, wildcard syntax supports escaped characters (and a wildcard could 
contain other characters that are java.util.regex syntax characters but not 
wildcard syntax characters), the regex syntax is different, etc.
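
To illustrate the escaping problem, here is a hypothetical helper (not part of 
Lucene) that turns a wildcard pattern into a java.util.regex pattern. The naive 
version only rewrites `*` and `?`, so a literal `.` in the wildcard silently 
becomes "any character"; the careful version honours backslash escapes and 
quotes every literal character. Even the careful version covers only wildcard 
syntax and does nothing for RegexpQuery's different grammar.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not Lucene API: wildcard pattern -> java.util.regex pattern. */
public final class WildcardToRegex {

    /** Naive: rewrites only '*' and '?', so regex metacharacters leak through. */
    static String naive(String wildcard) {
        return wildcard.replace("?", ".").replace("*", ".*");
    }

    /** Careful: honours '\' escapes and quotes every literal character. */
    static String careful(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '\\' && i + 1 < wildcard.length()) {
                // escaped wildcard character: match it literally
                sb.append(Pattern.quote(String.valueOf(wildcard.charAt(++i))));
            } else if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                // '.', '+', '(' etc. are literals in wildcard syntax
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }
}
```

With the naive version, the wildcard `1.5*` also matches the term `1x5abc`; the 
careful version rejects it and still matches `1.5abc`.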

If you still decide you want to do it this way, though, I would use 
o.a.l.util.automaton instead of java.util.regex. Besides being faster, it is 
what these queries use internally anyway, so you can convert them with, for 
example, WildcardQuery.toAutomaton(). Then union these automata and match 
against the unioned machine instead of a List.
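
In Lucene itself you would build the automata with WildcardQuery.toAutomaton() 
and combine them with the automaton package's union operation (the exact class 
holding it varies between Lucene versions). The sketch below deliberately avoids 
those APIs: as a self-contained plain-Java illustration, with a hypothetical 
class name, it compiles each wildcard pattern (`*`, `?`, backslash escapes) into 
a tiny NFA, forms the union by starting in every pattern's initial state at 
once, and runs each term through the combined machine in a single pass instead 
of testing it against a List of patterns one by one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical illustration (not Lucene's automaton API): the union of several
 * wildcard patterns as one NFA, so each term is scanned once for all patterns.
 */
public final class WildcardUnion {
    private static final int LITERAL = 0, ANY_ONE = 1, ANY_SEQ = 2;

    // One compiled pattern per entry; each token is {kind, literalChar}.
    private final List<int[][]> patterns = new ArrayList<>();

    public WildcardUnion(String... wildcards) {
        for (String w : wildcards) {
            patterns.add(compile(w));
        }
    }

    // Compile '*', '?' and backslash escapes into tokens.
    private static int[][] compile(String w) {
        List<int[]> tokens = new ArrayList<>();
        for (int i = 0; i < w.length(); i++) {
            char c = w.charAt(i);
            if (c == '\\' && i + 1 < w.length()) {
                tokens.add(new int[] {LITERAL, w.charAt(++i)});
            } else if (c == '*') {
                tokens.add(new int[] {ANY_SEQ, 0});
            } else if (c == '?') {
                tokens.add(new int[] {ANY_ONE, 0});
            } else {
                tokens.add(new int[] {LITERAL, c});
            }
        }
        return tokens.toArray(new int[0][]);
    }

    // An NFA state is (pattern index, token position), packed into one long.
    private static long state(int pattern, int pos) {
        return ((long) pattern << 32) | pos;
    }

    // Epsilon closure: '*' may match nothing, so we may also be past it.
    private Set<Long> closure(Set<Long> states) {
        Set<Long> out = new HashSet<>();
        for (long s : states) {
            int pat = (int) (s >>> 32), pos = (int) s;
            int[][] toks = patterns.get(pat);
            out.add(s);
            while (pos < toks.length && toks[pos][0] == ANY_SEQ) {
                out.add(state(pat, ++pos));
            }
        }
        return out;
    }

    /** True if the term matches at least one of the patterns. */
    public boolean matches(String term) {
        Set<Long> current = new HashSet<>();
        for (int i = 0; i < patterns.size(); i++) {
            current.add(state(i, 0)); // union: start in every pattern at once
        }
        current = closure(current);
        for (int k = 0; k < term.length() && !current.isEmpty(); k++) {
            char ch = term.charAt(k);
            Set<Long> next = new HashSet<>();
            for (long s : current) {
                int pat = (int) (s >>> 32), pos = (int) s;
                int[][] toks = patterns.get(pat);
                if (pos == toks.length) {
                    continue; // pattern fully consumed; extra input kills this path
                }
                int[] t = toks[pos];
                if (t[0] == ANY_SEQ) {
                    next.add(s); // '*' consumes the char and stays put
                } else if (t[0] == ANY_ONE || t[1] == ch) {
                    next.add(state(pat, pos + 1));
                }
            }
            current = closure(next);
        }
        for (long s : current) {
            if ((int) s == patterns.get((int) (s >>> 32)).length) {
                return true; // some pattern reached its accepting position
            }
        }
        return false;
    }
}
```

A real implementation would determinize the unioned automaton once (Lucene's 
automaton package can do this), so that each term is matched with a single 
state per character rather than a set of live NFA states.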

But personally, I would look at going the Terms/rewriteMethod route if 
possible; this way all MultiTermQueries will "just work".


> FastVectorHighlighter: support for additional queries
> -----------------------------------------------------
>
>                 Key: LUCENE-1889
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1889
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: modules/highlighter
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-1889.patch
>
>
> I am using FastVectorHighlighter for some strange languages and it is working
> well!
> One thing I noticed immediately is that many query types are not highlighted
> (MultiTermQuery, MultiPhraseQuery, etc.).
> Here is one thing Michael M posted in the original ticket:
> {quote}
> I think a nice [eventual] model would be if we could simply re-run the
> scorer on the single document (using InstantiatedIndex maybe, or
> simply some sort of wrapper on the term vectors which are already a
> mini-inverted-index for a single doc), but extend the scorer API to
> tell us the exact term occurrences that participated in a match (which
> I don't think is exposed today).
> {quote}
> Due to strange requirements I am using something similar to this (but
> specialized to our case).
> I am doing strange things like forcing MultiTermQueries to rewrite into
> BooleanQueries so they will be highlighted,
> and flattening MultiPhraseQueries into boolean OR'ed PhraseQueries.
> I do not think these things would be 'fast', but I had a few ideas that might
> help:
> * Looking at contrib/highlighter, you can support FilteredQuery in flatten()
> by calling getQuery(), right?
> * Maybe, as a last resort, try Query.extractTerms()?
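
The flattening mentioned in the quoted issue, turning a MultiPhraseQuery into 
boolean OR'ed PhraseQueries, is essentially a Cartesian product over the 
alternative terms at each position. A minimal sketch follows, with a 
hypothetical class name and plain strings standing in for Lucene terms; it 
ignores MultiPhraseQuery's explicit positions and slop. Each resulting term 
list would become one PhraseQuery, added as a SHOULD clause of a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expand per-position term alternatives into every concrete phrase. */
public final class MultiPhraseFlattener {

    static List<List<String>> flatten(List<List<String>> positions) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<>()); // start from a single empty phrase
        for (List<String> alternatives : positions) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    // extend every phrase built so far with each alternative
                    List<String> extended = new ArrayList<>(prefix);
                    extended.add(term);
                    next.add(extended);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

For [[big], [apple, apples]] this yields the phrases "big apple" and "big 
apples". The number of phrases is the product of the alternative counts per 
position, which is why this kind of flattening is not expected to be fast.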

