[jira] Commented: (LUCENE-1999) Match spotter for all query types

Mark Harwood (JIRA) Wed, 21 Oct 2009 07:32:23 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768257#action_12768257
 ]


Mark Harwood commented on LUCENE-1999:
--------------------------------------

bq. and 2) you need it for every single doc visited by the query

Actually I don't need it for every doc, only the top ones; it just happens to
be so cheap to produce that I can afford to run it in-line with the query. (I
haven't actually benchmarked it at scale, but my gut feeling is that it would be fast.)

I was thinking that this might be orthogonal to the existing "free-text" based
highlighter. The reasoning is roughly that:

1) Highlighting of free-text fields is reasonably well catered for, with
summarisation etc.
2) The remaining problem areas for highlighting (NumericRangeQuery, spatial,
cached term filters on enums, e.g. gender:male/female) are all likely to be
non-free-text fields which don't require summarisation and only contain a
single value.

I may be wrong in these assumptions about the existing state of play (any 
thoughts, Mark M?) but it might be useful to think of attacking the problem 
with these 2 different requirements in mind.

Regardless of type (e.g. int, long, etc.), I tend to think of fields as falling into
these broad usage categories:

a) Identifiers (e.g. primary keys)
b) Quantifiers (e.g. numerics, dates, spatial)
c) Free-text
d) Controlled vocabularies (e.g. enums such as gender:m/f)

Type a) is catered for with a straight TermQuery and therefore can be handled
with the existing Highlighter.
Type b) needs special indexes/queries (spatial/trie) and isn't catered for by
the existing term/span-based Highlighter.
Type c) is catered for by the existing Highlighter and its summarising
features.
Type d) involves many TermDocs.next() reads, so it is usefully cached as filters
and is therefore not catered for by the existing Highlighter.

So this patch helps cater for types b) and d), where simply knowing that the
field matched is all that is required to highlight it.
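
To make the b)/d) case concrete, here is a minimal sketch, assuming a
per-document bitmask in which bit i means the i-th field (in a fixed field
order) matched. Because such fields hold a single value, highlighting reduces
to wrapping the whole stored value. FieldMatchHighlighter, highlight() and the
<b> markup are hypothetical names chosen for illustration, not part of the
patch or of Lucene's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: highlight single-valued, non-free-text fields
 * (quantifiers and controlled vocabularies) from a bitmask of matched fields.
 */
public class FieldMatchHighlighter {

    /**
     * Bit i of matchFlags is assumed to mean that fieldOrder.get(i) matched.
     * Matched values are wrapped whole; no summarisation is attempted.
     */
    static Map<String, String> highlight(Map<String, String> storedFields,
                                         List<String> fieldOrder,
                                         int matchFlags) {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < fieldOrder.size(); i++) {
            String field = fieldOrder.get(i);
            String value = storedFields.get(field);
            if (value == null) {
                continue; // field not stored for this document
            }
            boolean matched = (matchFlags & (1 << i)) != 0;
            out.put(field, matched ? "<b>" + value + "</b>" : value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("price", "42");
        doc.put("gender", "female");
        doc.put("location", "51.5,-0.12");
        List<String> order = List.of("price", "gender", "location");
        // Suppose a numeric range clause on price and a cached filter on gender matched.
        int flags = 0b011;
        System.out.println(highlight(doc, order, flags));
        // prints {price=<b>42</b>, gender=<b>female</b>, location=51.5,-0.12}
    }
}
```

The point of the sketch is that no term positions or offsets are needed:
the flag alone decides whether the whole value is marked up.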


> Match spotter for all query types
> ---------------------------------
>
>                 Key: LUCENE-1999
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1999
>             Project: Lucene - Java
>          Issue Type: New Feature
>    Affects Versions: 2.9
>            Reporter: Mark Harwood
>         Attachments: matchflagger.patch
>
>
> Related to LUCENE-1929 and the current inability to highlight 
> NumericRangeQuery, spatial, cached term filters and other exotica.
> This patch provides the ability to wrap *any* Query object and record match 
> info as flags encoded in the overall document score.
> Using this approach it would be possible to understand (and therefore 
> highlight) which fields matched clauses in a query.
> The match encoding approach loses some precision in scores as noted here: 
> http://tinyurl.com/ykt8nx7
> Avoiding these precision issues would require a change to Lucene core to 
> record docId, score AND a matchFlag byte in ScoreDoc objects and collector 
> APIs.
> This may be something we should consider.
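
The score-encoding idea in the description can be sketched as follows. This
is a minimal illustration, not the encoding used in matchflagger.patch: it
assumes the flags are packed into the lowest 8 mantissa bits of the float
score, and ScoreFlagEncoding, FLAG_BITS, encode() and decodeFlags() are
hypothetical names. It relies only on the standard
Float.floatToRawIntBits/Float.intBitsToFloat conversions.

```java
/**
 * Hypothetical sketch: store per-clause match flags in the low-order
 * mantissa bits of a float score, trading score precision for flag bits.
 */
public class ScoreFlagEncoding {

    // Assumption for illustration: reserve the lowest 8 of the 23 mantissa bits.
    static final int FLAG_BITS = 8;
    static final int FLAG_MASK = (1 << FLAG_BITS) - 1;

    /** Overwrites the low FLAG_BITS of the score's mantissa with the flags. */
    static float encode(float score, int flags) {
        int bits = Float.floatToRawIntBits(score);
        bits = (bits & ~FLAG_MASK) | (flags & FLAG_MASK);
        return Float.intBitsToFloat(bits);
    }

    /** Recovers the flags from an encoded score. */
    static int decodeFlags(float encodedScore) {
        return Float.floatToRawIntBits(encodedScore) & FLAG_MASK;
    }

    public static void main(String[] args) {
        float score = 0.7312f;
        int flags = 0b101; // e.g. clauses 0 and 2 matched
        float encoded = encode(score, flags);
        System.out.println("flags = " + Integer.toBinaryString(decodeFlags(encoded)));
        System.out.println("score error = " + Math.abs(encoded - score));
    }
}
```

Overwriting 8 mantissa bits can shift the score by up to 255 ulps, and two
documents with otherwise equal scores may now be ordered by their flags. That
is the kind of precision loss the description mentions, and why carrying a
separate matchFlag byte alongside docId and score would avoid it.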
