Re: best way to intersect two queries?

mark harwood Wed, 12 May 2010 01:55:45 -0700


>> Two terminology questions:

>> - Is the "multiplier" in the mail mentioned there the same as a boost?

This factor controls how many decimal places of precision are retained in the 
adjusted scores. Pick too low a multiplier and scores that differ only by a 
very small value will appear equal. Pick too high a multiplier and you start to 
lose the most significant parts of the score. This trade-off is summarised here 
for various settings of "multiplier":

multiplier   max score   fraction precision
==========   =========   ==================
10           838860      0.x
100          83886       0.xx
1000         8388        0.xxx
10000        838         0.xxxx

The default of 1000 seems a safe choice for the typical scores generated by 
Lucene.
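The table's figures are consistent with the scaled score being kept as an integer in 23 bits (the 31 usable bits of a positive Java int minus 8 flag bits), so max score = 2^23 / multiplier, truncated. That bit layout is an assumption inferred from the numbers, not taken from the LUCENE-1999 patch; the hypothetical class below only reproduces the arithmetic.

```java
// Hypothetical sketch: reproduces the multiplier trade-off table, ASSUMING the
// scaled score occupies 23 bits (31 usable bits of a positive int minus 8 flag bits).
public class MultiplierTradeoff {
    static final int FLAG_BITS = 8;                   // assumed: one byte of match flags
    static final int SCORE_BITS = 31 - FLAG_BITS;     // 23 bits left for the scaled score
    static final long MAX_SCALED = 1L << SCORE_BITS;  // 8388608

    /** Largest raw score (integer part) that still fits once multiplied. */
    static long maxScore(int multiplier) {
        return MAX_SCALED / multiplier;
    }

    /** Decimal places retained; log10 is exact for powers of ten. */
    static int decimalPlaces(int multiplier) {
        return (int) Math.round(Math.log10(multiplier));
    }

    public static void main(String[] args) {
        System.out.printf("%-12s%-12s%s%n", "multiplier", "max score", "fraction precision");
        for (int m : new int[] {10, 100, 1000, 10000}) {
            String precision = "0." + "x".repeat(decimalPlaces(m)); // Java 11+
            System.out.printf("%-12d%-12d%s%n", m, maxScore(m), precision);
        }
    }
}
```

Running main prints the same four rows as the table above.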

>> - I intended to use prefix and fuzzy queries. Doesn't that conflict with this 
>> approach?

You can wrap any query with this class; the only limitation is that it hides all 
match info in a single byte encoded into the score, which allows only 8 bits, 
i.e. 8 match flags, so it reports on at most 8 clauses. You could try using more 
than 8 bits encoded into the score, but then you lose more score precision again 
(see above).
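To make the bit budget concrete, here is a hypothetical encoder (not the LUCENE-1999 code, whose actual layout may differ) that packs a scaled score and up to 8 clause-match flags into one int, with bit i set when clause i matched. It also shows why going beyond 8 flag bits hurts: each extra flag bit halves the largest score that fits.

```java
// Hypothetical sketch, NOT the LUCENE-1999 implementation: packs a scaled score
// and one match flag per clause into a single positive int.
public class MatchFlagScore {
    static final int FLAG_BITS = 8; // one byte of flags => at most 8 clauses

    /** Largest raw score that fits for a given multiplier and number of flag bits. */
    static long maxScore(int multiplier, int flagBits) {
        return (1L << (31 - flagBits)) / multiplier;
    }

    static int encode(float score, int multiplier, boolean... clauseMatched) {
        if (clauseMatched.length > FLAG_BITS) {
            throw new IllegalArgumentException("at most " + FLAG_BITS + " clauses");
        }
        long scaled = (long) (score * multiplier); // drops digits beyond log10(multiplier) places
        if (scaled < 0 || scaled >= (1L << (31 - FLAG_BITS))) {
            throw new IllegalArgumentException("score too large for this multiplier");
        }
        int flags = 0;
        for (int i = 0; i < clauseMatched.length; i++) {
            if (clauseMatched[i]) flags |= 1 << i; // bit i set => clause i matched
        }
        return ((int) scaled << FLAG_BITS) | flags;
    }

    static float decodeScore(int packed, int multiplier) {
        return (float) (packed >>> FLAG_BITS) / multiplier;
    }

    static boolean clauseMatched(int packed, int clause) {
        return (packed & (1 << clause)) != 0;
    }

    public static void main(String[] args) {
        int packed = encode(1.2345f, 1000, true, false, true);
        System.out.println(decodeScore(packed, 1000)); // 1.234 (4th decimal lost)
        System.out.println(clauseMatched(packed, 0) + " " + clauseMatched(packed, 1)); // true false
        System.out.println(maxScore(1000, 8) + " vs " + maxScore(1000, 16)); // 8388 vs 32
    }
}
```

With 16 flag bits and the default multiplier, any score of 32 or more no longer fits, so you would have to lower the multiplier and give up decimal places instead.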

Some thoughts on a less bit-twiddly, more robust approach:
Having recently played with the new Attribute API in the 2.9/3.0 Analyzers, I am 
intrigued by the idea of using a similar approach to capture low-level match 
metadata, i.e. clients decide what types of MatchAttributes are of interest, and 
Query objects record match metadata in singleton MatchAttribute objects as they 
stream their way through result sets.
Result-set streaming and tokenisation streams are similar problems, and the 
Attribute design seems like it could apply here.
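Since MatchAttribute is only a proposal, the following is a toy, library-free sketch of how it might look. Every type here (Attributes, ClauseMatchAttribute, Clause) is hypothetical and only loosely modelled on the addAttribute pattern of the analysis AttributeSource. The client registers the attribute it cares about, and each clause records into that single shared instance as documents stream past in doc-id order. A BitSet replaces the score byte, so there is no 8-clause limit.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed MatchAttribute idea; none of these types exist in Lucene.
public class MatchAttributeSketch {

    /** Per-search registry: the client adds only the attributes it wants. */
    static final class Attributes {
        private final Map<Class<?>, Object> attrs = new HashMap<>();

        <T> T add(Class<T> type, Supplier<T> factory) {
            return type.cast(attrs.computeIfAbsent(type, k -> factory.get()));
        }

        /** Returns null when no client registered interest in this attribute. */
        <T> T get(Class<T> type) {
            return type.cast(attrs.get(type));
        }
    }

    /** Singleton attribute: which clauses matched the current document. */
    static final class ClauseMatchAttribute {
        final BitSet matched = new BitSet(); // no 8-clause limit
    }

    /** Toy clause: a sorted array of matching doc ids, consumed in doc-id order. */
    static final class Clause {
        private final int index;
        private final int[] docs;
        private int pos = 0;

        Clause(int index, int... docs) {
            this.index = index;
            this.docs = docs;
        }

        boolean advanceAndRecord(int doc, ClauseMatchAttribute attr) {
            while (pos < docs.length && docs[pos] < doc) pos++;
            boolean hit = pos < docs.length && docs[pos] == doc;
            if (hit && attr != null) attr.matched.set(index); // record only if someone asked
            return hit;
        }
    }

    /** Streams docs matching any clause; the client reads the attribute at each hit. */
    static List<String> search(int maxDoc, List<Clause> clauses, Attributes attributes) {
        ClauseMatchAttribute attr = attributes.get(ClauseMatchAttribute.class);
        List<String> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (attr != null) attr.matched.clear(); // same instance reused for every doc
            boolean any = false;
            for (Clause c : clauses) any |= c.advanceAndRecord(doc, attr); // non-short-circuit
            if (any) hits.add(attr == null ? String.valueOf(doc) : doc + ":" + attr.matched);
        }
        return hits;
    }

    public static void main(String[] args) {
        Attributes attributes = new Attributes();
        attributes.add(ClauseMatchAttribute.class, ClauseMatchAttribute::new);
        List<Clause> clauses = List.of(new Clause(0, 0, 2, 4), new Clause(1, 2, 3));
        System.out.println(search(6, clauses, attributes)); // [0:{0}, 2:{0, 1}, 3:{1}, 4:{0}]
    }
}
```

Because the attribute is a single reused instance, a client that wants to keep the flags for a hit has to copy them (here via toString) before the stream moves on, just as with token attributes.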

Cheers
Mark

On 11 May 2010, at 12:02, mark harwood wrote:

> See https://issues.apache.org/jira/browse/LUCENE-1999
> 
> 
> 
> ----- Original Message ----
> From: Paul Libbrecht <[email protected]>
> To: [email protected]
> Sent: Tue, 11 May, 2010 10:52:14
> Subject: Re: best way to interest two queries?
> 
> Dear lucene experts,
> 
> Let me try to make this more precise, since there was no answer.
> 
> I have a query that is, roughly,
>  a & b & c
> and I have a good search result.
> Now I want to know:
> 
> a) for the first page, which of the results match a, b, or c;
> b) for the remaining results (the "tail"), whether there are any matches for a, b, or c.
> 
> So far, the only approach I know is to run the highlighter over the fields, 
> but that is not exactly the same thing and it's slow.
> I know I could use termDocs or a separate search for each of a, b, and c to 
> annotate my initial results list; that could work well for a).
> 
> I still don't know what to do for b).
> 
> Thanks for any hints.
> 
> paul
> 
> On 31 Mar 2010, at 23:00, Paul Libbrecht wrote:
>> I've been wandering around but see no solution yet: I would like to 
>> intersect two query results, i.e. go through the result list of one query and 
>> indicate which results also match the other query or, even better, 
>> indicate that "past this point, nothing matches that query anymore".
>> 
>> What should be the strategy?
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> 
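For part b) of Paul's question, one library-agnostic alternative to score encoding is plain set arithmetic, assuming the doc ids of the main query and of each clause can be collected into bit sets (for example with one extra search per clause). The hypothetical ClauseAnnotator below annotates the first page and counts how many main-query hits outside that page still match a clause; a count of zero is the "past this point, nothing matches that query anymore" signal. The cost is one bit per document per set, which may matter on large indexes.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: doc-id sets are ASSUMED to have been collected already
// (e.g. one extra search per clause); no Lucene API is used here.
public class ClauseAnnotator {

    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) set.set(id);
        return set;
    }

    /** a) For each doc on the first page, the indexes of the clauses it matches. */
    static BitSet[] annotatePage(int[] pageDocs, BitSet[] clauseDocs) {
        BitSet[] result = new BitSet[pageDocs.length];
        for (int i = 0; i < pageDocs.length; i++) {
            result[i] = new BitSet();
            for (int c = 0; c < clauseDocs.length; c++) {
                if (clauseDocs[c].get(pageDocs[i])) result[i].set(c);
            }
        }
        return result;
    }

    /** b) How many main-query hits outside the first page still match the clause. */
    static int tailMatches(BitSet mainDocs, int[] pageDocs, BitSet clauseDocs) {
        BitSet tail = (BitSet) mainDocs.clone();
        for (int doc : pageDocs) tail.clear(doc); // drop docs already shown
        tail.and(clauseDocs);                     // keep tail docs matching the clause
        return tail.cardinality();                // 0 => nothing in the tail matches
    }

    public static void main(String[] args) {
        BitSet main = docs(1, 3, 5, 7, 9);
        int[] page = {5, 1};                      // first page, in score order
        BitSet[] clauses = {docs(1, 2, 9), docs(5)};
        System.out.println(Arrays.toString(annotatePage(page, clauses))); // [{1}, {0}]
        System.out.println(tailMatches(main, page, clauses[0]));          // 1
        System.out.println(tailMatches(main, page, clauses[1]));          // 0
    }
}
```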

