It can also come from the scores, which are now different. If you apply a score threshold when returning results, that could be the cause.
Franck
Niraj Alok wrote:
Hi Franck,
Thank you so much for the detailed explanation. However, when I tried to break up my MultiFieldQueryParser query into a series of BooleanQueries, the result set was reduced drastically. Any idea why this could be happening?
Regards, Niraj
----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, June 24, 2004 2:54 PM
Subject: Re: score and frequency
The MultiFieldQueryParser gives you a BooleanQuery containing one query for each field. Something like:
        BooleanQuery
       /   |    |   \
    QF1  QF2  QF3  QF4        (QFx = query for field x)
You can still use the MultiFieldQueryParser and create a BooleanQuery to encapsulate the parsed query plus the PhraseQuery, i.e.:
    BooleanQuery (created by you)
       /        \
     BQ      PhraseQuery
Or create the whole query yourself (I think you should do that) and have something like this:
            BooleanQuery
       /   |    |    \       \
    QF1  QF2  QF3  QF4  PhraseQuery        (QFx = query for field x)
It's like parsing the following query: (field1:query) (field2:query) (field3:query)...(fieldx:query) (title:"query")^boost
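For illustration, a minimal sketch of assembling such a query string by hand, using Lucene's `^` boost operator on the phrase clause. It is plain Java with no Lucene dependency. The class name CombinedQuerySketch, the field names, and the boost value of 10 are assumptions made up for the example, and the input is not escaped for query-syntax special characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds "(f1:w1 f1:w2) (f2:w1 f2:w2) ... phraseField:"w1 w2"^boost".
class CombinedQuerySketch {

    static String build(String userInput, List<String> fields, String phraseField, float boost) {
        String[] words = userInput.trim().split("\\s+");
        List<String> groups = new ArrayList<>();

        // One parenthesised group per field, each word qualified with that field.
        for (String field : fields) {
            List<String> clauses = new ArrayList<>();
            for (String word : words) {
                clauses.add(field + ":" + word);
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }

        // Extra phrase clause on a single field, boosted with the ^ operator.
        groups.add(phraseField + ":\"" + String.join(" ", words) + "\"^" + boost);
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        // Field names and boost are illustrative only.
        System.out.println(build("sea lion", Arrays.asList("title", "content"), "title", 10.0f));
    }
}
```

The resulting string could then be handed to a single query parser call instead of combining query objects by hand; whether that is preferable depends on how much control over the individual clauses is needed.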
Franck
Niraj Alok wrote:
I asked the previous question because I do not know how to use PhraseQuery.
I have one BooleanQuery and one query. The query is:
    Query query = MultiFieldQueryParser.parse(qs, searchLoc, flags, new StandardAnalyzer(stop));
where qs is the word to be searched for and searchLoc contains all four fields.
How do I insert a PhraseQuery here for the title field only, with its own boost value?
Regards, Niraj
----- Original Message -----
From: "Niraj Alok" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, June 24, 2004 2:00 PM
Subject: Re: score and frequency
Does it mean that I would need to abandon MultiFieldQueryParser?
Regards, Niraj
----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, June 24, 2004 1:22 PM
Subject: Re: score and frequency
Hi,
First, what do you consider an 'exact match'? It seems that you treat the search word by word, so 'lion sea' will be an 'exact match' of 'sea-lion'.
I think you should add to your query a PhraseQuery on the title, with a big boost. That way you don't need to boost your title field: only the results matching exactly (for the PhraseQuery) will be boosted.
Franck
Niraj Alok wrote:
Hi Guys,
I seem to have run into rough weather again. To describe the problem as concisely as possible: I have four fields to search on: title, first para, rest of the paras, and content (equal to title + first para + rest of the paras). I am doing this by using MultiFieldQueryParser.
Now there is a very complicated ranking algorithm specified by the client. I have met most of its rules except one or two, and I really need your help as all my other efforts have failed.
The most important rule is that exactly matching titles should come first, i.e. get higher scores.
I have given the title a higher boost factor than the rest, but the problem comes up when some other title has just one word matching. For example, if I search for lion, there is a title sea-lion which has the same boost factor as "lion" in the index. Also, sea-lion has some more occurrences of "lion" in its first para or rest of the paras, so its score comes out higher than that of "lion".
Is there some way to give exactly matching titles higher scores? Please reply soon.
Regards, Niraj
----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, June 07, 2004 12:50 PM
Subject: Re: score and frequency
It seems that you don't want the length norm to be used. It's a factor which normalizes the score of a doc depending on the size of the searched field of the doc. It's this factor which makes 'ground ice' score higher than 'ice hockey: British Sekonda Superleague Play-Off Championship: finals', because it only has 2 terms.
So, I suggest you override the lengthNorm method and ignore the numTokens parameter.
NB: The length norm is computed at indexing time and the norms are stored in the index (in the _aaa.f# files). So you need to re-index your data, and use this similarity during indexing.
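To make the effect concrete, here is a minimal sketch of the arithmetic involved. It is plain Java with no Lucene dependency: defaultLengthNorm mirrors the 1/sqrt(numTokens) formula used by Lucene's DefaultSimilarity, and flatLengthNorm stands in for the suggested override. The class and method names, and the token count of 9 for the long title, are assumptions for illustration.

```java
// Hypothetical stand-alone model of the length norm; not Lucene code.
class LengthNormSketch {

    // Same formula as DefaultSimilarity.lengthNorm: shorter fields get a larger norm.
    static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // The suggested override: ignore numTokens so field length no longer affects the score.
    static float flatLengthNorm(int numTokens) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int shortTitle = 2; // a two-term title such as 'ground ice'
        int longTitle = 9;  // assumed token count for the long 'ice hockey: ...' title

        System.out.println("default: " + defaultLengthNorm(shortTitle)
                + " vs " + defaultLengthNorm(longTitle));
        System.out.println("flat:    " + flatLengthNorm(shortTitle)
                + " vs " + flatLengthNorm(longTitle));
    }
}
```

With the default formula the short title is favoured purely because it is short; with the flat version both titles start from the same norm, which only takes effect for documents indexed with the overriding similarity.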
Cheers, Franck
Niraj Alok wrote:
I have called searcher.setSimilarity and have also tried setting the coord factor to 1.
The problem, shown by example: let's say I have titles to be displayed depending on the search. E.g., if I have "ice hockey" as the search item and use the default similarity, my results are (score, then title):
0.99999994    ice hockey
0.75          ice hockey
0.75          ice hockey
0.17402513    winter Olympics: hockey, ice, medallists
0.073680125   ice age
0.020266924   National Hockey League
0.018420031   Cracking the Ice Age
0.011512519   ground-ice
0.0069075115  ice hockey: British Sekonda Superleague Play-Off Championship: finals
But if I set the similarity to my overridden one, the results become:
0.99999994    ice hockey
0.75          ice hockey
0.75          ice hockey
0.22104037    ice age
0.17402513    winter Olympics: hockey, ice, medallists
0.060800765   National Hockey League
0.055260092   Cracking the Ice Age
0.034537554   ground-ice
0.020722535   ice hockey: British Sekonda Superleague Play-Off Championship: finals
I want all the titles which have both "ice" and "hockey" to come above the rest (to have higher scores), meaning I would like the results to appear like:
ice hockey
ice hockey
ice hockey
winter Olympics: hockey, ice, medallists
ice hockey: British Sekonda Superleague Play-Off Championship: finals
ice age
National Hockey League
Cracking the Ice Age
ground-ice
My overridden similarity class contains just this method:
    public float coord(int overlap, int maxOverlap) {
        return 1.0f;
    }
I feel it is the weight factor which is producing undesirable results. Any help in this regard would be highly appreciated.
Regards, Niraj
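For reference, a minimal sketch of what the coord factor contributes. In Lucene's DefaultSimilarity it is the fraction of query terms a document matches (overlap / maxOverlap), so it rewards documents that match more of the terms; returning a constant 1.0 removes that reward instead of strengthening it. That would be consistent with the scores above, where 'ice age' rises from 0.073680125 to 0.22104037 while 'winter Olympics: hockey, ice, medallists' stays at 0.17402513. The code is plain Java with no Lucene dependency, and the class and method names are assumptions for illustration.

```java
// Hypothetical stand-alone model of the coord factor; not Lucene code.
class CoordSketch {

    // Same formula as DefaultSimilarity.coord: fraction of the query terms that matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // The override from the message above: every document gets the same factor.
    static float flatCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int queryTerms = 2; // e.g. "ice" and "hockey"

        System.out.println("default, one term matched:  " + defaultCoord(1, queryTerms));
        System.out.println("default, both terms matched: " + defaultCoord(2, queryTerms));
        System.out.println("flat, one term matched:     " + flatCoord(1, queryTerms));
        System.out.println("flat, both terms matched:    " + flatCoord(2, queryTerms));
    }
}
```

If the goal is for titles containing both terms to rank first, keeping the default coord, or making it steeper, may be closer to what is wanted than flattening it.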
----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, June 04, 2004 8:46 PM
Subject: Re: score and frequency
Hi,
Be careful to set the default similarity with 'Similarity.setDefault(similarity)' before creating your searcher instance (IndexSearcher). If you change the default similarity afterwards, the searcher will still use the old one.
You'd be better off using the 'searcher.setSimilarity' method on your searcher.
Franck
Phil brunet wrote:
Hi to all.
Maybe the term frequency is not the only parameter you need to override to "customize" the score attributed by Lucene.
Maybe you should also consider the normalisation factor, the idf and the coord factor?
Philippe
From: "Niraj Alok" <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Subject: Re: score and frequency
Date: Fri, 4 Jun 2004 15:13:32 +0530
Hi Erik,
Thanks for the suggestion.
I tried this:
    public class RelevanceSimilarity extends DefaultSimilarity {
        public float tf(float freq) {
            System.out.println("discounting frequency");
            return (float)1;
        }
    }
and in my query class, I used:
    Similarity.setDefault(similarity);
    Hits hits = is.search(query);
    for(i = 0; i < hits.length(); i ++)
        result = result + hits.score(i);
However, this is still not giving me the expected result. Do I need to do something else?
Regards, Niraj
----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, June 04, 2004 1:55 PM
Subject: Re: score and frequency
On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
Hi,
I am having some problems with Lucene's scoring. I am trying to display the results ordered by hits.score, and that part works correctly. However, I do not want the frequency factor to be used in the computation of the score.
Is it possible to get a score which does not have the frequency factor in it?
Have a look at the javadocs for Similarity. DefaultSimilarity is used unless otherwise specified. You could subclass that and override this:
    public float tf(float freq) { return (float)Math.sqrt(freq); }
and return 1.0. This might give you the effect you want.
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
-- Franck Brisbart R&D http://www.kelkoo.com
