I don't think your field title can be used to search for the exact match, because in your field, the title is probably tokenized.
The other field for the title should contain the title as a keyword, ie not tokenized. So, if the title is 'sea-lion', the term ("title","sea-lion") is store.


The transformation I told you is to avoid storing the raw term, ie with the case and special characters.
Something like that:
--
StandardAnalyzer analyzer = new StandardAnalyzer();
Token token;
StringBuffer buf = new StringBuffer();
try {
stream = analyzer.tokenStream("title", new StringReader(title));
while ((token = stream.next()) != null) {
qBuf.append(' ');
qBuf.append(token.termText());
}
} catch(IOException ioe) {
ioe.printStackTrace();
}
String transformedTitle = buf.toString().trim();
--
So, if you search for "sea-lion", "sea lion" or "Sea lion", the transformed text will be "sea lion", the TermQuery will be new TermQuery(new Term("newField", "sea lion")) and you could then search for an exact match.


Franck

Niraj Alok wrote:
Hi Franck,

I already had a field separately for the title. Replacing the PhraseQuery
with the TermQuery did not help.
I haven't tried the transformation part. What kind of transformation are you
talking about ? How to do this transformation?
Can you provide some more details please?


Regards, Niraj ----- Original Message ----- From: "Brisbart Franck" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, June 24, 2004 7:04 PM Subject: Re: score and frequency



Forget about the PhraseQuery, I'm stupid, it can't work like that.
Because the phrase query will boost the documents which contain the
search and not the documents which match exactly the search. So, the
exact matches will come down. :-/

You need to have some information about the lucene documents to know if
it's an exact match. Such as the number of terms in the documents. The
problem is that this number is store in the lengthNorm and as it's
encoded on 1 byte, you can't have it precisely. So, you should shunt the
problem.
Here's another suggestion (a good one I hope):
Add another field containing the title as a Keyword. Then you just have
to replace the PhraseQuery I told you to use by a TermQuery searching
for the term (newField,search)
Of course, it will be a bit too restrictive to store the title without
any transformation. You can for example store in this field the
concatenation of the token given by your analyzer. Just don't forget to
do the same transformation also for the search.
Sorry for the previous posts.

Franck

Niraj Alok wrote:

Hi Franck,

You seem to be a genius in lucene !

I have done finally all that which you have suggested, but now when I am
searching for "lion", those terms are coming much below in terms of

scores.

This is despite me setting the boost for the phrase query. Infact, this

is

resulting in almost all the exact matches to come down.


Regards, Niraj ----- Original Message ----- From: "Brisbart Franck" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, June 24, 2004 6:02 PM Subject: Re: score and frequency




It may come from the boolean clauses. If you add your sub-queries with a
'required' flag, you'll only get the results matching all the words in
your query.
It can also come from the score which is different. If you set up a
threshold to return the results, it can be the problem.

Franck


Niraj Alok wrote:


Hi Franck,

Thank you so much for the detailed explanation.
However, when I tried to break up my MultiFieldQueryParser into a

series

of


BooleanQueries, the result set has got reduced drastically.
Any idea why this could be happening?

Regards,
Niraj
----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, June 24, 2004 2:54 PM
Subject: Re: score and frequency





The MultiFieldQueryParser give you a BooleanQuery containing 1 query

for

each field.
Something like:
         BooleanQuery
         /   |   |   \
       QF1  QF2 QF3  QF4    (QFx=Query for field x)

You can still use the MultiFieldQueryParser and create a BooleanQuery

to

encapsulate the one parsed + the PhraseQuery, ie:
          BooleanQuery(created by you)
           /       \
         BQ      PhraseQuery

Or create the whole query (I think you should do that) and have
something like that:
          _BooleanQuery__
         /   |   |   \   \
       QF1  QF2 QF3  QF4  PhraseQuery      (QFx=Query for field x)

It's like parsing the following query:
(field1:query) (field2:query) (field3:query)...(fieldx:query)
(title:"query")~boost


Franck


Niraj Alok wrote:



I asked the previous question since I do not know how to use

PhraseQuery


I have one booleanquery and one query.
The query is Query query =  MultiFieldQueryParser.parse( qs,

searchLoc,

flags, new StandardAnalyzer(stop));

where qs is the word to be searched upon and searchLoc contains all

the

four



fields.

How do I insert a PhraseQuery here for title field only, and that too

with



its boosted value?


Regards, Niraj ----- Original Message ----- From: "Niraj Alok" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Thursday, June 24, 2004 2:00 PM Subject: Re: score and frequency






Does it mean that I would need to abandon MultiFieldQueryParser?

Regards,
Niraj
----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, June 24, 2004 1:22 PM
Subject: Re: score and frequency






Hi,
first, what do you consider as an 'exact matching' ? It seems that

you


treat the search word by word, so 'lion sea' will be an 'exact

match'

of



'sea-lion'.
I think you should add a PhraseQuery to your query containing the

title


and with a big boost. So, you don't need to boost your title field.

Only



the results matching exactly (for the PhraseQuery) will be boosted.

Franck


Niraj Alok wrote:




Hi Guys,

I seem to have run into rough weather again.
To describe the problem as concisely as possible, I have four

fields

to




search upon : title , first para, rest of the paras and content

(equal

to



title + first para + rest of the para) .  I am doing this by using
MultiFieldQueryParser.




Now there is a very complicated ranking algrorithm specified by

the

client and I have met most of them except one or two and really need

your



help as all my other efforts have failed.




The most important rule is that exact matching titles should come

first




, i.e. get higher scores.




I have given the highest boost factor to the title than the rest

but

the




problem comes up when there is some other title which has got just

one

word




matching. For e.g., if I search for lion, there is a title sea-lion

which



also has the same boost factor as that of "lion" in the index. Also,
sea-lion has got some more "lion" in its first para or rest of the

paras


etc. such that its score comes higher than "lion".




Is there some way to get the exact matching titles higher scores?
Please reply soon.


Regards, Niraj


----- Original Message ----- From: "Brisbart Franck" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, June 07, 2004 12:50 PM Subject: Re: score and frequency







It seems that you don't the length norm to be used. It's a factor

which




normalize the score of a doc depending on the size of the

searched

field




of the doc. It's the field which make that 'ground ice' has a

higher


score than 'ice hockey: British Sekonda Superleague Play-Off
Championship: finals' because it only has 2 terms.
So, I suggest you to override the lengthNorm method and to ignore

the


numTokens parameter.
NB: The length norm is computed during the indexation and the

norm

are



store in the index (in the _aaa.f# files). So, you need to do

re-index



your data, and use this similarity during the indexation.

Cheers,
Franck


Niraj Alok wrote:





I have set the searcher.setSimilarity as well as also tried

setting


the




coord factor to 1.

The problem as given by an example is : Lets say I have titles

to

be


displayed depending upon the search.
E.g if i have "ice hockey" as the search item and if it is

default

similarity, my results are :

ice hockey0.99999994
ice hockey0.75
ice hockey0.75
winter Olympics: hockey, ice, medallists0.17402513
ice age0.073680125
National Hockey League0.020266924
Cracking the Ice Age0.018420031
ground-ice0.011512519
ice hockey: British Sekonda Superleague Play-Off Championship:
finals0.0069075115
(the numbers indicating the score).


But if i set the similarity as my overridden one, the results

become:



ice hockey0.99999994
ice hockey0.75
ice hockey0.75
ice age0.22104037
winter Olympics: hockey, ice, medallists0.17402513
National Hockey League0.060800765
Cracking the Ice Age0.055260092
ground-ice0.034537554
ice hockey: British Sekonda Superleague Play-Off Championship:
finals0.020722535


I want all the titles which have both "ice" and "hockey" to come

above




the




rest (to have higher scores)
Meaning i would wish the results to appear like:

ice hockey
ice hockey
ice hockey
winter Olympics: hockey, ice, medallists
ice hockey: British Sekonda Superleague Play-Off Championship:

finals



ice age
National Hockey League
Cracking the Ice Age
ground-ice

My overriden similarity class contains just this method:
public float coord(int overlap, int maxOverlap) {

return 1.0f;

}





I feel it is the weight factor which is producing indesirable

results.




Any




help in this regard would be highly appreciated.

Regards,
Niraj

----- Original Message -----
From: "Brisbart Franck" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, June 04, 2004 8:46 PM
Subject: Re: score and frequency








Hi,

Be careful to set the default similarity
'Similarity.setDefault(similarity)' before creating your search

instance




(IndexSearcher).
If you change the default similarity after, you'll still use

the

old



one.




You'd better use the 'searcher.setSimilarity' method on your

searcher.




Franck


Phil brunet wrote:






Hi to all.

Maybe the term frequency is not the only parameter you need to

override




to "customize" the score attributed by Lucene.

Maybe you should consider the normalisation factor, the idf

and

the



coord factor ?

Philippe







From: "Niraj Alok" <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List"

<[EMAIL PROTECTED]>

To: "Lucene Users List" <[EMAIL PROTECTED]>
Subject: Re: score and frequency
Date: Fri, 4 Jun 2004 15:13:32 +0530

Hi Erik,

Thanks for the suggestion.

I tried this:
public class RelevanceSimilarity extends DefaultSimilarity

{

public float tf(float freq) {

System.out.println("discounting frequency");

return (float)1;

}

}



and in my query class, I used :

Similarity.setDefault(similarity);

Hits hits = is.search(query);

for(i = 0; i < hits.length(); i ++)

result = result + hits.score(i);



However, this is still not giving me the expected result. Do

I

need




to




do






something else?


Regards, Niraj

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, June 04, 2004 1:55 PM
Subject: Re: score and frequency








On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:






Hi,

I am having some problems with the score of lucene.
I am trying to get the results displayed according to

hits.score



and






it is giving the results correctly.
However I do not want the frequency factor to be used for

the

computation of the score.

Is it possible to get the score which does not have the

frequency




factor in it ?

Have a look at the javadocs for Similarity.

DefaultSimilarity

is



used






unless otherwise specified. You could subclass that and

override



this:






public float tf(float freq) {
return (float)Math.sqrt(freq);
}

and return 1.0.  This might give you the effect you want.

Erik



------------------------------------------------------------------

-

-


-



To unsubscribe, e-mail:

[EMAIL PROTECTED]




For additional commands, e-mail:

[EMAIL PROTECTED]




-------------------------------------------------------------------

-

-


To unsubscribe, e-mail:

[EMAIL PROTECTED]



For additional commands, e-mail:

[EMAIL PROTECTED]



_________________________________________________________________

Bloquez les fen�tres pop-up, c'est gratuit !

http://toolbar.msn.fr


--------------------------------------------------------------------

-

To unsubscribe, e-mail:

[EMAIL PROTECTED]


For additional commands, e-mail:

[EMAIL PROTECTED]




--
Franck Brisbart
R&D
http://www.kelkoo.com



--------------------------------------------------------------------

-

To unsubscribe, e-mail:

[EMAIL PROTECTED]


For additional commands, e-mail:

[EMAIL PROTECTED]





--------------------------------------------------------------------

-

To unsubscribe, e-mail:

[EMAIL PROTECTED]

For additional commands, e-mail:

[EMAIL PROTECTED]


--
Franck Brisbart
R&D
http://www.kelkoo.com



--------------------------------------------------------------------

-

To unsubscribe, e-mail:

[EMAIL PROTECTED]

For additional commands, e-mail:

[EMAIL PROTECTED]



--
Franck Brisbart
R&D
http://www.kelkoo.com



---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail:

[EMAIL PROTECTED]




---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



--
Franck Brisbart
R&D
http://www.kelkoo.com


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



--
Franck Brisbart
R&D
http://www.kelkoo.com


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]




-- Franck Brisbart R&D http://www.kelkoo.com


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]





--
Franck Brisbart
R&D
http://www.kelkoo.com


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to