That is already in the similarity formula, in tf term, documents that
have more occurrences of a given term receive a higher score.
Jamal H Tandina wrote:
<<<<
If you want to give priority to documents that are larger, like z1, you
should change the DefaultSimilarity (at index time), more exactly the
method:
public float lengthNorm(String fieldName, int numTerms) {
return (float)(1.0 / Math.sqrt(numTerms));
}
to something like this
public float lengthNorm(String fieldName, int numTerms) {
return (float)(Math.sqrt(numTerms));
}
I want to give priority to documents that have the word we are searching more
frequent !
Thank you
Jamal H Tandina <[EMAIL PROTECTED]> a écrit : Thank you for your reply
How can i change the defaultSimilarity in the indexing and the searching, do
you have an example or an url how to set the Similarity ?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
Thanks again
Ion Badita a écrit : Try too look at Similarity, there you will find thinks about the
scoring. Your query is more "similar" with the shorter document.
If you have 2 documents with a field body; first with words "red flower"
and the second with just one word "flower", and search for the word
"flower", the second document will score high because is very similar
with the query.
If you want to give priority to documents that are larger, like z1, you
should change the DefaultSimilarity (at index time), more exactly the
method:
public float lengthNorm(String fieldName, int numTerms) {
return (float)(1.0 / Math.sqrt(numTerms));
}
to something like this
public float lengthNorm(String fieldName, int numTerms) {
return (float)(Math.sqrt(numTerms));
}
Reindex your documents with the Similarity modified and try to search
again. The IndexWriter has a method to set the similarity used for indexing.
I hope this will help you...
Ion
Jamal jamalator wrote:
Hi
I have indexed this html document
=============z1========================
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
=============z2=========================
zo zo zo zo zo zo zo zo zo zo zo zo
zo zo zo zo zo zo zo zo zo zo zo zo
=============z3==========================
zo zo zo zo zo zo zo zo zo zo zo zo
=========================================
with this code
Field contentK1 = new
Field("htmlcontent",httpd.getContentKeywords(),Field.Store.NO,Field.Index.TOKENIZED
);
contentK1.setBoost(1/10f); //10%
doc.add(contentK1);
and when a search "zo" with luke i have (whitespaceanalyser):
(score , id )
(0,0957,z2 )
(0,0947,z3 )
(0,0938,z1)
NORMALY the resut expected have to be z1 z2 z3
Some One have an idea ??
Thank you all
---------------------------------
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
---------------------------------
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
---------------------------------
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]