Hi Srikant,
Thank you very much for your reply; it's very interesting.
I have to say I am confused by this now...
I do not know what I can do to make this unit test pass...
I agree with you, it may be an issue of computing relevance.
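
To make the "loss of precision" idea concrete, here is a rough sketch of the arithmetic for the two test documents. It assumes the formulas of Lucene's DefaultSimilarity (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms)) and that the norm is stored in a single byte with a 3-bit mantissa. The FairSimilarity code is not shown in this thread, so whether it behaves the same way is only an assumption. The helper truncateTo3MantissaBits is a made-up name that approximates the norm encoding for values in this range; it is not Lucene API. IDF and the query norm are left out because they are identical for both documents.

```java
// Sketch only: plain Java, no Lucene dependency.
class NormPrecisionSketch {

    // Hypothetical helper: keep the sign, the exponent and the top 3 mantissa
    // bits of a float and drop the remaining 20 bits, to mimic a norm that
    // is stored with a 3-bit mantissa.
    static float truncateTo3MantissaBits(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) & 0xFFF00000);
    }

    // Assumed term-frequency factor: sqrt(freq).
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed length normalization: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Document-dependent part of the score with an exact norm.
    static float exactScore(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    // Same product, but with the norm truncated as if read back from the index.
    static float storedScore(int freq, int numTerms) {
        return tf(freq) * truncateTo3MantissaBits(lengthNorm(numTerms));
    }

    public static void main(String[] args) {
        // SHORT_SUITE: "x" occurs once in 10 terms; BIG_SUITE: twice in 20.
        System.out.println("exact  SHORT_SUITE: " + exactScore(1, 10));
        System.out.println("exact  BIG_SUITE:   " + exactScore(2, 20));
        System.out.println("stored SHORT_SUITE: " + storedScore(1, 10));
        System.out.println("stored BIG_SUITE:   " + storedScore(2, 20));
    }
}
```

Under these assumptions the exact products are equal (sqrt(1/10) = sqrt(2/20)), but after truncation the short document's norm becomes 0.3125 and the big one's 0.21875, so the short document ends up at 0.3125 against about 0.309 for the big one. That would be consistent with the shorter document being ranked first.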
Fabrice
Srikant Jakilinki-3 wrote:
>
> OK, got it to work. Thanks.
>
> By a quick scoring comparison, I got the same scores for both hits.
> Maybe there is a loss of precision somewhere. Or, when scores are equal,
> Lucene is doing something unintended/overlooked and thus putting shorter
> documents higher, as the experiment is a special case where the TF of a
> queried term is equal (for both suites, the TF of x = 10%), which is very
> rare. Or maybe the IDF factor is kicking in in some strange way,
> although it shouldn't. There are a number of possible reasons, but to the
> naked eye, there isn't much.
>
> However, that said, length normalization is not a science but an art, and
> the simple scheme we have here in the FairSimilarity will not always
> perform as expected in real-world scenarios. Maybe I am missing something
> or have forgotten my basics, but that is not to say your observation is
> trivial.
>
> Quite the contrary. I hope there will be more activity on this topic,
> because it is an issue of computing relevance, which is the core of search.
>
> Cheers,
> Srikant
>
> Fabrice Robini wrote:
>> Oooops sorry, bad cut/paste...
>>
>> Here is the right one :-)
>>
>> public void testFairSimilarity() throws CorruptIndexException,
>>         IOException, ParseException
>> {
>>     Directory theDirectory = new RAMDirectory();
>>     Analyzer theAnalyzer = new StandardAnalyzer();
>>
>>     // Index both documents with FairSimilarity
>>     IndexWriter theIndexWriter = new IndexWriter(theDirectory,
>>             theAnalyzer);
>>     theIndexWriter.setSimilarity(new FairSimilarity());
>>
>>     // Short document: "x" occurs once in 10 terms
>>     Document doc1 = new Document();
>>     Field name1 = new Field("NAME", "SHORT_SUITE", Field.Store.YES,
>>             Field.Index.UN_TOKENIZED);
>>     Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10",
>>             Field.Store.NO, Field.Index.TOKENIZED);
>>     doc1.add(name1);
>>     doc1.add(content1);
>>     theIndexWriter.addDocument(doc1);
>>
>>     // Big document: "x" occurs twice in 20 terms
>>     Document doc2 = new Document();
>>     Field name2 = new Field("NAME", "BIG_SUITE", Field.Store.YES,
>>             Field.Index.UN_TOKENIZED);
>>     Field content2 = new Field("CONTENT",
>>             "x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20",
>>             Field.Store.NO, Field.Index.TOKENIZED);
>>     doc2.add(name2);
>>     doc2.add(content2);
>>     theIndexWriter.addDocument(doc2);
>>
>>     theIndexWriter.close();
>>
>>     // Search with the same similarity that was used for indexing
>>     Searcher searcher = new IndexSearcher(theDirectory);
>>     searcher.setSimilarity(new FairSimilarity());
>>
>>     QueryParser queryParser = new QueryParser("CONTENT",
>>             theAnalyzer);
>>
>>     Hits hits = searcher.search(queryParser.parse("x"));
>>
>>     // Expect the big document to be ranked first
>>     assertEquals(2, hits.length());
>>     assertEquals("BIG_SUITE", hits.doc(0).get("NAME"));
>>     assertEquals("SHORT_SUITE", hits.doc(1).get("NAME"));
>> }
>>
>>
>>
>>
>> Srikant Jakilinki-3 wrote:
>>
>>> Well, I can't seem to even get past the assertions of this code.
>>>
>>> The first assertion is failing in that I get 0 hits. I am using
>>> SimpleAnalyzer since I do not have a FrenchAnalyzer.
>>>
>>> Any thoughts?
>>> Srikant
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
--
View this message in context:
http://www.nabble.com/Is-Fair-Similarity-working-with-lucene-2.2---tp15001250p15026214.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.