Hi Srikant, I really thank you for your reply, it's very interesting. I have to say I am confused with that now... I do not know what I can to for passing this Unit test...
I agree with you, it may be an issue of computing relevance. Fabrice Srikant Jakilinki-3 wrote: > > OK, got it to work. Thanks. > > By a quick scoring comparision, I got the same scores for both hits. > Maybe there is a loss of precision somewhere. Or when scores are equal, > Lucene is doing something unintended/overlooked and thus putting shorter > documents higher as the experiment is a special case where the TF of a > queried term (for both suites, the TF of x = 10%) is equal which is very > rarely. Or maybe the IDF factor is kicking in in some strange way > although it shouldnt. There are a number of varied reasons, but for the > naked eye, there isnt much. > > However, that said, length normalization is not a science but an art and > the simple scheme we have here in the FairSimilarity will not perform > always as expected in real world scenarios. Maybe I am missing something > or have forgot my basics but that is not to say your observation is > trivial. > > Rather, the contrary. Hope there will be more activity on this topic > because it is an issue of computing relevance which is the core of search. > > Cheers, > Srikant > > Fabrice Robini wrote: >> Oooops sorry, bad cut/paste... >> >> Here is the right one :-) >> >> public void testFairSimilarity() throws CorruptIndexException, >> IOException, ParseException >> { >> Directory theDirectory = new RAMDirectory(); >> Analyzer theAnalyzer = new StandardAnalyzer(); >> >> IndexWriter theIndexWriter = new IndexWriter(theDirectory, >> theAnalyzer); >> theIndexWriter.setSimilarity(new FairSimilarity()); >> >> Document doc1 = new Document(); >> Field name1 = new Field("NAME", "SHORT_SUITE", Field.Store.YES, >> Field.Index.UN_TOKENIZED); >> Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10", >> Field.Store.NO, Field.Index.TOKENIZED); >> doc1.add(name1); >> doc1.add(content1); >> theIndexWriter.addDocument(doc1); >> >> Document doc2 = new Document(); >> Field name2 = new Field("NAME", "BIG_SUITE", Field.Store.YES, >> Field.Index.UN_TOKENIZED); >> Field content2 = new Field("CONTENT", "x x 3 4 5 6 7 8 9 10 11 12 >> 13 >> 14 15 16 17 18 19 20", Field.Store.NO, Field.Index.TOKENIZED); >> doc2.add(name2); >> doc2.add(content2); >> theIndexWriter.addDocument(doc2); >> >> theIndexWriter.close(); >> >> Searcher searcher = new IndexSearcher(theDirectory); >> searcher.setSimilarity(new FairSimilarity()); >> >> QueryParser queryParser = new QueryParser("CONTENT", >> theAnalyzer); >> >> Hits hits = searcher.search(queryParser.parse("x")); >> >> assertEquals(2, hits.length()); >> assertEquals("BIG_SUITE", hits.doc(0).get("NAME")); >> assertEquals("SHORT_SUITE", hits.doc(1).get("NAME")); >> } >> >> >> >> >> Srikant Jakilinki-3 wrote: >> >>> Well, I cant seem to even get past the assertions of this code. >>> >>> The first assertion is failing in that I get 0 hits. I am using >>> SimpleAnalyzer since I do not have a FrenchAnalyzer. >>> >>> Any thoughts? >>> Srikant >>> > > ---------------------------------------------------------------------- > Free pop3 email with a spam filter. > http://www.bluebottle.com/tag/5 > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > -- View this message in context: http://www.nabble.com/Is-Fair-Similarity-working-with-lucene-2.2---tp15001250p15026214.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]