OK, got it to work. Thanks.

By a quick scoring comparison, I got the same scores for both hits. Maybe there is a loss of precision somewhere. Or, when scores are equal, Lucene may be doing something unintended/overlooked and thus putting shorter documents higher, since the experiment is a special case where the TF of the queried term is equal for both suites (the TF of x is 10% in each), which is very rare. Or maybe the IDF factor is kicking in in some strange way, although it shouldn't. There are a number of possible reasons, but to the naked eye there isn't much to go on.
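
For what it's worth, one way to check whether the scores really are identical, and which factors (tf, idf, norms) contribute to them, is Lucene's explain() API. A minimal sketch, assuming the searcher and query parser set up as in your test below:

    Query query = queryParser.parse("x");
    Hits hits = searcher.search(query);
    for (int i = 0; i < hits.length(); i++) {
        // explain() breaks each hit's score into its tf/idf/norm components
        Explanation explanation = searcher.explain(query, hits.id(i));
        System.out.println(hits.doc(i).get("NAME"));
        System.out.println(explanation.toString());
    }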

However, that said, length normalization is not a science but an art, and the simple scheme we have here in the FairSimilarity will not always perform as expected in real-world scenarios. Maybe I am missing something or have forgotten my basics, but that is not to say your observation is trivial.

Rather, the contrary. I hope there will be more activity on this topic, because it is an issue of computing relevance, which is the core of search.
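
For reference, a length-agnostic similarity along these lines would look roughly like the sketch below. This is an assumption on my part; the actual FairSimilarity used in the test may well be implemented differently. It simply returns 1.0 from lengthNorm so that document length drops out of the score:

    // Assumed sketch only; the real FairSimilarity from this thread may differ.
    public class FairSimilarity extends DefaultSimilarity {
        // Ignore field length so short and long documents are normalized alike.
        public float lengthNorm(String fieldName, int numTerms) {
            return 1.0f;
        }
    }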

Cheers,
Srikant

Fabrice Robini wrote:
Oooops sorry, bad cut/paste...

Here is the right one :-)

    public void testFairSimilarity() throws CorruptIndexException, IOException, ParseException
    {
        // Build a small in-memory index with two documents of different lengths.
        Directory theDirectory = new RAMDirectory();
        Analyzer theAnalyzer = new StandardAnalyzer();
        IndexWriter theIndexWriter = new IndexWriter(theDirectory, theAnalyzer);
        theIndexWriter.setSimilarity(new FairSimilarity());

        // Short document: 10 terms, one occurrence of "x" (tf = 10%).
        Document doc1 = new Document();
        Field name1 = new Field("NAME", "SHORT_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10", Field.Store.NO, Field.Index.TOKENIZED);
        doc1.add(name1);
        doc1.add(content1);
        theIndexWriter.addDocument(doc1);

        // Long document: 20 terms, two occurrences of "x" (tf = 10%).
        Document doc2 = new Document();
        Field name2 = new Field("NAME", "BIG_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content2 = new Field("CONTENT", "x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20", Field.Store.NO, Field.Index.TOKENIZED);
        doc2.add(name2);
        doc2.add(content2);
        theIndexWriter.addDocument(doc2);
        theIndexWriter.close();

        // Search with the same similarity; the assertions expect the longer
        // document (BIG_SUITE) to rank first.
        Searcher searcher = new IndexSearcher(theDirectory);
        searcher.setSimilarity(new FairSimilarity());

        QueryParser queryParser = new QueryParser("CONTENT", theAnalyzer);

        Hits hits = searcher.search(queryParser.parse("x"));

        assertEquals(2, hits.length());
        assertEquals("BIG_SUITE", hits.doc(0).get("NAME"));
        assertEquals("SHORT_SUITE", hits.doc(1).get("NAME"));
    }


Srikant Jakilinki-3 wrote:
Well, I can't seem to even get past the assertions of this code.

The first assertion fails: I get 0 hits. I am using SimpleAnalyzer, since I do not have a FrenchAnalyzer.

Any thoughts?
Srikant

