OK, got it to work. Thanks.

By a quick scoring comparison, I got the same scores for both hits. Maybe there is a loss of precision somewhere. Or, when scores are equal, Lucene may be doing something unintended/overlooked and thus putting shorter documents higher, since the experiment is a special case where the TF of the queried term is equal for both suites (the TF of x is 10% in each), which is very rare. Or maybe the IDF factor is kicking in in some strange way, although it shouldn't. There are a number of possible reasons, but to the naked eye there isn't much to go on.
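
For what it's worth, one way to check whether the scores really are identical, and which factors (tf, idf, norms) contribute to them, is Lucene's explain() API. A minimal sketch, assuming the searcher and query parser set up as in your test below:

    Query query = queryParser.parse("x");
    Hits hits = searcher.search(query);
    for (int i = 0; i < hits.length(); i++) {
        // explain() breaks each hit's score into its tf/idf/norm components
        Explanation explanation = searcher.explain(query, hits.id(i));
        System.out.println(hits.doc(i).get("NAME"));
        System.out.println(explanation.toString());
    }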

However, that said, length normalization is not a science but an art, and the simple scheme we have here in the FairSimilarity will not always perform as expected in real-world scenarios. Maybe I am missing something or have forgotten my basics, but that is not to say your observation is trivial.

Rather, the contrary. I hope there will be more activity on this topic, because it is an issue of computing relevance, which is the core of search.
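
For reference, a length-agnostic similarity along these lines would look roughly like the sketch below. This is an assumption on my part; the actual FairSimilarity used in the test may well be implemented differently. It simply returns 1.0 from lengthNorm so that document length drops out of the score:

    // Assumed sketch only; the real FairSimilarity from this thread may differ.
    public class FairSimilarity extends DefaultSimilarity {
        // Ignore field length so short and long documents are normalized alike.
        public float lengthNorm(String fieldName, int numTerms) {
            return 1.0f;
        }
    }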

Cheers,
Srikant

Fabrice Robini wrote:
Oooops sorry, bad cut/paste...

Here is the right one :-)

    public void testFairSimilarity() throws CorruptIndexException, IOException, ParseException
    {
        // Build a small in-memory index with two documents of different lengths.
        Directory theDirectory = new RAMDirectory();
        Analyzer theAnalyzer = new StandardAnalyzer();
        IndexWriter theIndexWriter = new IndexWriter(theDirectory, theAnalyzer);
        theIndexWriter.setSimilarity(new FairSimilarity());

        // Short document: 10 terms, one occurrence of "x" (tf = 10%).
        Document doc1 = new Document();
        Field name1 = new Field("NAME", "SHORT_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10", Field.Store.NO, Field.Index.TOKENIZED);
        doc1.add(name1);
        doc1.add(content1);
        theIndexWriter.addDocument(doc1);

        // Long document: 20 terms, two occurrences of "x" (tf = 10%).
        Document doc2 = new Document();
        Field name2 = new Field("NAME", "BIG_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content2 = new Field("CONTENT", "x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20", Field.Store.NO, Field.Index.TOKENIZED);
        doc2.add(name2);
        doc2.add(content2);
        theIndexWriter.addDocument(doc2);
        theIndexWriter.close();

        // Search with the same similarity; the assertions expect the longer
        // document (BIG_SUITE) to rank first.
        Searcher searcher = new IndexSearcher(theDirectory);
        searcher.setSimilarity(new FairSimilarity());

        QueryParser queryParser = new QueryParser("CONTENT", theAnalyzer);

        Hits hits = searcher.search(queryParser.parse("x"));

        assertEquals(2, hits.length());
        assertEquals("BIG_SUITE", hits.doc(0).get("NAME"));
        assertEquals("SHORT_SUITE", hits.doc(1).get("NAME"));
    }


Srikant Jakilinki-3 wrote:
Well, I can't seem to even get past the assertions of this code.

The first assertion fails: I get 0 hits. I am using SimpleAnalyzer, since I do not have a FrenchAnalyzer.

Any thoughts?
Srikant

