RE: Scoring Technique based on Relevance Feedback & other Parameters

sachin Thu, 31 Aug 2006 02:22:07 -0700

Hello,
A short and simple question:

Does the Apache license allow me to modify the final Scorer classes that
Apache distributes? Or may I copy some of the Lucene code into a commercial
application used within my organization?


TermScorer and BooleanScorer are final classes, but all the other scorers
are non-final.

I ask because I am interested in changing the scoring strategy to one based
on relevance feedback.

One observation:
Lucene's scoring mechanism, based on TF-IDF, seems inflexible to me.
It would be really nice if there were a way to plug in other, possibly much
simpler, scoring mechanisms.

The Query object could hold a "ScoringStrategy" object, which it would pass
to the scorer.

A ScoringStrategy might look like this:

import java.util.List;

interface ScoringStrategy
{
        // Calculates the score (a float) from the list of objects
        // passed to the strategy.
        float score(List<Object> objects);
}

Are there other possible ways to implement flexible scoring?
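
To make the idea concrete, here is a minimal sketch of one such strategy in plain Java. Everything in it is hypothetical and none of it is part of Lucene: the interface takes a list of numeric features per document (for example TF-IDF, link counts), WeightedSumStrategy is an illustrative name, and the withFeedback update is just one simple way relevance feedback could adjust the weights.

```java
import java.util.List;

// Hypothetical interface following the proposal above; not a Lucene API.
interface ScoringStrategy {
    // Returns a score computed from per-document feature values.
    float score(List<Float> features);
}

// Illustrative strategy: a weighted sum of the features.
final class WeightedSumStrategy implements ScoringStrategy {
    private final float[] weights;

    WeightedSumStrategy(float... weights) {
        this.weights = weights.clone();
    }

    @Override
    public float score(List<Float> features) {
        if (features.size() != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        float sum = 0f;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features.get(i);
        }
        return sum;
    }

    // Assumed feedback rule: move each weight toward the features of a
    // document the user marked as relevant, scaled by a learning rate.
    WeightedSumStrategy withFeedback(List<Float> relevantFeatures, float rate) {
        if (relevantFeatures.size() != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        float[] updated = weights.clone();
        for (int i = 0; i < updated.length; i++) {
            updated[i] += rate * relevantFeatures.get(i);
        }
        return new WeightedSumStrategy(updated);
    }
}
```

A Query would then only need to hand its ScoringStrategy to the scorer, and feedback would produce a new strategy object rather than a new Scorer class.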





-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 24, 2006 1:14 AM
To: java-user@lucene.apache.org
Subject: Re: Scoring Technique based on Relevance Feedback & other Parameters


: package. By implementing a new type of tuple (Query, Weight, Scorer) I can
: easily implement a new scoring technique. Unfortunately, the Lucene index
: shows that it stores only TF / position vectors for each term within a
: document.

: I am interested in investigating a new scoring technique where I will
: use some other parameters relating to the term to rank the documents. For
: example, web page ranking is assisted by parameters like the number of
: links towards a web page and the number of links from it. This indicates
: that we need to store relatively more information about terms within the
: index. But how? I need to investigate.

there is a distinction between storing more information about a term and
storing additional information about a document.

the flexible payload type approaches that have been discussed should make
storing info about a term easy (ie: the term is "wind", its type is "noun",
its usage in the sentence is as a "subject", its importance is "88.3") but
you can already store additional information about documents (like the
total popularity of a document) in Lucene -- either by using the document
boost (if you always want it to be part of the score calculations) or as a
separate field which you can factor into the score calculations using
something like FunctionQuery...

http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/package-summary.html

...i use this all the time to make "recent" docs score better, or "more
popular docs" score better.
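
As a rough illustration of the arithmetic behind that idea, here is a small plain-Java sketch. It deliberately does not use the Solr FunctionQuery API: DocumentSignals, reciprocal and combine are hypothetical names, the reciprocal decay a / (m * x + b) is just one assumed way to turn a document's age into a "recency" value, and adding the weighted value to the text score is an assumption about how the two would be combined.

```java
// Illustrative only: plain Java, not the Solr FunctionQuery API.
final class DocumentSignals {
    private DocumentSignals() {}

    // Reciprocal decay a / (m * x + b): large for small x (e.g. a young
    // document) and shrinking as x grows.
    static float reciprocal(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    // Adds a weighted per-document signal (recency, popularity, ...) to
    // the text relevance score.
    static float combine(float textScore, float signal, float weight) {
        return textScore + weight * signal;
    }
}
```

With the same text score, a document of age 0 gets a larger combined score than one of age 3, which is the "recent docs score better" effect. A stored popularity field could be passed to combine in the same way.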


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




