The whole problem I have to face is the following:

I have a web service which searches a corpus of documents and returns
a list of documents which match the query

The list is not ordered (I do not know the details of the search
engine, I only have its results for a query)

then I have this list of documents, which represents a subset of the corpus

I have to rank the documents of the list, using your scoring algorithm

now: I do not know if I have to import all the documents in a sort of
Index and apply Lucene's ranking algorithm (if there is one), or take
each document and compute the score of the document vs the query, and
then sort the list based on the scores

currently I am following the second approach, thus I need to compute
the score of each document
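The second approach (score each returned document against the query, then sort) can be sketched roughly as below. This is only an illustration under stated assumptions: `SubsetRanker`, `tokenize`, `score`, and `rank` are hypothetical names, the tokenizer is a naive lowercase split, and the weighting is a simplified tf-idf computed over the returned list only. It does not reproduce Lucene's actual scoring (no length norms, boosts, or coordination factor).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: ranks an unordered result list with a simplified tf-idf.
class SubsetRanker {

    // Naive tokenizer (assumption): lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns one score per document, in the same order as the input list.
    static double[] score(List<String> docs, String query) {
        int n = docs.size();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : tokenize(doc)) tf.merge(t, 1, Integer::sum);
            termFreqs.add(tf);
            // Document frequency is counted over the returned subset only.
            for (String t : tf.keySet()) docFreq.merge(t, 1, Integer::sum);
        }
        Set<String> queryTerms = new HashSet<>(tokenize(query));
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            for (String q : queryTerms) {
                int tf = termFreqs.get(i).getOrDefault(q, 0);
                if (tf == 0) continue;
                double idf = 1.0 + Math.log((double) n / (docFreq.get(q) + 1));
                scores[i] += Math.sqrt(tf) * idf;
            }
        }
        return scores;
    }

    // Returns document indices sorted by descending score (stable for ties).
    static List<Integer> rank(List<String> docs, String query) {
        double[] scores = score(docs, query);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) order.add(i);
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "lucene scoring uses term statistics",
            "cooking pasta with tomato",
            "lucene lucene index scoring");
        System.out.println(rank(docs, "lucene scoring"));
    }
}
```

Note that the term statistics here come from the returned subset, not from the full corpus, so the scores are only comparable within one result list.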

I think the MemoryIndex is good for this. I am trying to compile the
example provided in the javadoc, but some package seems to be missing...

Michele

On 11/2/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: > .. Btw, I do not have an index, I have 1 Document, and 1 Query.

: Lucene scoring - http://lucene.apache.org/java/docs/scoring.html - uses
: pre-computed statistics, location info, and the number of documents in the
: index (1 in your case). So some preparation is required before a
: (stand-alone) document can be scored against a query.

Doron's comments really just scratch the surface of a larger issue with
your question: Lucene is not an API for evaluating how similar a
"Document" is to a "Query", it's for finding Documents in a Corpus which
match a Query, and (optionally) using the "Score" to know which Documents
match better than other Documents.

For most of the various types of Queries that exist in Lucene, the score
is very dependent on how common the Terms involved are in the Corpus as a
whole -- if your Corpus consists of only 1 Document, then your scores are
going to be relatively meaningless.
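To make the corpus dependence concrete, here is a small sketch of an inverse-document-frequency term. The formula `1 + ln(numDocs / (docFreq + 1))` is an assumption used for illustration (a common tf-idf shape), and `IdfDemo` and `idf` are hypothetical names, not Lucene API. In a corpus of one Document, every term that occurs has `docFreq = 1`, so every term gets the same weight and the factor cannot separate rare terms from common ones.

```java
// Hypothetical illustration of why idf carries no information in a 1-document corpus.
class IdfDemo {

    // Assumed idf shape: grows as the term gets rarer in the corpus.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Corpus of 1 document: any matching term has docFreq == 1,
        // so all terms receive the identical weight.
        System.out.println("1-doc corpus, any term:  " + idf(1, 1));

        // Corpus of 1000 documents: a rare term outweighs a common one.
        System.out.println("1000 docs, rare term:    " + idf(2, 1000));
        System.out.println("1000 docs, common term:  " + idf(900, 1000));
    }
}
```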

Perhaps what you are interested in is more of a substring matching count?
Or an edit-distance type of calculation? ... Can you give us a concrete
example of what type of "score" you are looking for and what you mean when
you say "Query"?



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--
Michele Amoretti, Ph.D.
Distributed Systems Group
Dipartimento di Ingegneria dell'Informazione
Università degli Studi di Parma
http://www.ce.unipr.it/people/amoretti
