1) For Lucene's scoring, there's this:
http://lucene.apache.org/java/2_4_0/scoring.html#Scoring
And Lucene in action also describes the scoring formula.
2) It's up to you to build a Lucene document from your content, so you
decide which parts of your content (body, link, meta) become which
fields in Lucene. At that point Lucene's scoring formula kicks in.
Mike
Marco Palumbo - In4Tech wrote:
Good morning,
some days ago I sent the following e-mail, but I had no feed-back on
it. Could you please tell us if there is someone able to cooperate
with us on this project?
Thank you in advance,
Marco Palumbo
dott. Marco Palumbo
Chief Financial Officer
In4Tech s.r.l.
c.so Canalgrande, n. 88
41100 Modena - Italy
tel.: 0039 059 230651
fax : 0039 059 244672
www.in4tech.net
From: Marco Palumbo - In4Tech
Sent: giovedì 13 novembre 2008 16.03
To: java-user@lucene.apache.org
Subject: Lucene
Good morning,
our company works in the field of industrial biotechnologies. We
were interested in having a software capable to classify web-sites
(and so organizations) working in our field. So, one of our IT
consultants organized a system based on Heritrix (http://crawler.archive.org/
) and Lucene.
As you know, Lucene calculates some scores of frequency. We would
like to know/obtain:
1) the formula used by Lucene to calculate the scores;
2) for each page, the basic information used by Lucene to calculate
the scores (atomic data: term's frequency in meta, link, body;
dimension of the page; ...).
How can you help us to have this kind of information?
Thanks.
Marco Palumbo
dott. Marco Palumbo
Chief Financial Officer
In4Tech s.r.l.
c.so Canalgrande, n. 88
41100 Modena - Italy
tel.: 0039 059 230651
fax : 0039 059 244672
www.in4tech.net
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]