Lucene has a basic scoring algorithm based on tf, idf
and the field boost value.

And Nutch adopts the PageRank concept by using its
own link analysis via the DistributedAnalysisTool
class.

Actually, I don't think most people run this. I believe it starts to have performance issues when the page count gets large, which is one of the reasons for the mapred work being done by Doug/Mike in a branch.

Typically the extent of "link analysis" is the number of inbound links to a page, which is recalculated whenever the WebDB is updated following a crawl.

But when I looked at the "score in detail" page for a
Nutch search result, I didn't see a factor called
"link analysis" or anything like that.

Where can I see this factor, or is it already combined
into the score value shown on the score detail page?

See my previous post on how inbound link counts are used to boost a Lucene document (web page).
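
As a rough illustration of the idea (a sketch only, not the actual Nutch code), an inbound link count can be turned into a damped boost along these lines. The class name LinkBoost, the method boostFor, and the choice of log(e + count) as the damping function are all assumptions made for this example.

```java
// Hypothetical helper for illustration; not part of Nutch or Lucene.
public final class LinkBoost {
    private LinkBoost() {}

    /**
     * Returns a damped boost for a page with the given number of inbound links.
     * A page with no inbound links gets 1.0 (no boost); the value then grows
     * logarithmically, so heavily linked pages do not swamp the text score.
     * The log(e + n) form is an assumption chosen for this sketch.
     */
    public static float boostFor(int inboundLinks) {
        if (inboundLinks < 0) {
            throw new IllegalArgumentException("inboundLinks must be >= 0");
        }
        return (float) Math.log(Math.E + inboundLinks);
    }
}
```

A value like this would be set on the Lucene document at index time (Lucene of this era has Document.setBoost(float) for that). As far as I understand it, a document boost is folded into the field norm, so it would not appear as a separately named factor in the score explanation.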

-- Ken
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200
