The boost for a page may be calculated in a few different ways (and in a few
different places in Nutch):
1) PageRank-based score
- calculated by the "nutch analyze" command from the WebDB
- during fetchlist generation, scores from the WebDB are stored in the segment
- the indexing phase uses the score to set the boost for a page
2) based on the number of incoming links
- during fetchlist generation, inlinks are stored in the segment
- during indexing, the number of inlinks is read from the segment and used in
the boost calculation
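To illustrate how the two signals above could be folded into one document boost, here is a minimal sketch. It is an assumption for illustration, not Nutch's actual code: the class name BoostSketch, the scorePower exponent and the log damping of the inlink count are all hypothetical choices.

```java
/**
 * Hypothetical sketch: combine a PageRank-style score and an inlink count
 * into a single document boost. Names and formula are illustrative
 * assumptions, not taken from Nutch.
 */
public class BoostSketch {

    /**
     * @param pageScore        score copied from the WebDB into the segment (method 1)
     * @param scorePower       exponent applied to the score; 0 disables score boosting
     * @param boostByLinkCount whether to factor in the number of incoming links (method 2)
     * @param inlinkCount      number of incoming links recorded in the segment
     */
    public static float calculateBoost(float pageScore, float scorePower,
                                       boolean boostByLinkCount, int inlinkCount) {
        float boost = 1.0f;
        if (scorePower != 0.0f) {
            // Method 1: boost derived from the link-analysis score
            boost = (float) Math.pow(pageScore, scorePower);
        }
        if (boostByLinkCount && inlinkCount > 0) {
            // Method 2: dampen the raw inlink count with a logarithm so that
            // heavily linked pages do not completely dominate the ranking
            boost *= (float) Math.log(Math.E + inlinkCount);
        }
        return boost;
    }

    public static void main(String[] args) {
        // Score only
        System.out.println(calculateBoost(4.0f, 0.5f, false, 0));
        // Score combined with 10 incoming links
        System.out.println(calculateBoost(4.0f, 0.5f, true, 10));
    }
}
```

Multiplying the two factors keeps either signal switchable on its own; a real deployment could just as well weight or add them, so treat the formula as a placeholder.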
There is a separate command (updatesegs) to update the score and inlink
information in existing segments.
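The refresh step can be pictured roughly as follows. This is a hypothetical sketch, not the updatesegs implementation: PageInfo, SegmentEntry and updateSegment are invented names, and a HashMap stands in for the WebDB. It also bears on the split-index question quoted below: because every segment copies its values from the same global link database (assuming that database covers the whole crawl), the stored inlink counts do not depend on how the segments are later divided among indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of an "updatesegs"-style step: refresh the score and
 * inlink count cached in segment entries from a global link database.
 * All names are illustrative assumptions, not Nutch's API.
 */
public class UpdateSegsSketch {

    /** Global per-URL data; a stand-in for the WebDB. */
    static class PageInfo {
        final float score;
        final int inlinks;
        PageInfo(float score, int inlinks) { this.score = score; this.inlinks = inlinks; }
    }

    /** A page entry cached inside a segment. */
    static class SegmentEntry {
        final String url;
        float score;
        int inlinks;
        SegmentEntry(String url) { this.url = url; }
    }

    /** Copy the current global values into every entry of one segment. */
    static void updateSegment(List<SegmentEntry> segment, Map<String, PageInfo> webDb) {
        for (SegmentEntry e : segment) {
            PageInfo info = webDb.get(e.url);
            if (info != null) { // entries unknown to the database are left unchanged
                e.score = info.score;
                e.inlinks = info.inlinks;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, PageInfo> webDb = new HashMap<>();
        webDb.put("http://a.example/", new PageInfo(1.5f, 3));
        webDb.put("http://b.example/", new PageInfo(0.5f, 1));

        // Two segments that may end up in different (split) indexes
        List<SegmentEntry> seg1 = new ArrayList<>();
        seg1.add(new SegmentEntry("http://a.example/"));
        List<SegmentEntry> seg2 = new ArrayList<>();
        seg2.add(new SegmentEntry("http://b.example/"));

        updateSegment(seg1, webDb);
        updateSegment(seg2, webDb);

        for (SegmentEntry e : seg1) System.out.println(e.url + " inlinks=" + e.inlinks);
        for (SegmentEntry e : seg2) System.out.println(e.url + " inlinks=" + e.inlinks);
    }
}
```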
Regards
Piotr
Jay Pound wrote:
Also, how does it keep track of incoming links to these pages globally? If
the weight is determined by the number of incoming links, it would have to
track them somewhere, so that when you split your indexes it can still have
an accurate value for the distributed search.
-J
----- Original Message -----
From: "Jay Pound" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, August 11, 2005 4:49 PM
Subject: page ranking weights
At which step does Nutch figure out the weight of each page: the updatedb
step or the index step?
Thanks,
-Jay